LaME-in-Chief: David F. Bacon

**IBM Research** 

## THE HETEROGENEOUS ERA



**GPU** 



**Cell BE** 



**Tilera 64**



**FPGA** 



**IBM PowerEN** 

# WHAT IS PERFORMANCE?

|      | Chip               | flops /<br>cycle | freq<br>(GHz) | Gflops<br>(peak) | Gflops /<br>watt | Mflops /<br>\$ | Mflops /<br>(watt·\$) |
|------|--------------------|------------------|---------------|------------------|------------------|----------------|-----------------------|
| CPU  | Intel<br>Core i7   | 32               | 3.2           | 102              | 0.8              | 70             | 0.7                   |
| GPU  | AMD<br>9270        | 1600             | 0.75          | 1200             | 5.5              | 800            | 3.6                   |
| FPGA | Xilinx<br>V5 LX330 | 1040             | 0.55          | 550              | 13.7             | 138            | 8.1                   |

**Peak Performance**

Source: Brodtkorb et al., The State-of-the-Art in Heterogeneous Computing, 2010
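The peak column in the table above is the product of the flops-per-cycle and frequency columns. The following is a minimal sketch of that arithmetic, using only the figures from the table; the dictionary and function names are illustrative, not from the source. The product matches the listed peak for the first two rows after rounding, and comes out slightly higher for the Xilinx row (572 against the 550 listed).

```python
# Peak Gflops = (flops per cycle) x (clock frequency in GHz).
# All figures are copied from the table above; names are illustrative.
chips = {
    "Intel Core i7": (32, 3.2),
    "AMD 9270": (1600, 0.75),
    "Xilinx V5 LX330": (1040, 0.55),
}


def peak_gflops(flops_per_cycle, freq_ghz):
    """Theoretical peak in Gflops; says nothing about achieved performance."""
    return flops_per_cycle * freq_ghz


peaks = {name: peak_gflops(*spec) for name, spec in chips.items()}

for name, gflops in peaks.items():
    print(f"{name:16s} {gflops:7.1f} Gflops (peak)")
```

Peak figures like these are upper bounds on arithmetic throughput only, which is why the next slide looks at a measured workload instead.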

# Performance, Take 2

|      | Chip               | Gsamples / sec | Msamples / joule |
|------|--------------------|--------------|----------------|
| CPU  | Intel<br>Core 2    | 1.4          | 5              |
| GPU  | nVidia<br>GTX280   | 14           | 115            |
| FPGA | Xilinx<br>Virtex 5 | 44           | 1461           |

**Actual Performance** (Random Number Generation)

Source: Thomas et al., A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation, 2009
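The two measured columns can be combined to show the relative gains behind this slide. The sketch below uses only the table's numbers to compute each device's throughput and efficiency relative to the CPU row, plus the power draw the two columns imply. The implied wattage is a derived quantity, not a figure reported on the slide, and all names are illustrative.

```python
# (Gsamples/sec, Msamples/joule) copied from the table above.
results = {
    "CPU": (1.4, 5),
    "GPU": (14, 115),
    "FPGA": (44, 1461),
}


def implied_watts(gsamples_per_sec, msamples_per_joule):
    """Power implied by the two columns (derived, not from the slide).

    (samples/sec) / (samples/joule) = joules/sec = watts.
    """
    return (gsamples_per_sec * 1e9) / (msamples_per_joule * 1e6)


cpu_rate, cpu_eff = results["CPU"]
summary = {}
for name, (rate, eff) in results.items():
    summary[name] = {
        "speedup": rate / cpu_rate,        # throughput relative to the CPU
        "efficiency_gain": eff / cpu_eff,  # samples/joule relative to the CPU
        "watts": implied_watts(rate, eff),
    }

for name, row in summary.items():
    print(f"{name:5s} speedup {row['speedup']:5.1f}x  "
          f"efficiency {row['efficiency_gain']:6.1f}x  "
          f"implied power {row['watts']:6.1f} W")
```

On these numbers the FPGA's advantage in samples per joule is much larger than its advantage in raw throughput.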

#### **OPPORTUNITIES IN HETEROGENEOUS SYSTEMS**

- Performance
  - 10-1000x speedups

- Efficiency
  - 10-100x improvement in ops/watt

### HETEROGENEOUS PROGRAMMING TODAY



1. Heterogeneous languages

#### THE LIQUID METAL PROGRAMMING LANGUAGE



#### THE ARTIFACT STORE & EXCLUSION











1. Heterogeneous languages
2. Some programs can't or shouldn't be expressed

#### **EXECUTION, COMMUNICATION, AND REPLACEMENT**



1. Heterogeneous languages
2. Some programs can't or shouldn't be expressed
3. Data transfer is in flux and highly variable
   - Ongoing debate and experimentation with coherent attach

# LIES, DAMNED LIES, AND FPGAS

- "It works"
  - A logic analyzer shows it meets the interface spec
- "They can be reprogrammed in a few milliseconds"
  - A few milliseconds + a few minutes to reboot
- "They have rich, high-speed I/O connections"
  - Which are very hard to talk to
- "...so they can interface to anything"
  - Which means they work with almost nothing
- "Synthesis times can be reduced with partial reconfig"
  - Partial reconfig is always available in Now + 7 months

1. Heterogeneous languages
2. Some programs can't or shouldn't be expressed
3. Data transfer is in flux and highly variable
   - Ongoing debate and experimentation with coherent attach
4. FPGAs have high bring-up costs (40-60% of a project)

#### PERFORMANCE AND USER EXPERIENCE

|      | Chip               | Gsamples / sec | Msamples / joule | Time        |
|------|--------------------|----------------|------------------|-------------|
| CPU  | Intel<br>Core 2    | 1.4            | 5                | 2 ½ minutes |
| GPU  | nVidia<br>GTX280   | 14             | 115              | 16 seconds  |
| FPGA | Xilinx<br>Virtex 5 | 44             | 1461             | 5 seconds   |
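The three times on this slide (2 ½ minutes, 16 seconds, 5 seconds) are roughly what the throughput figures predict if they are run times for one fixed workload on the CPU, GPU, and FPGA, in that order. That reading is an assumption, since the slide does not label them. The sketch below takes the CPU time as given and scales it by each device's throughput; the variable names are illustrative.

```python
# Throughput in Gsamples/sec, copied from the table on this slide.
throughput = {"CPU": 1.4, "GPU": 14, "FPGA": 44}

# ASSUMPTION: "2 1/2 minutes" is the CPU's run time for the workload.
cpu_seconds = 150

# Workload size implied by that assumption, in Gsamples.
workload_gsamples = cpu_seconds * throughput["CPU"]

# Time each device would need for the same workload at its listed rate.
predicted = {name: workload_gsamples / rate for name, rate in throughput.items()}

for name, seconds in predicted.items():
    print(f"{name:5s} {seconds:6.1f} s")
```

Under that assumption the predicted GPU and FPGA times land within about a second of the 16 and 5 seconds shown. A user notices the difference between minutes and seconds regardless of how well the programming model hides the device.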

1. Heterogeneous languages
2. Some programs can't or shouldn't be expressed
3. Data transfer is in flux and highly variable
   - Ongoing debate and experimentation with coherent attach
4. FPGAs have high bring-up costs (40-60% of a project)
5. Even if we hide complexity, we can't hide performance

# SHARING DEVICES?







1. Heterogeneous languages
2. Some programs can't or shouldn't be expressed
3. Data transfer is in flux and highly variable
   - Ongoing debate and experimentation with coherent attach
4. FPGAs have high bring-up costs (40-60% of a project)
5. Even if we hide complexity, we can't hide performance
6. No virtualization (is this a bad thing?)

# WHAT'S OUR METRIC?



1. Heterogeneous languages
2. Some programs can't or shouldn't be expressed
3. Data transfer is in flux and highly variable
   - Ongoing debate and experimentation with coherent attach
4. FPGAs have high bring-up costs (40-60% of a project)
5. Even if we hide complexity, we can't hide performance
6. No virtualization (is this a bad thing?)
7. No methodology for evaluation

#### **BUT IT SURE IS NICE WHEN IT WORKS**



