Multi-core platforms are a reality... but where is the software support?

Rudy Lauwereins, Vice-President IMEC; Prof. Katholieke Universiteit Leuven; Director Institute for BroadBand Technologies (IBBT)



### Content

- Introduction: situating the application domain
- Architectural evolution
- Design flow
- Case study 1: software defined radio
- Case study 2: multi-format multimedia codec
- Conclusion



## Vision: Multi-core platforms are essential for cheap consumer products and battery operated devices



imec

## Multi-core platforms: the devil is in the software heaven... and in the hell of physics



CHALLENGE: Create methods, tools and a platform for mapping <u>multiple</u> SW applications on *one* flexible yet energy-efficient and scaling-tolerant platform

With daunting properties:

- power efficient (>100 GOPS/W)
- cheap (10€)
- flexible (SW centric platform)
- memory intensive (>70% area, power)

#### Rudy Lauwereins© imec/restricted 2006


## Platforms evolve towards better supporting higher degrees of parallelism

#### Today's multi-core platforms

Non-scalable, synchronous bus based interconnect architecture

Limited number of cores (2-8)

Single core design flow supports mapping of single application on single processor

One application runs on single core

Core processors have limited functional (VLIW-8) and data (SIMD-4) parallelism




#### Next generation multi-core platforms

Scalable GALS-NoC based interconnect architecture

Large number of cores (16-100)

Multi-core design flow supports mapping of multiple applications on heterogeneous multi-core platform

One application runs on N cores, depending on run-time load condition

Core processors have high functional (2D VLIW-64) and data (SIMD-1024) parallelism



#### Trend towards processing components with increased support for functional parallelism

## Tightly coupled 1-dimensional and 2-dimensional VLIW processor template

- Supported by retargetable tool suite: compiler, HW generator, cycle accurate simulator, instruction accurate simulator
- 1D VLIW executes control flow code, exploits ILP
- 2D VLIW runs data-flow accelerated code (kernels), exploits LLP
- Speed-up and power-gain

[Figure: processor template with Program Fetch, Instruction Dispatch, Instruction Decode and Register File blocks, next to the DRESC tool suite]





#### Trend towards processing components with increased support for data parallelism



Similar idea: "A 40 GOPS 250 mW massively parallel processor based on matrix architecture", M. Nakajima et al., Renesas, ISSCC 2006

2048-way SIMD with 2-bit processors
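As a back-of-the-envelope check, the figures in the cited paper title can be compared with the >100 GOPS/W target listed among the platform properties. The helper name `gops_per_watt` is illustrative only, and operations on 2-bit processing elements are not directly comparable to operations on wider words, so this is a rough sketch rather than a benchmark.

```python
def gops_per_watt(gops: float, power_mw: float) -> float:
    """Energy efficiency in GOPS/W from throughput (GOPS) and power (mW)."""
    return gops / (power_mw / 1000.0)


# Figures quoted in the title of the ISSCC 2006 paper cited above.
efficiency = gops_per_watt(gops=40.0, power_mw=250.0)

# Target taken from the "daunting properties" list (>100 GOPS/W).
TARGET_GOPS_PER_WATT = 100.0

print(f"{efficiency:.0f} GOPS/W")  # 160 GOPS/W
print("meets target:", efficiency > TARGET_GOPS_PER_WATT)
```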



#### MPSoC Platform Mapping Problem





#### Application mapping on MPSoC platforms: Predictable, Modular and Scalable



**Design-Time Mapping Predictability**: how to ensure that multiple independently mapped applications will simultaneously work with the same behavior on a single platform

**Design-Time Mapping Scalability**: offering different solutions for different platform resources and run-time requirements (e.g. resolution)

**Run-Time Decision Making**: how to exploit design-time solutions for handling run-time conditions



#### MPSoC: Design-Time, Run-Time Manager & Platform



# Major task of the design flow: reduce memory access conflicts



### Different programming model levels



### Design-Time/Run-Time MPSoC picture



## Application Exploration Information is stored in a Pareto Surface
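A minimal sketch of how design-time exploration results could be reduced to a Pareto surface. The three axes (cores used, energy, execution time), their units, and all names and numbers below are assumptions for illustration; the slides do not specify which axes the actual tools use.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    cores: int        # cores occupied by this mapping (assumed axis)
    energy_mj: float  # energy per frame, hypothetical unit
    time_ms: float    # execution time per frame, hypothetical unit


def dominates(a: Mapping, b: Mapping) -> bool:
    """True if a is no worse than b on every axis and differs on at least one."""
    no_worse = (a.cores <= b.cores
                and a.energy_mj <= b.energy_mj
                and a.time_ms <= b.time_ms)
    return no_worse and a != b


def pareto_front(points: list[Mapping]) -> list[Mapping]:
    """Keep only the mappings that no other mapping dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical exploration results for one application.
candidates = [
    Mapping(1, 9.0, 40.0),
    Mapping(2, 10.0, 22.0),
    Mapping(4, 12.0, 12.0),
    Mapping(4, 13.0, 15.0),  # dominated by Mapping(4, 12.0, 12.0)
    Mapping(2, 11.0, 25.0),  # dominated by Mapping(2, 10.0, 22.0)
]

front = pareto_front(candidates)
for point in front:
    print(point)
```

Only the non-dominated mappings need to be stored for later use, which keeps the information handed over from design time compact.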





# Choosing a pareto point and assigning resources
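One way a run-time manager might pick a point could look like the sketch below: among the stored Pareto points, take the lowest-energy one that fits the currently free cores and meets the deadline, then reserve that many cores. The selection policy, the `ParetoPoint` fields and the core identifiers are assumptions for illustration, not the method used in the actual design flow.

```python
from typing import NamedTuple, Optional


class ParetoPoint(NamedTuple):
    cores: int        # cores required by this mapping (assumed axis)
    energy_mj: float  # energy per frame, hypothetical unit
    time_ms: float    # execution time per frame, hypothetical unit


def choose_point(front: list[ParetoPoint], free_cores: int,
                 deadline_ms: float) -> Optional[ParetoPoint]:
    """Lowest-energy point that fits the free cores and meets the deadline."""
    feasible = [p for p in front
                if p.cores <= free_cores and p.time_ms <= deadline_ms]
    return min(feasible, key=lambda p: p.energy_mj, default=None)


def assign_cores(point: ParetoPoint, free_core_ids: list[int]) -> list[int]:
    """Reserve the first point.cores free cores for the application."""
    return free_core_ids[:point.cores]


# Hypothetical Pareto points stored at design time.
front = [
    ParetoPoint(1, 9.0, 40.0),
    ParetoPoint(2, 10.0, 22.0),
    ParetoPoint(4, 12.0, 12.0),
]

free_core_ids = [0, 3, 5]  # cores currently idle on the platform
chosen = choose_point(front, free_cores=len(free_core_ids), deadline_ms=25.0)
if chosen is not None:
    print(chosen, "->", assign_cores(chosen, free_core_ids))
else:
    print("no feasible mapping under the current load")
```

With three free cores and a 25 ms deadline, the single-core point is too slow and the four-core point does not fit, so the two-core point is selected.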



Case studies showing the use of the mapping tools and the 2D VLIW processor for software defined radio and multi-format multimedia codecs



Digital and RF implementation platforms

Instantiate in SDR platform

- Multi-mode wireless communication platform: heterogeneous platform, 4x4 SIMD-4 ADRES
- Multi-format multimedia platform: 6 ADRES cores, NoC



## Wireless standards: a large variety, we show coverage through limit cases




#### Algorithm (code)-architecture co-design to bridge huge gaps



# Single flexible platform is optimized for many multimedia coding standards



#### Flexible MM Platform: Design-Time Flexibility

Area determined by highest performance standard needed



## Resulting Run-Time Flexibility

Power determined by actual usage scenario

Power estimates (mW)



### Conclusion

- Platforms evolve towards supporting higher degrees of parallelism
- Beware of the hell of physics
- Be even more aware of the software devil: a multi-core design flow is urgently needed
- Research in all of the above is maturing and has been proven in multi-mode radios and multi-format multimedia codecs


