## Designing Programmable Platforms: From ASIC to ASIP

### **MPSoC 2005**

#### **Heinrich Meyr**

CoWare Inc., San Jose and Integrated Signal Processing Systems (ISS), Aachen University of Technology, Germany





#### Agenda

- Facts & Conclusions
- Heterogeneous MPSoC
  - » Energy Efficiency vs. Flexibility
  - » How to explore the Design Space?
- ASIP Design
- Economics of SoC Development
- Conclusions



# **Facts & Conclusion**



#### **Core Proposition**



[Figure: axis label "Effort (EDA tools investment)"]





# "Form follows Function"

Mies van der Rohe






Heterogeneous MPSoC

# Trade-off between Flexibility and Energy Efficiency



#### **Architectural Objectives**

Need more MOPS/Watt and MOPS/mm<sup>2</sup> to minimize the global performance measure for battery-driven devices:

#### Energy per decoded bit (Joule/bit)
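The performance measure above can be made concrete with a minimal sketch. It assumes that energy per decoded bit is the average power divided by the decoded bit rate. The function name and the 0.1 W power figure are hypothetical and do not come from the slides; only the 384 kbps rate appears elsewhere in the talk.

```python
def energy_per_decoded_bit(power_watts: float, bit_rate_bps: float) -> float:
    """Energy per decoded bit in joules: average power / decoded bit rate."""
    if bit_rate_bps <= 0:
        raise ValueError("bit rate must be positive")
    return power_watts / bit_rate_bps


# Hypothetical example: a receiver drawing 0.1 W while decoding 384 kbps.
energy = energy_per_decoded_bit(0.1, 384e3)
print(f"{energy * 1e9:.1f} nJ/bit")
```

Lowering this figure at a fixed bit rate means lowering power, which is why MOPS/Watt is the architectural lever.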



#### **Computational Efficiency vs. Flexibility**





Source: T. Noll, RWTH Aachen

# How to Explore the Design Space and Design MPSoCs?



#### **Diversity of Network Processors**





#### The five elements of the MESCAL Methodology

1. Judiciously apply benchmarking
2. Inclusively identify the architectural space
3. Efficiently describe and evaluate the ASIPs
4. Comprehensively explore the design space
5. Successfully deploy the ASIP



#### Knowledge of the Task

MESCAL 1: Judiciously apply benchmarking

- The signal/information processing task can be naturally partitioned
  - » Decoders
  - » Filters
  - » Channel estimator
- The building blocks are <u>loosely coupled</u>
- The signal processing task is (mostly) cyclostationary



#### From Function to Algorithm Classes

- Butterfly unit
  - » Viterbi & MAP decoder
  - » MLSE equalizer
- Eigenvalue decomposition (EVD)
  - » Delay acquisition (CDMA)
  - » MIMO Tx processing
- Matrix-Matrix & Matrix-Vector Multiplication
  - » MIMO processing (Rx & Tx)
  - » LMMSE channel estimation (OFDM & MIMO)
- CORDIC
  - » Frequency offset estimation (e.g. AFC)
  - » OFDM post-FFT synchronization (sampling clock, fine frequency)
- FFT & IFFT (spectral processing)
  - » OFDM
  - » Speech post processing (noise suppression)
  - » Image processing (not FFT but DCT)
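As an illustration of the first algorithm class, the following is a minimal sketch of one add-compare-select (ACS) butterfly as used in a Viterbi decoder. The slides do not give an implementation. The function name, the smaller-is-better metric convention, and the crosswise sharing of the two branch metrics (typical of rate-1/n convolutional codes) are assumptions made for illustration.

```python
def acs_butterfly(pm0, pm1, bm0, bm1):
    """One add-compare-select butterfly (smaller metric survives).

    pm0, pm1: path metrics of the two predecessor states
    bm0, bm1: the two branch metrics shared crosswise by the butterfly
    Returns ((metric_a, decision_a), (metric_b, decision_b)), where each
    decision is the index (0 or 1) of the surviving predecessor.
    """
    # Add: candidate metrics for the two successor states
    cand_a = (pm0 + bm0, pm1 + bm1)
    cand_b = (pm0 + bm1, pm1 + bm0)
    # Compare and select: keep the smaller candidate for each successor
    dec_a = 0 if cand_a[0] <= cand_a[1] else 1
    dec_b = 0 if cand_b[0] <= cand_b[1] else 1
    return (cand_a[dec_a], dec_a), (cand_b[dec_b], dec_b)


print(acs_butterfly(3, 5, 1, 2))
```

The same two-input, two-output structure recurs for every state pair of the trellis, which is what makes a dedicated butterfly unit attractive.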



#### **Algorithmic Descriptors**

- Clock rate of processing elements (1/Tc)
- Sampling rate of the signal (1/Ts)
- Algorithm characteristic
  - » Complexity (MOPS/sample)
  - » Computational characteristic
    - Data flow
      - Data locality
      - Data storage
      - Parallelism
    - Control flow
- Connectivity of algorithms
  - » Spatial
  - » Temporal
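The first three descriptors combine into a rough estimate of the parallelism an architecture must provide. The sketch below assumes complexity is expressed as operations per sample, so that the required operations per clock cycle equal complexity × Tc/Ts. The function name and all numbers are hypothetical.

```python
def required_ops_per_cycle(ops_per_sample: float,
                           sample_rate_hz: float,
                           clock_rate_hz: float) -> float:
    """Operations each clock cycle must complete to keep up with the signal.

    ops/s     = ops_per_sample * sample_rate   (sample_rate = 1/Ts)
    ops/cycle = ops/s / clock_rate             (clock_rate  = 1/Tc)
    """
    return ops_per_sample * sample_rate_hz / clock_rate_hz


# Hypothetical example: 100 operations per sample, 4 MHz sampling, 200 MHz clock.
print(required_ops_per_cycle(100, 4e6, 200e6))
```

A result above 1 means that a single processing element issuing one operation per cycle cannot sustain the task, so the work has to be spread spatially.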



#### 384 kbps UMTS Receiver BB Complexity



#### 200 Mbps UWB Receiver BB Complexity I



#### Computational complexity of previous and recent video coding standards



#### **System Functional Requirements for Handheld**

# Massive Parallelism required in the foreseeable future

| Requirement                | 2003 | 2009 | 2013 |
|----------------------------|------|------|------|
| Frequency (MHz)            | 300  | 600  | 1500 |
| Giga operations per second | 0.3  | 14   | 2458 |
| Operations per cycle       | 1    | 23   | 1638 |

Source: International Technology Roadmap for Semiconductors (ITRS, TX 2003)
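The last row of the table follows from the first two. The sketch below recomputes it, assuming "Giga operations" means giga operations per second (an interpretation that the arithmetic supports). The dictionary and function names are illustrative.

```python
# Year -> (frequency in MHz, giga operations per second), from the table above.
ROADMAP = {2003: (300, 0.3), 2009: (600, 14), 2013: (1500, 2458)}


def operations_per_cycle(freq_mhz: float, giga_ops: float) -> float:
    """Operations that must complete per clock cycle."""
    return (giga_ops * 1e9) / (freq_mhz * 1e6)


for year, (freq_mhz, giga_ops) in ROADMAP.items():
    print(year, round(operations_per_cycle(freq_mhz, giga_ops), 1))
```

The clock frequency grows by a factor of 5 between 2003 and 2013 while the required throughput grows by a factor of roughly 8000, so the gap has to be closed by parallelism.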



#### **Lessons Learned**

- Virtual Prototype (Product) of utmost importance
  - » Early customer interaction
  - » Debugging
  - » Verification & Validation
- Product Differentiator
  - » 80% of Area and Power Consumption in the inner receiver (Algorithm and Architecture Design)
  - » 10-15% of Area and Power Consumption in Decoder (Architecture Design)
  - » 5% of Area and Power Consumption in the ARM (But major portion of cost is SW/Protocol implementation)



#### **Canonical Receiver Model**



#### **Design Task: Spatial and Temporal Mapping**






# **Enabling MP-SoC Design**



#### System Level Tools I: Application & IP Creation











#### **Processor Design Space**





#### Architecture Description Language based Processor Design

MESCAL 3: Efficiently describe and evaluate the ASIPs

- The purpose of an architecture description language (e.g. LISA) is:
  - » To allow for an iterative design to efficiently explore architecture alternatives
  - » To jointly design […] communication
  - » To automatically […] (… implementation)
  - » To automatically generate tools
    - Assembler, linker, compiler, simulator, co-simulation interfaces
- From a single model at various levels of temporal and spatial abstraction
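The idea of generating several tools from a single model can be sketched in a few lines. The example below is not LISA and does not reflect its syntax or tool flow. It is a toy Python illustration in which one hypothetical instruction table (`ISA`) drives both an assembler and an instruction-set simulator, with a made-up 16-bit encoding (4-bit opcode and three 4-bit register fields).

```python
# Hypothetical single model: mnemonic -> opcode and behavior.
ISA = {
    "ADD": {"opcode": 0x1, "behavior": lambda a, b: a + b},
    "SUB": {"opcode": 0x2, "behavior": lambda a, b: a - b},
    "AND": {"opcode": 0x3, "behavior": lambda a, b: a & b},
}


def make_assembler(isa):
    """Derive an assembler: 'ADD r2, r0, r1' -> 16-bit instruction word."""
    def assemble(line):
        mnemonic, operands = line.split(maxsplit=1)
        rd, ra, rb = (int(tok.strip().lstrip("r")) for tok in operands.split(","))
        return (isa[mnemonic]["opcode"] << 12) | (rd << 8) | (ra << 4) | rb
    return assemble


def make_simulator(isa):
    """Derive a simulator that executes one instruction word on a register file."""
    by_opcode = {spec["opcode"]: spec["behavior"] for spec in isa.values()}

    def step(word, regs):
        op, rd = word >> 12, (word >> 8) & 0xF
        ra, rb = (word >> 4) & 0xF, word & 0xF
        regs[rd] = by_opcode[op](regs[ra], regs[rb]) & 0xFFFF  # 16-bit registers
    return step


assemble, step = make_assembler(ISA), make_simulator(ISA)
regs = [5, 7] + [0] * 14
step(assemble("ADD r2, r0, r1"), regs)
print(regs[2])
```

Adding an entry to `ISA` extends both tools at once, which is the property that makes iterative exploration of architecture alternatives cheap.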



#### **LISA 2.0 - Abstraction Levels**







#### **CORXpert<sup>™</sup> - Automating MIPS CorExtend<sup>™</sup>**











#### **ASDSP FPGA Implementation**

#### Myung Sunwoo, Ajou University

#### ASDSP Core Design



Supports a special instruction set for FFT operation and the BMU instruction, improving performance for OFDM communication.





#### A low-power ASIP for Infineon DVB-T 2<sup>nd</sup> generation Single-Chip Receiver:

- ASIP for DVB-T acquisition and tracking algorithms (sampling-clock-synchronization, interpolation / decimation, carrier frequency offset estimation)
- Harvard Architecture
- 60 mostly RISC-like Instructions & Special Instructions for CORDIC-Algorithm
- 8x32-Bit General Purpose Registers, 4x9-Bit Address Registers
- 2048x20-Bit Instruction ROM, 512x32-Bit Data Memory
- I2C Registers and dedicated interfaces for external communication
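The slides do not detail the special CORDIC instructions. As background, here is a minimal floating-point sketch of CORDIC in vectoring mode, which recovers the angle of a complex value (for example a correlation result in carrier frequency offset estimation) using only shift-and-add style iterations. A hardware version would use fixed-point shifts and a small arctangent table. The function name and the iteration count are assumptions.

```python
import math


def cordic_vectoring(x: float, y: float, iterations: int = 16):
    """Rotate (x, y) onto the positive x-axis; return (magnitude, angle in rad)."""
    angle = 0.0
    # Pre-rotate by +/-90 degrees so the vector lies in the right half-plane.
    if x < 0:
        if y >= 0:
            x, y, angle = y, -x, math.pi / 2
        else:
            x, y, angle = -y, x, -math.pi / 2
    gain = 1.0
    for i in range(iterations):
        d = -1.0 if y > 0 else 1.0          # rotation direction drives y to zero
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        angle -= d * math.atan(2.0**-i)     # accumulate the applied micro-rotation
        gain *= math.sqrt(1.0 + 4.0**-i)    # each step stretches the vector
    return x / gain, angle


magnitude, angle = cordic_vectoring(1.0, 1.0)
print(round(magnitude, 3), round(angle, 3))
```

Each iteration needs only additions, sign tests, and multiplications by powers of two, which map onto shifters and adders; that is why CORDIC suits a small ASIP datapath.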





### The Motorola M68HC11 Architecture



#### **Architecture Overview**

#### M68HC11 CPU Architecture: Hot spots

- » 8-bit micro-controller
  - » Harvard Architecture
- » 7 CPU registers
- » 6 different addressing modes
- » Shared data and program bus: stalled data access
- » Instruction width 8, 16, 24, 32, 40 bits: multi-cycle fetch
- » 8-bit opcode: 181 instructions
- » Clock speed: ~200 MHz
- » Performance: non-pipelined
- » Area: 15K to 30K (DesignWare<sup>®</sup> Library)



#### **Architecture Development with LISA**





#### **Results**

- Area < 23k gates
- Clock speed ~ 200 MHz
- Execution time speed-up of 62% for the spanning tree application
- Mapped onto a Xilinx FPGA



### Architecture Development with LISA

| Task                                            | Effort  |
|-------------------------------------------------|---------|
| Studying the architecture                       | 4 days  |
| Basic architecture modifications                | 2 days  |
| Grouping and coding of the instructions         | 1 day   |
| Writing the LISA model: basic syntax and coding | 4 days  |
| Writing the LISA model: behavior section        | 6 days  |
| Validation                                      | 4 days  |
| HDL generation                                  | 2 days  |
| Total                                           | 23 days |








## **Opportunity: For Whom?**



#### **Food Chain in the Wireless Market**





#### **MPSoC Characteristics**

- Growth potential:
  - » Functionality increases qualitatively with time
    - → Newcomer chases a moving target
- System property:
  - » The whole is more than the sum of the parts
    - → Newcomer needs to build up expertise in various fields and needs to learn how to manage the interaction between them








### Summary

 We need to understand the process of engineering a complex SoC as an "Apollo" project



#### […]ntent - but How?



# **Thank You**

