#### Automatic Instruction-Set Specialisation

#### Paolo Ienne

(based on work with L. Pozzi, K. Atasu, P. Biswas, N. Dutt, and J. Großschädl)



Processor Architecture Laboratory (LAP) & Centre for Advanced Digital Systems (CSDA)



Ecole Polytechnique Fédérale de Lausanne (EPFL)

#### **Customizable Processors**





- Available in many commercial processors (from MIPS, STM, IFX, Tensilica, ARC, Xilinx, Altera, ...)

## Instruction Set Extensions (ISEs)



- Collapse a subset of the Directed Acyclic Graph (DAG) nodes into a single
  Application-Specific Functional Unit (AFU)
  - Exploit cheaply the parallelism within the basic block
  - Simplify operations with constant operands
  - Optimise sequences of instructions (logic, arithmetic, etc.)
  - Exploit limited precision

#### The Basic Problem



- Goal: Find subgraphs
  - having at most a user-defined number of inputs and outputs,
  - possibly including disconnected components, and
  - that maximize the overall speedup
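The problem statement above can be illustrated with a brute-force sketch. Everything concrete here is an assumption made for illustration: the four-node DAG, the software latencies, the hardware delays, the gain estimate (software cycles saved minus AFU cycles), and the convexity check, which the slide does not list but which is needed if the collapsed subgraph is to execute as one atomic instruction. Exhaustive enumeration is exponential in the number of nodes, so this only shows what is being searched for, not how to search efficiently.

```python
from itertools import combinations

# Hypothetical basic-block DAG: node -> (operand sources, software latency in cycles).
# Names starting with "in" are values produced outside the basic block.
DAG = {
    "mul": (("in0", "in1"), 3),
    "add": (("mul", "in2"), 1),
    "shl": (("add",), 1),
    "xor": (("in3", "in0"), 1),
}
LIVE_OUT = {"shl", "xor"}                              # values used after the block
HW_DELAY = {"mul": 6, "add": 3, "shl": 1, "xor": 1}    # assumed delays, arbitrary units
CYCLE = 10                                             # assumed cycle time, same units

def consumers(node):
    return [n for n, (srcs, _) in DAG.items() if node in srcs]

def descendants(node):
    seen, stack = set(), [node]
    while stack:
        for c in consumers(stack.pop()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def inputs(cut):
    # Distinct values read by the cut but produced outside it.
    return {s for n in cut for s in DAG[n][0] if s not in cut}

def outputs(cut):
    # Nodes of the cut whose value is needed outside the cut.
    return {n for n in cut
            if n in LIVE_OUT or any(c not in cut for c in consumers(n))}

def is_convex(cut):
    # Assumed extra constraint: no path may leave the cut and re-enter it.
    for n in cut:
        for d in descendants(n) - cut:
            if descendants(d) & cut:
                return False
    return True

def gain(cut):
    # Software cycles of the collapsed nodes minus cycles taken by the AFU.
    memo = {}
    def finish(n):
        if n not in memo:
            memo[n] = HW_DELAY[n] + max(
                (finish(s) for s in DAG[n][0] if s in cut), default=0)
        return memo[n]
    critical_path = max(finish(n) for n in cut)
    hw_cycles = -(-critical_path // CYCLE)             # ceiling division
    return sum(DAG[n][1] for n in cut) - hw_cycles

def best_cut(max_in, max_out):
    best, best_gain = None, 0
    nodes = sorted(DAG)
    for k in range(1, len(nodes) + 1):
        for combo in combinations(nodes, k):
            cut = set(combo)
            if len(inputs(cut)) > max_in or len(outputs(cut)) > max_out:
                continue
            if not is_convex(cut):
                continue
            g = gain(cut)
            if g > best_gain:
                best, best_gain = cut, g
    return best, best_gain

print(best_cut(3, 2))
print(best_cut(4, 2))
```

With three inputs allowed, the best cut in this toy graph is the `mul`/`add`/`shl` chain. Raising the input limit to four lets the search add the unconnected `xor`, which is an instance of a subgraph with disconnected components.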

#### Pruning the Search Space



## **Algorithm Performance**



Number of subgraphs considered using an output port constraint of two

## Many Related Problems



# Bringing Data Closer to the Consumer



## Adding Local Memory to ISEs

- Include selected LD/ST operations in subgraphs for instruction-set extensions
- Decide which pieces of data are best stored locally in the AFU
- Preload/unload the AFU memories with DMA transfers placed in the most economical spots in the Control Flow Graph



### Advantage of Local Storage



#### **Many Related Problems**



## Not Enough RF Ports?



## **Pipelining and Sequentialization**

An algorithm to solve the following problem:

- **Pipelining**: no path inside the application-specific functional unit is longer than the cycle time
- **Legality**: the same number of registers on every path
- **I/O schedulability**: no more than N<sub>in</sub> input values and no more than N<sub>out</sub> output values required per cycle
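The three constraints above can be made concrete with a small checker. This is a sketch of a validity test for one given pipelined AFU, not the algorithm the talk refers to. The graph encoding is an assumption: each edge carries the number of pipeline registers placed on it, register-file inputs are modelled as zero-delay source nodes, the graph is assumed connected, and all names and numbers are hypothetical.

```python
from collections import Counter, defaultdict, deque

def stage_labels(edges):
    """Legality: return {node: stage} if every path between two nodes crosses
    the same number of registers, else None. Assumes a connected graph."""
    adj = defaultdict(list)
    for u, v, regs in edges:
        adj[u].append((v, regs))
        adj[v].append((u, -regs))
    start = edges[0][0]
    stage = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v, regs in adj[u]:
            if v not in stage:
                stage[v] = stage[u] + regs
                queue.append(v)
            elif stage[v] != stage[u] + regs:
                return None                     # two paths disagree
    base = min(stage.values())
    return {n: s - base for n, s in stage.items()}

def max_stage_delay(edges, delay):
    """Pipelining: longest register-free (combinational) path delay."""
    preds = defaultdict(list)
    for u, v, regs in edges:
        if regs == 0:
            preds[v].append(u)
    memo = {}
    def arrival(n):
        if n not in memo:
            memo[n] = delay[n] + max((arrival(p) for p in preds[n]), default=0)
        return memo[n]
    return max(arrival(n) for n in delay)

def check(edges, delay, inputs, outputs, cycle_time, n_in, n_out):
    stage = stage_labels(edges)
    if stage is None:
        return "illegal"
    if max_stage_delay(edges, delay) > cycle_time:
        return "path too long"
    reads = Counter(stage[i] for i in inputs)    # values read per cycle
    writes = Counter(stage[o] for o in outputs)  # values written per cycle
    if max(reads.values(), default=0) > n_in:
        return "inputs not schedulable"
    if max(writes.values(), default=0) > n_out:
        return "outputs not schedulable"
    return "ok"

# Hypothetical AFU computing (a * b) + c with two read ports and one write port.
delay = {"a": 0, "b": 0, "c": 0, "mul": 8, "add": 4}
pipelined = [("a", "mul", 0), ("b", "mul", 0), ("mul", "add", 1), ("c", "add", 0)]
flat = [("a", "mul", 0), ("b", "mul", 0), ("mul", "add", 0), ("c", "add", 0)]
print(check(pipelined, delay, ["a", "b", "c"], ["add"], 10, 2, 1))
print(check(flat, delay, ["a", "b", "c"], ["add"], 10, 2, 1))
```

In the pipelined version, one register after the multiplier shortens the combinational path and also defers the read of `c` to the second cycle, so three operands fit through two read ports.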





#### How Competitive Can We Be to Manual ISE Design?



#### Often Applications Offer Options in Algorithm Implementation

- Elliptic curve cryptography
  - Elliptic curves over a finite field *K*
  - Main operation: point multiplication *k*·*P*
  - Realized by arithmetic operations in the field *K*
  - Field order: 160-256 bits

#### **Problem:**

How to help the *algorithm* designer select the best options so that the *implementation* will be most efficient?



- Many implementation options
  - Binary field GF(2<sup>m</sup>)
  - Prime field GF(p)
  - Different multiplication algorithms
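To show the kind of field arithmetic these options involve, here is a minimal sketch of one textbook multiplication algorithm for the binary field GF(2<sup>m</sup>): bit-serial shift-and-XOR with interleaved reduction. It is not taken from the talk. The function name and the small GF(2<sup>8</sup>) test field are chosen for illustration only; real curves use fields of 160 bits or more.

```python
def gf2m_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m).

    Elements are integers whose bits are polynomial coefficients; `poly` is
    the irreducible reduction polynomial, including its x^m term.
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a        # addition in GF(2^m) is XOR, no carries
        b >>= 1
        a <<= 1                # multiply a by x
        if a >> m:             # degree reached m: reduce modulo poly
            a ^= poly
    return result

# GF(2^8) with x^8 + x^4 + x^3 + x + 1, used here only because it is small.
print(hex(gf2m_mul(0x57, 0x83, 0x11B, 8)))   # 0xc1
```

Each loop iteration is a shift, a test, and one or two XORs. A general-purpose core spends several instructions on this per bit, whereas the same dataflow is shallow in hardware because no carry chain is involved.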

## Use Automatic ISE to Get Early Feedback

- Großschädl et al. (DATE 2006) have shown a very close qualitative match between accurate microarchitectural ISE studies and automatic ISE identification



#### Complex Integrated Systems Still Need Research in Design Automation



- Customization marks a turn in the history of processors
- Design tools for high-quality datapaths must be within the reach of all designers

## You Can Now Buy It ...



## ...and You Can Read About It!

- 1. From Prêt-à-Porter to Tailor-Made *Ienne and Leupers*
- 2. Opportunities for Application-Specific Processors: The Case of Wireless Communications *Ascheid and Meyr*
- 3. Customizing Processors: Lofty Ambitions, Stark Realities *Fisher, Faraboschi, and Young*
- 4. Architecture Description Languages *Mishra and Dutt*
- 5. C Compiler Retargeting *Leupers*
- 6. Automated Processor Configuration and Instruction Extension *Goodwin, Leibson, and Martin*
- 7. Automatic Instruction-Set Extensions *Pozzi and Ienne*
- 8. Challenges to Automatic Customization *Topham*
- 9. Coprocessor Generation from Executable Code *Taylor and Stewart*
- 10. Datapath Synthesis *Brisk and Sarrafzadeh*
- 11. Instruction Matching and Modeling *Parameswaran, Henkel, and Cheung*
- 12. Processor Verification *Große, Siegmund, and Drechsler*
- 13. Sub-RISC Processors *Mihal, Weber, and Keutzer*
- 14. Application Specific Instruction Set Processor for UMTS-FDD Cell Search *Puusaari, Yli-Pietilä, and Rounioja*
- 15. Hardware/Software Tradeoffs for Advanced 3G Channel Decoding *Schmidt and Wehn*
- 16. Application Code Profiling and ISA Synthesis on MIPS32 *Leupers*
- 17. Designing Soft Processors for FPGAs *Bilski, Mohan, and Wittig*

