## July 11, 2016 MPSoC2016@Nara Mini Keynote



# **Customizable Hardware Abstraction**

### Shinya Takamaeda-Yamazaki

Nara Institute of Science and Technology (NAIST)

How do you design custom hardware?



#### by HDL (Hardware Description Language)

- Such as Verilog HDL and VHDL
- Fully customizable and high performance ☺
- Huge development effort because of the few abstractions ☹

#### by HLS (High Level Synthesis)

- Such as C/C++ (Vivado HLS), Java (Max), OpenCL (Altera), ...
- High productivity through an untimed design style ☺
- Hard to customize how the compiler generates code ☹
  - Sometimes cannot reach the maximum performance

Motivation: How to keep both productivity and customizability



Tradeoffs between HDL and HLS

- High productivity by high-level abstraction
- High hardware quality by low-level customization

#### How to keep both:

Allow users to build up a custom abstraction

Seamless DSL from RTL to HLS: custom abstractions and methods built from low-level abstractions

### **Customizable Hardware Abstraction**



## First level abstraction of HDL component





# Code generation by running the Python program



Verilog code is obtained by calling the to\_verilog() method
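The idea of building a module as Python objects and then rendering it as text can be sketched in plain Python. This is a minimal illustration only: `ToyModule`, `add_port`, and `add_assign` are hypothetical names invented here and are not Veriloggen's API; only the method name `to_verilog()` is taken from the slide.

```python
class ToyModule:
    """Hypothetical stand-in for an explicitly constructed hardware module."""

    def __init__(self, name):
        self.name = name
        self.ports = []    # (direction, name, width)
        self.assigns = []  # (lhs, rhs) continuous assignments

    def add_port(self, direction, name, width=1):
        self.ports.append((direction, name, width))
        return name

    def add_assign(self, lhs, rhs):
        self.assigns.append((lhs, rhs))

    def to_verilog(self):
        """Render the collected objects as Verilog source text."""
        decls = []
        for direction, name, width in self.ports:
            rng = "" if width == 1 else f"[{width - 1}:0] "
            decls.append(f"  {direction} {rng}{name}")
        lines = [f"module {self.name}", "(", ",\n".join(decls), ");"]
        lines += [f"  assign {lhs} = {rhs};" for lhs, rhs in self.assigns]
        lines.append("endmodule")
        return "\n".join(lines)


# The module exists only as Python objects until to_verilog() is called.
m = ToyModule("passthrough")
a = m.add_port("input", "a", width=8)
y = m.add_port("output", "y", width=8)
m.add_assign(y, a)
verilog_text = m.to_verilog()
print(verilog_text)
```

Because the module is an ordinary Python object, it can be inspected or extended before any text is emitted.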









## Operator overload for dataflow
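A rough sketch of how operator overloading can record a dataflow graph instead of computing a value. The classes `Expr`, `Var`, `Const`, and `BinOp` are hypothetical names made up for this illustration; they are not taken from Veriloggen.

```python
class Expr:
    """Node of a hypothetical dataflow expression tree."""

    def __add__(self, other):
        return BinOp("+", self, wrap(other))

    def __mul__(self, other):
        return BinOp("*", self, wrap(other))


class Var(Expr):
    def __init__(self, name):
        self.name = name

    def emit(self):
        return self.name


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def emit(self):
        return str(self.value)


class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def emit(self):
        return f"({self.left.emit()} {self.op} {self.right.emit()})"


def wrap(value):
    """Turn plain Python ints into constant nodes."""
    return value if isinstance(value, Expr) else Const(value)


x, y = Var("x"), Var("y")
z = x * y + 3  # builds a tree of BinOp nodes; no arithmetic happens here
assign_line = f"assign z = {z.emit()};"
print(assign_line)
```

The Python expression reads like ordinary arithmetic, but its result is a graph that a generator can later schedule, pipeline, or print.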







## **Difference from HDL and HLS**



#### In DSL/HLS, the source code structure is the circuit definition

- <u>Reflection</u>: getting the source code structure
  - A compiler analyzes the source code and converts it into dataflow, etc.
- When a new part is added, the source code must be changed
  - A frequently appearing pattern must also be written out in the source code every time ☹
- Only a subset of the original language syntax can be used ☹

#### Veriloggen explicitly constructs hardware source code

- No reflection: the target source code is constructed by running the Python program
  - Frequently appearing coding patterns can be factored out through method extraction and user-defined classes
  - All Python features can be used for the code construction ☺
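As a small illustration of method extraction, the sketch below factors a recurring pattern (a chain of delay registers) into one Python function and reuses it with different parameters. The helper `delay_chain`, the signal names, and the clock name `CLK` are assumptions made for this example, and the function emits plain strings rather than using Veriloggen objects.

```python
def delay_chain(signal, width, depth):
    """Return Verilog lines that delay `signal` by `depth` clock cycles,
    together with the name of the last register in the chain."""
    lines = []
    prev = signal
    for stage in range(depth):
        reg = f"{signal}_d{stage + 1}"
        lines.append(f"reg [{width - 1}:0] {reg};")
        lines.append(f"always @(posedge CLK) {reg} <= {prev};")
        prev = reg
    return lines, prev


# The same extracted pattern is reused with different widths and depths.
body = []
outputs = {}
for name, width, depth in [("valid", 1, 1), ("data", 32, 3)]:
    lines, last = delay_chain(name, width, depth)
    body.extend(lines)
    outputs[name] = last

print("\n".join(body))
```

Loops, functions, and data structures of the host language do the repetition, so the pattern is written once.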

# Evaluation: Productivity of Veriloggen



| App | Python LOC | Verilog LOC | #Reg | #LUT | #DSP | Fmax [MHz] | Description |
|---|---|---|---|---|---|---|---|
| LED | 35 | 31 | 18 | 33 | 0 | 352 | Blinking LED (#LEDs=8) |
| Sort-4 | 45 | 713 | 1295 | 587 | 0 | 366 | Sorting network (#data=4, 32-bit int) |
| Sort-8 | (= Sort-4) | 3575 | 6709 | 2712 | 0 | 370 | Sorting network (#data=8, 32-bit int) |
| Sort-16 | (= Sort-4) | 15779 | 29873 | 11576 | 0 | 370 | Sorting network (#data=16, 32-bit int) |
| Sort-32 | (= Sort-4) | 66107 | 125545 | 47736 | 0 | 370 | Sorting network (#data=32, 32-bit int) |
| MM-p (all) | | | 1891 | 2219 | 3 | 190 | Matrix mult pipeline in Python (32-bit int) |
| MM-p (FSM) | 33 | | N/A | N/A | N/A | N/A | Matrix mult pipeline in Python (32-bit int) |
| MM-v (all) | N/A | 532 | 2041 | 2511 | 3 | 146 | Matrix mult pipeline in Verilog (32-bit int) |
| MM-v (FSM) | N/A | 120 | N/A | N/A | N/A | N/A | Matrix mult pipeline in Verilog (32-bit int) |

- Various hardware structures can be synthesized from a single Python source
  - e.g., four sorting circuits were obtained from 45 lines of Python code
  - Delay registers and stall circuits are inserted automatically
  - $\rightarrow$ High productivity for custom computing pipeline development
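To illustrate how one parameterized Python source can describe sorting networks of several sizes, here is a software-only sketch. The slides do not say which network topology was used, so an odd-even transposition network is assumed here; `transposition_network` and `apply_network` are hypothetical helpers, and the code models the compare-exchange structure rather than generating hardware.

```python
from itertools import product


def transposition_network(n):
    """Compare-exchange pairs of an odd-even transposition network for n inputs."""
    return [(i, i + 1)
            for rnd in range(n)
            for i in range(rnd % 2, n - 1, 2)]


def apply_network(pairs, values):
    """Software model of the network: each pair is one compare-exchange unit."""
    v = list(values)
    for lo, hi in pairs:
        if v[lo] > v[hi]:
            v[lo], v[hi] = v[hi], v[lo]
    return v


# One definition, four network sizes.
networks = {n: transposition_network(n) for n in (4, 8, 16, 32)}
for n, pairs in networks.items():
    print(f"Sort-{n}: {len(pairs)} compare-exchange units")

# Zero-one principle: a network that sorts every 0/1 input sorts any input.
sorts_all_bits = all(
    apply_network(networks[4], bits) == sorted(bits)
    for bits in product([0, 1], repeat=4)
)
print(sorts_all_bits)
```

Changing the size parameter is the only edit needed to obtain a different circuit structure, which is the property the evaluation highlights.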

## Conclusion



#### Customizable hardware abstraction is proposed

- Veriloggen: explicit hardware modeling in Python

