



Open Source in Hardware Design – Opportunities, Challenges and Example

Norbert Wehn

23RD EDITION OF THE MPSOC EVENT

JUNE 15TH-20TH, 2025 LES FERMES DE MARIE, MEGEVE, FRANCE

RPTU Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau

## Open-Source in Hardware !?

**Basic Principles of Open-Source** 

- Openness, Transparency, collaborative design
- Reduced cost, faster adoption, fosters innovation

**Open-Source Software** 

- Started about 30 years ago, successful and well established
- Large community, collaborative development, e.g. Linux
- Enterprises: without OSS, software development costs would be 3.5x higher (Harvard Business School, 2024)
- Concepts/experience transferable to chip design?

Open-Source Hardware – in the beginning

- RISC-V
- Technological Sovereignty
- Access: NDA, export regulations



IC Research "Ecosystem"



## **Open-Source Activities**

Many Activities

- BMBF projects DE:Sign, DE:Sign Challenge, acatech study, IHP, ...
- PULP platform, OpenROAD initiative, Tiny Tapeout/Efabless
- EU initiative: Free and Open Source Silicon Foundation (FOSSi), EU roadmap

#### **Different Views/Opinions**

- EDA, beginners in design, skilled designers
- Research/Universities, StartUps/SMEs, large industry
- Technology node: 130 nm ... 7 nm
- Type of circuit design: digital, mixed-signal, RF, sensors, optics, power, ...

#### Challenges

- Maturity and PPA Quality
- Support, Business model
- Dependency on USA















## **Open-Source Overview**

| Design Activity                          | Synopsys                                   | Cadence                             | Siemens EDA             | Open-Source                    |
|------------------------------------------|--------------------------------------------|-------------------------------------|-------------------------|--------------------------------|
| High-Level Synthesis                     | Symphony C                                 | Stratus HLS                         | Catapult                | XLS (Google)                   |
| RTL-Analysis                             | RTL-Architect, SpyGlass, Verdi             | Joules, Litmus                      | Questa-Visualizer       | Cocotb                         |
| RTL-Simulator                            | VCS, VC family, Zebu                       | Xcelium, Verisium, Palladium        | ModelSim, Veloce        | Verilator/GHDL/Icarus          |
| Logic Synthesis                          | Design Compiler,<br>FusionCompiler         | Genus                               | Oasys                   | Yosys                          |
| Formal Equivalence                       | Formality, VCFormal                        | Jasper, Conformal                   | OneSpin                 | Yosys                          |
| Static Timing Analysis                   | PrimeTime                                  | Tempus                              | -                       | OpenSTA                        |
| Place and Route                          | ICC/ICC2, FusionCompiler                   | Encounter, Innovus                  | Aprisa                  | OpenROAD                       |
| Power Analysis, EM/IR                    | PrimePower, Redhawk                        | Voltus                              | PowerPro, mPower        | OpenROAD (OpenSTA)             |
| Physical Verification<br>(DRC/LVS)       | IC Validator                               | PVS, Pegasus                        | Calibre                 | Magic/KLayout                  |
| Parasitic Extraction                     | StarRC                                     | Quantus                             | Calibre xRC/xACT        | Magic/OpenRCX                  |
| ECO-Tools                                | Tweaker, PrimeTimeECO                      | Talus ECO                           | -                       | N/A                            |
| Analog, AMS, RF                          | CustomCompiler, HSPICE                     | Virtuoso, Spectre                   | Questa, Eldo, FastSPICE | Xschem, ngspice, Xyce, OpenVAF |
| Library Characterization<br>& Prediction | SiliconSmart, LibraryCompiler,<br>NanoTime | Liberate                            | Solido, Kronos          | N/A                            |
| Package, Co-Design                       | 3DIC Compiler                              | Silicon-Package-Board Co-Design     | Xpedition               | N/A                            |

Yosys, KLayout, and ngspice are from Europe!

Source: updated table from David Thanh



## Open Source – OpenROAD



ORFlow

## Open-Source Digital Benchmark - Example



#### Open-Source PicoRV32 RISC-V Core

#### Standard-cell library from ARM, 9-track SLVT, **12 nm** GF12LPP, used for both the commercial flow and the OpenROAD flow

|                         | Commercial    | OpenROAD (Δ vs. commercial) |
|-------------------------|---------------|-----------------------------|
| Performance             |               |                             |
| Clock Constraint [ps]   | 2000          | 2000                        |
| WC Delay [ps]           | 1935          | 631 (-67%)                  |
| Power                   |               |                             |
| Switching [ $\mu W$ ]   | 2122          | 3830 (+80.5%)               |
| Internal [ $\mu W$ ]    | 3459          | 4760 (+37.6%)               |
| Leakage [ $\mu W$ ]     | 123           | -66.8 (-154%)               |
| Total [ $\mu W$ ]       | 5704          | 8523 (+49.4%)               |
| Area                    |               |                             |
| Cell [ $\mu m^2$ ]      | 2205          | 5688 (+158%)                |
| Total [ $\mu m^2$ ]     | 3593          | 11664 (+225%)               |
| Run-Time                |               |                             |
| CPU Cores Used          | 8 (Licences!) | 96                          |
| Time [min]              | 130           | 5 (-96%)                    |

|                       | Commercial    | OpenROAD (Δ vs. commercial) |
|-----------------------|---------------|-----------------------------|
| Performance           |               |                             |
| Clock Constraint [ps] | 650           | 650                         |
| WC Delay [ps]         | 650           | 646 (-0.62%)                |
| Power                 |               |                             |
| Switching [ $\mu W$ ] | 2769          | 14200 (+413%)               |
| Internal [ $\mu W$ ]  | 3302          | 9780 (+196%)                |
| Leakage [ $\mu W$ ]   | 141           | -66.8 (-147%)               |
| Total [ $\mu W$ ]     | 6212          | 23913 (+285%)               |
| Area                  |               |                             |
| Cell [ $\mu m^2$ ]    | 2328          | 5471 (+135%)                |
| Total [ $\mu m^2$ ]   | 4113          | 11664 (+184%)               |
| Run-Time              |               |                             |
| CPU Cores Used        | 8 (Licences!) | 96                          |
| Time [min]            | 130           | 5 (-96%)                    |



- OpenROAD results are largely independent of the timing constraint: no trade-off between area, power and timing when the constraint is relaxed
- Leakage power estimates are wrong (negative values are reported)
- But fast run times!
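
The relative differences quoted for the benchmark can be recomputed with a few lines of Python. This is a minimal sketch: the value pairs are copied from the tables above, the helper name `rel_change` is hypothetical, and the commercial flow is assumed to be the baseline.

```python
def rel_change(commercial: float, openroad: float) -> float:
    """Relative change of the OpenROAD value versus the commercial baseline, in percent."""
    return (openroad - commercial) / commercial * 100.0


# (commercial, OpenROAD) value pairs copied from the benchmark tables
rows = {
    "WC delay @ 2000 ps [ps]": (1935, 631),
    "Total power @ 2000 ps [uW]": (5704, 8523),
    "Total power @ 650 ps [uW]": (6212, 23913),
    "Run time [min]": (130, 5),
}

for name, (commercial, openroad) in rows.items():
    print(f"{name:28s} {rel_change(commercial, openroad):+7.1f} %")
```

Using a single baseline for every row keeps delay, power, area and run time directly comparable.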



**Open-Source Tools** 





## Importance of DRAM Memories – AI Challenges

#### Model Size versus AI accelerator Memory Capacity

[Figure axis: Operational intensity, Ops/weight byte (log scale)]

#### **Memory Bandwidth**





## AI Memory Challenges – Energy




Source: A 16Gb 9.5Gb/s/pin LPDDR5X SDRAM with Low-Power Schemes Exploiting Dynamic Voltage-Frequency Scaling and Offset-Calibrated Readout Sense Amplifiers in a Fourth Generation 10nm DRAM Process, https://doi.org/10.1109/ISSCC42614.2022.9731537

Source: https://www.heise.de/news/Apple-M4\_TSMCs-FinFlex-hilft-den-Performance-Rechenkernen-9766547.html









Source: Tesla /FSD 3







## **DRAM** Basics

DRAM Commands

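- ACT: Activates a specific row in a specific bank (sensing into the primary sense amplifiers, PSA)
- RD: Reads from the activated row (prefetch from PSA to the secondary sense amplifiers, SSA, and burst out)
- PRE: Precharges the bank (sets the local wordline LWL = 0 and the local bitlines LBL = VDD/2)
- REFA: Refreshes all banks; DRAM cells are leaky and have to be refreshed periodically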
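
The command rules above can be sketched as a small per-bank state machine. This is a simplified illustration, not code from DRAMSys or any JEDEC document: the class and function names are hypothetical, timing is ignored, and only the legality of command sequences is modelled.

```python
from enum import Enum


class BankState(Enum):
    IDLE = "idle"      # precharged, no row open
    ACTIVE = "active"  # one row latched in the primary sense amplifiers


class Bank:
    def __init__(self):
        self.state = BankState.IDLE
        self.open_row = None

    def act(self, row):
        if self.state is not BankState.IDLE:
            raise RuntimeError("ACT requires a precharged bank")
        self.state, self.open_row = BankState.ACTIVE, row

    def rd(self, row):
        if self.state is not BankState.ACTIVE or self.open_row != row:
            raise RuntimeError("RD requires the target row to be active")

    def pre(self):
        self.state, self.open_row = BankState.IDLE, None

    def refa(self):
        if self.state is not BankState.IDLE:
            raise RuntimeError("REFA requires the bank to be precharged")


def read(bank, row):
    """Issue the commands needed to read `row` and return them as a list."""
    issued = []
    if bank.state is BankState.ACTIVE and bank.open_row != row:
        bank.pre()
        issued.append("PRE")   # close the wrong row first
    if bank.state is BankState.IDLE:
        bank.act(row)
        issued.append("ACT")   # open the target row
    bank.rd(row)
    issued.append("RD")
    return issued
```

Reading the same row twice needs only a second RD, while switching rows in the same bank costs an extra PRE and ACT.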



## **DRAM Behavior**

#### Latency and energy breakdown e.g. DDR4, 4Gbit, 1200MHz, 64 bits burst read/write

Energy in Twitter memcached application, 2 GB DDR3




DRAM latency and energy vary widely with access pattern, address mapping, scheduling, ...



Up to 10x variance in access times; the worst case is a row change in the same bank: 42 clock cycles

Accurate modelling of DRAM timing and DRAM power is essential to assess the overall SoC performance and power/energy consumption
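
A toy model shows how the access pattern alone changes DRAM latency. Everything here is an assumption for illustration: the timing values (`T_RCD`, `T_RP`, `CL`), the bank and row geometry, and the row-bank-column address mapping are not taken from the slides, and latencies are simply summed with no command pipelining or bank parallelism.

```python
T_RCD, T_RP, CL = 16, 16, 16   # illustrative cycle counts, NOT taken from the slides
N_BANKS, COLS_PER_ROW = 4, 8   # toy geometry


def map_address(addr):
    """Row-bank-column mapping: column bits lowest, then bank, then row."""
    col = addr % COLS_PER_ROW
    bank = (addr // COLS_PER_ROW) % N_BANKS
    row = addr // (COLS_PER_ROW * N_BANKS)
    return bank, row, col


def total_latency(addresses):
    """Sum of per-access latencies under an open-page policy."""
    open_rows = {}  # bank -> currently open row
    cycles = 0
    for addr in addresses:
        bank, row, _ = map_address(addr)
        if open_rows.get(bank) == row:
            cycles += CL                   # row hit: RD only
        elif bank not in open_rows:
            cycles += T_RCD + CL           # bank idle: ACT + RD
        else:
            cycles += T_RP + T_RCD + CL    # row conflict: PRE + ACT + RD
        open_rows[bank] = row
    return cycles


sequential = list(range(16))
# Stride of one full row across all banks: always bank 0, always a new row
strided = [i * COLS_PER_ROW * N_BANKS for i in range(16)]

print("sequential:", total_latency(sequential), "cycles")
print("strided:   ", total_latency(strided), "cycles")
```

With these assumed parameters the 16 sequential accesses take 288 cycles and the 16 strided accesses take 752 cycles, because every strided access hits a row conflict in the same bank.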



## Open Source – DRAMSys/DRAMPower



DRAM timing modelling tools

- Big EDA vendors: no commercial interest/small market
- CPU/SoC vendors: internal modelling tools, not public
- Memory vendors: data/excel sheets, don't disclose internal details
- Academic tools
  - DRAMsim (University of Maryland), 2005
  - gem5, 2011
  - DRAMSys + DRAMPower (RPTU, Fraunhofer IESE, JMU Würzburg), 2013
  - Ramulator (CMU/ETH) + VAMPIRE (CMU/ETH), 2015



## Simulation Speed vs. Simulation Accuracy



How can you accelerate simulation without losing accuracy?

Replace individual signals with function calls (concept of TLM)



Simulate only the clock cycles (events) where "something" happens
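
The second idea can be illustrated by comparing a loop that ticks every clock cycle with an event queue that jumps straight to the next scheduled event. This is a minimal sketch in plain Python, not DRAMSys code: the function names and the event list are made up for the example.

```python
import heapq


def cycle_driven(events, horizon):
    """Tick every clock cycle and check whether something happens."""
    pending = dict(events)  # cycle -> label
    log, steps = [], 0
    for cycle in range(horizon + 1):
        steps += 1
        if cycle in pending:
            log.append((cycle, pending[cycle]))
    return log, steps


def event_driven(events):
    """Jump directly from one scheduled event to the next."""
    queue = list(events)
    heapq.heapify(queue)
    log, steps = [], 0
    while queue:
        steps += 1
        log.append(heapq.heappop(queue))
    return log, steps


# Hypothetical schedule: (cycle, what happens)
events = [(0, "ACT"), (16, "RD"), (32, "data"), (1000, "PRE")]

log_cycle, steps_cycle = cycle_driven(events, horizon=1000)
log_event, steps_event = event_driven(events)

assert log_cycle == log_event  # identical timing behaviour
print(f"cycle-driven steps: {steps_cycle}, event-driven steps: {steps_event}")
```

Both variants report the same events at the same cycles, but the event-driven loop runs 4 steps instead of 1001. The saving grows with the idle time between events.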





## Simulation Speed vs. Simulation Accuracy



- gem5's DRAM model: simplified model, not cycle-accurate with respect to the JEDEC protocol
- DRAMsim, Ramulator, DRAMSys: clock cycle accuracy according to JEDEC protocol
- DRAMsim, Ramulator: based on clock cycle events





## History of DRAMSys



- Large DRAM knowledge within the group (DRAM industry background)
- First publication in 2013: "TLM modelling of 3D stacked wide I/O DRAM subsystems: a virtual platform for memory controller design space exploration", M. Jung, C. Weis, N. Wehn, K. Chandrasekar
- Extended to DDR3, DDR4, LPDDR3 and HMC
- PhD students, master's students, HiWis (student assistants) -> Too many cooks spoil the broth...
- Completely rewritten
  - Software architecture: enables easy integration of new standards and features
  - Support for the latest standards, e.g. LPDDR4, GDDR6, HBM2
- Open-sourced in 2020 (BSD 3-Clause): https://github.com/tukl-msd/DRAMSys



## DRAMSys Freemium Business Model



- Established DRAM standards (DDR3/4, LPDDR4/4x, GDDR5/5x/6, HBM2) are open-sourced on GitHub
- Emerging DRAM standards (DDR5, LPDDR5, HBM3...) can be licensed via Fraunhofer IESE
  - Free academic licenses for research
  - Paid industrial licenses including support, consulting and custom modifications





## **DRAMSys Facts and Figures**

- DRAMSys
  - 4 key papers, ~170 citations
  - Contributed to more than 20 further publications of the chair(s)
- Funding from DFG/BMBF/chair...
- More than 20 contributors and 2700 commits in master branch
- 240 stars and 60 forks on GitHub
- Large user community





## History of DRAMPower



- Power model originally developed by TU Delft/TU Eindhoven in 2011
  - Goal: improve the accuracy of Micron's DRAM power calculators (Excel spreadsheets)
- RPTU-EMS
  - Validated and improved the model with circuit-level simulations
  - Took over management and further development
- First publication in 2012: "DRAMPower: Open-source DRAM power & energy estimation tool", K. Chandrasekar, C. Weis, Y. Li, B. Akesson, N. Wehn, K. Goossens
- First open-sourced in 2013
- Introduction of new features, e.g., a bank-sensitive power model





## DRAMPower 5 - 2024

- Major updates in context of BMBF project DI-DERAMSys
- Improved software architecture
- Support for current DRAM standards (DDR5, LPDDR5, ...)
- Accurate interface power modelling
















## **DRAMPower Facts and Figures**



- State-of-the-art tool for DRAM power modelling
- Part of gem5
- 12 contributors and 400 commits on master branch
- DRAMPower papers: ~300 citations
- 150 stars and 50 forks on GitHub
- DRAMPower 5.0 released in 2024: https://github.com/tukl-msd/DRAMPower



## Conclusion

### **Open-Source in Hardware Design: Hype or important new direction?**

- Can ease technology access and contribute to technological sovereignty (for older technology nodes)
- Can complement/fill gaps that are not (yet) supported by commercial vendors, e.g. niche or new technologies
- Can broaden chip design community, education
- Many open challenges
  - PPA quality, maturity
  - Support, business model



## **Thank You**

# CHIP HAPPENS



www.chipdesign-germany.de