

## Virtual prototyping acceleration on manycore architectures

N. Ventroux, CEA LIST, Computing and Design Environment Lab, CEA-Saclay NanoInnov, nicolas.ventroux@cea.fr

### Introduction

- SoCs are becoming more and more complex
  - On-chip complexity increases by a factor of 1.6 every 2 years (ITRS 2013)
  - e.g. the Apple A8 SoC contains 2 billion transistors
  - Development costs and time to market (TTM) must be contained
- Virtual prototyping (VP) can
  - Improve design quality and performance through fast design exploration
    - Performance, power, temperature, reliability
  - Provide a virtual HW platform to SW developers
  - Parallelize design phases and increase design productivity
- Limitation of VP
  - Simulation performance does not scale with complexity (at constant accuracy)

Hence the inevitable need for VP acceleration.

© CEA. All rights reserved. Leti & List, DACLE Division







### Related works

- Distributed SystemC (conservative and optimistic)
  - Uses implicit communications (FIFOs), synchronization or lock mechanisms, or avoids shared variables
  - Works if there are few communications between clusters
  - Requires partitioning the simulated architecture into clusters
  - Requires modifying the simulated architecture to make the parallelism explicit
  - Approaches:
    - Multiple SystemC schedulers
      - Conservative approaches [ZIYU09, CHOP06, CHUN14]
      - With speculation or relaxing methods [MEL10, COMB08, PESS10]
      - Synchronizations on every communication, null messages
    - Distributed simulation on top of SystemC [HUAN08]
- Parallel SystemC kernel (conservative)
  - Uses a global synchronization (at each δ-cycle)
  - Executes the evaluation phase in parallel
  - Load-balancing techniques (work-stealing, work-sharing)
  - Approaches:
    - Not SystemC compliant [EZUD09, SCHU10, KHAL10]
      - Shared-variable access atomicity is left to the user
      - Process order is not guaranteed
    - SystemC compliant
      - Run processes in parallel even though they do not run at the same simulation time [MOY13, CHEN12]
      - Partition the simulation and preserve co-routine semantics within a partition [SCHU13]

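The conservative rule behind the distributed approaches listed above can be sketched in plain C++ (no SystemC dependency). This is a minimal illustration, not the method of any cited work: `LogicalProcess`, `link_clock`, `safe_horizon`, `receive_null` and `run_safe_events` are hypothetical names. It assumes that each input link delivers messages in non-decreasing timestamp order and that a null message carries only a timestamp promise from its sender.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

using Time = std::uint64_t;

// Hypothetical logical process (one cluster of the simulated architecture).
struct LogicalProcess {
    // Timestamp of the last message (real or null) seen on each input link.
    std::vector<Time> link_clock;
    // Timestamps of local events waiting to be executed, earliest first.
    std::priority_queue<Time, std::vector<Time>, std::greater<Time>> pending;

    // No input link can still deliver a message earlier than this bound.
    Time safe_horizon() const {
        assert(!link_clock.empty());
        return *std::min_element(link_clock.begin(), link_clock.end());
    }

    // A null message has no payload: it only promises that the sender will
    // not emit anything earlier than `promise` on this link.
    void receive_null(std::size_t link, Time promise) {
        assert(link < link_clock.size());
        link_clock[link] = std::max(link_clock[link], promise);
    }

    // Execute (here: just collect) every pending event up to the horizon.
    // Tie-breaking between simultaneous events is ignored in this sketch.
    std::vector<Time> run_safe_events() {
        std::vector<Time> done;
        const Time horizon = safe_horizon();
        while (!pending.empty() && pending.top() <= horizon) {
            done.push_back(pending.top());
            pending.pop();
        }
        return done;
    }
};
```

With link clocks of 10 and 20 and pending events at 5, 12 and 30, only the event at 5 is safe to run. After a null message promising 25 on the first link, the horizon rises to 20 and the event at 12 becomes safe as well. Every such promise costs a message, which is consistent with the remark that these schemes work when clusters communicate little.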

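The parallel-kernel scheme described above (parallel evaluation phase, global synchronization at each δ-cycle) can be sketched with standard C++ threads. This is an illustration under stated assumptions rather than the implementation of any cited kernel: `Signal`, `delta_cycle` and `run_ring` are hypothetical names, every process is treated as runnable at every δ-cycle, and threads are re-created per cycle for brevity where a real kernel would presumably keep a worker pool.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical two-phase signal: reads return the current value, writes are
// buffered and only become visible in the update phase.
struct Signal {
    int current = 0;
    int next = 0;
    int read() const { return current; }
    void write(int v) { next = v; }
    void update() { current = next; }
};

using Process = std::function<void()>;

// One delta cycle: parallel evaluation, global synchronization, update.
void delta_cycle(const std::vector<Process>& runnable,
                 std::vector<Signal>& signals, unsigned workers) {
    std::atomic<std::size_t> next_job{0};
    auto worker = [&] {
        // Work-sharing: each worker takes the next process not yet evaluated.
        for (std::size_t i = next_job.fetch_add(1); i < runnable.size();
             i = next_job.fetch_add(1)) {
            runnable[i]();
        }
    };
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) pool.emplace_back(worker);
    for (std::thread& t : pool) t.join();  // global synchronization point

    for (Signal& s : signals) s.update();  // sequential update phase
}

// Demo: a ring of signals where process i copies its left neighbour.
std::vector<int> run_ring(std::size_t n, int cycles, unsigned workers) {
    assert(n > 0 && workers > 0);
    std::vector<Signal> signals(n);
    signals[0].current = signals[0].next = 1;  // a single token in the ring

    std::vector<Process> processes;
    for (std::size_t i = 0; i < n; ++i) {
        // Each process writes only its own signal and reads a neighbour's
        // current value, which no one modifies during evaluation.
        processes.push_back([&signals, i, n] {
            signals[i].write(signals[(i + n - 1) % n].read());
        });
    }
    for (int c = 0; c < cycles; ++c) delta_cycle(processes, signals, workers);

    std::vector<int> values;
    for (const Signal& s : signals) values.push_back(s.read());
    return values;
}
```

A call such as `run_ring(4, 1, 2)` moves the token by one position however the two workers interleave, because the processes communicate only through buffered signal writes. Plain shared variables get no such protection, which matches the remark that access atomicity is left to the user in the non-compliant kernels.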





### Bibliography

| Key | Reference |
|-----|-----------|
| **SystemC acceleration (static scheduling, parallel and distributed)** | |
| [BUCH07] | R. Buchmann and A. Greiner, A Fully Static Scheduling Approach for Fast Cycle Accurate SystemC Simulation of MPSoCs, ICM 2007. |
| [MOU04] | G. Mouchard, D. Gracia Pérez and O. Temam, FastSysC: A Fast Simulation Engine, DATE 2004. |
| [NAG07] | Y.N. Naguib and R.S. Guindi, Speeding up SystemC simulation through process splitting, DATE 2007. |
| [CHEN12] | W. Chen, X. Han and R. Dömer, Out-of-order parallel simulation for ESL design, DATE 2012. |
| [BLAN10] | N. Blanc and D. Kroening, Race analysis for SystemC using model checking, TODAES, vol. 15, no. 3, 2010. |
| [CHOP06] | B. Chopard, P. Combes and J. Zory, A Conservative Approach to SystemC Parallelization, ICCS 2006. |
| [MEL10] | A. Mello, I. Maia, A. Greiner and F. Pecheux, Parallel Simulation of SystemC TLM 2.0 Compliant MPSoC on SMP Workstations, DATE 2010. |
| [ZIYU09] | H. Ziyu et al., A Parallel SystemC Environment: ArchSC, ICPADS 2009. |
| [COMB08] | P. Combes, E. Caron and B. Chopard, Relaxing Synchronization in a Parallel SystemC Kernel, ISPA 2008. |
| [PESS10] | I. Pessoa, A. Mello, A. Greiner and F. Pecheux, Parallel TLM Simulation of MPSoC on SMP Workstation: Influence of Communication Locality, ICM 2010. |
| [EZUD09] | P. Ezudheen et al., Parallelizing SystemC kernel for fast hardware simulation on SMP machines, PADS 2009. |
| [SCHU10] | C. Schumacher, R. Leupers, D. Petras and A. Hoffmann, parSC: Synchronous Parallel SystemC Simulation on Multi-Core Host Architectures, CODES+ISSS 2010. |
| [KHAL10] | R.S. Khaligh and M. Radetzki, A dynamic load balancing method for parallel simulation of accuracy adaptive TLMs, FDL 2010. |
| [HUAN08] | K. Huang, I. Bacivarov, F. Hugelshofer and L. Thiele, Scalably distributed SystemC simulation for embedded applications, ISIES 2008. |
| [MOY13] | M. Moy, Parallel Programming with SystemC for Loosely Timed Models: A Non-Intrusive Approach, DATE 2013. |
| [CHUN14] | M.K. Chung, J.K. Kim and S. Ryu, SimParallel: A High Performance Parallel SystemC Simulator Using Hierarchical Multi-threading, ISCAS 2014. |
| [SCHU13] | C. Schumacher et al., legaSCi: Legacy SystemC Model Integration into Parallel SystemC Simulators, IPDPSW 2013. |
| [WEIN14] | J.H. Weinstock et al., Time-Decoupled Parallel SystemC Simulation, DATE 2014. |
| [PEET11] | J. Peeters, N. Ventroux and T. Sassolas, A SystemC TLM Framework for Distributed Simulation of Complex Systems with Unpredictable Communication, DASIP 2011. |
| **HW SystemC acceleration** | |
| [SINH12] | R. Sinha, A. Prakash and H.D. Patel, Parallel Simulation of Mixed-abstraction SystemC Models on GPUs and Multicore CPUs, ASP-DAC 2012. |
| [KUNZ12] | G. Kunz, D. Schemmel, J. Gross and K. Wehrle, Multi-level Parallelism for Time- and Cost-efficient Parallel Discrete Event Simulation on GPUs, PADS 2012. |
| [VINC12] | S. Vinco, D. Chatterjee, V. Bertacco and F. Fummi, SAGA: SystemC Acceleration on GPU Architectures, DAC 2012. |
| [NAN10] | M. Nanjundappa, H.D. Patel, B.A. Jose and S.K. Shukla, SCGPSim: A fast SystemC simulator on GPUs, ASP-DAC 2010. |
| [VENT14] | N. Ventroux, J. Peeters, T. Sassolas and J.C. Hoe, Highly-Parallel Special-Purpose Multicore Architecture for SystemC/TLM Simulations, SAMOS 2014. |
| **Virtual prototyping solutions** | |
| [VENT13] | N. Ventroux, T. Sassolas, A. Guerre and C. Andriamisaina, SESAM: A Virtual Prototyping Solution to Design Multicore Architectures, chapter in Multicore Technology: Architecture, Reconfiguration, and Modeling, edited by M.Y. Qadri and S.J. Sangwine, CRC Press, pp. 61-104, July 2013. |
| [HELM08] | C. Helmstetter and V. Joloboff, SimSoC: A SystemC TLM integrated ISS for full system simulation, APCCAS, December 2008. |

