



# Will HPC be a next decade disruptor, or will it be disrupted?

Eric Monchalin, Chair of the European Processor Initiative, Vice President at Eviden, Head of Machine Intelligence. 09/07/2024



# The uncertain certainties

- 1945: Thomas J. Watson (CEO of IBM): « World market for maybe five computers »
- 1977: Ken Olsen (CEO of DEC): « No reason for anyone to have a computer at home »
- 1980: IBM study: *« Only about 50 Cray-1 class computers will be sold per year »*





The views expressed on the following slides are those of the presenter.

They do not necessarily represent any view of Eviden, EPI or EuroHPC JU organizations, affiliates or employees.







**1** What's numerical simulation?

**2** The ground truth

**3** What's at stake?

**4** The threats

**5** The weaknesses



**6** Cloud convergence

**7** Making HPC future a reality

**8** The European initiatives

**9** Conclusion





# **1** What's numerical simulation?





# **Continuum to discretization**

Derivative at  $t_i$ 



The smaller the time steps, the more accurate the simulation and the heavier the computation





# Mathematical Model to Numerical Solution

$$F_{d} = mg$$

$$F_{u} \cong -cv$$

$$F_{d} + F_{u} = m\gamma \Rightarrow mg - cv = m\frac{dv}{dt}$$

$$\frac{dv}{dt} = g - \frac{c}{m}v$$

Analytical: $v(t) = \frac{mg}{c}\left(1 - e^{-\frac{c}{m}t}\right)$

Numerical:

$$\frac{dv}{dt} \cong \frac{\Delta v}{\Delta t} = \frac{v(t_{i+1}) - v(t_{i})}{t_{i+1} - t_{i}} = g - \frac{c}{m}v(t_{i})$$

$$v(t_{i+1}) = v(t_{i}) + \left[g - \frac{c}{m}v(t_{i})\right](t_{i+1} - t_{i})$$
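The update rule above can be tried out in a few lines. The sketch below is only illustrative: the mass, drag coefficient, end time and step sizes are assumed values, not taken from the slides. It compares the explicit time-stepping result with the analytical solution for several step sizes.

```python
import math

def euler_velocity(m, c, g, dt, t_end):
    """Integrate dv/dt = g - (c/m) v with explicit steps, starting from v(0) = 0."""
    steps = round(t_end / dt)
    v = 0.0
    for _ in range(steps):
        # v(t_{i+1}) = v(t_i) + [g - (c/m) v(t_i)] (t_{i+1} - t_i)
        v = v + (g - (c / m) * v) * dt
    return v

def analytical_velocity(m, c, g, t):
    """Closed-form solution v(t) = (m g / c) (1 - exp(-(c/m) t))."""
    return (m * g / c) * (1.0 - math.exp(-(c / m) * t))

# Assumed, illustrative parameters (not from the slides).
M, C, G, T_END = 70.0, 12.5, 9.81, 10.0

exact = analytical_velocity(M, C, G, T_END)
errors = {}
for dt in (2.0, 1.0, 0.1, 0.01):
    errors[dt] = abs(euler_velocity(M, C, G, dt, T_END) - exact)
    print(f"dt={dt:5.2f}  steps={round(T_END / dt):5d}  abs error={errors[dt]:.5f}")
```

With these values the error shrinks as the step gets smaller, while the number of steps, and therefore the computation, grows in proportion.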




# Decompose the compute domain to parallelize the computation





Parallel computation on mesh elements, and surface data exchanges with neighbors



# Refine a 3D mesh by a factor of 10 on each axis

- ×1,000 mesh elements
- 15 years of technology enhancements for the same solving budget
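The arithmetic behind these two figures can be sketched as follows. The 18-month doubling period for performance per budget is an assumption made for illustration; the slide does not state which growth rate it uses.

```python
import math

def refinement_cost(factor_per_axis, dims=3):
    """Growth in mesh elements when every axis is refined by the same factor."""
    return factor_per_axis ** dims

def years_to_absorb(cost_factor, doubling_period_years):
    """Years of steady doubling needed to absorb a given cost factor."""
    return math.log2(cost_factor) * doubling_period_years

elements = refinement_cost(10)            # 10 x 10 x 10
years = years_to_absorb(elements, 1.5)    # assumed: capability doubles every 18 months
print(elements, round(years, 1))
```

Under that assumption, a thousandfold increase in elements corresponds to about ten doublings, which is where a figure close to 15 years comes from.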





# Almost a regular execution scheme

# 1<sup>st</sup> Mesh construction



Few CPUs, large memory



# 3<sup>rd</sup> Result saving





Communication throughput, sequential FS writes



# The harsh reality of Amdahl's law

Any sequential code or computing / communication delay matters
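As a reminder of what the law says, here is a minimal sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the parallel fraction of the run time and n the number of processors. The 99% fraction and the processor counts are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup when only a fraction of the run time can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

for n in (8, 64, 1024, 1_000_000):
    print(n, round(amdahl_speedup(0.99, n), 1))
```

Even with 99% of the work parallelized, the speedup stays below 100 however many processors are added, which is why any sequential section or delay matters.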









# **2** The ground truth





# Software is eating the world (Marc Andreessen)

Yet hardware is still shaping it




#### Consumption



#### Provision



# Maybe still a few isolated Jurassic worlds

A pure construct of my mind or a reality for serious reasons?







# The divergences in microelectronics performance




MPSoC'24 EVIDEN


# Supremacy of data generation over computation

- Large Hadron Collider: tens of petabytes per year
- Advanced Light Source: 7 terabytes per hour
- Square Kilometer Array: 400 exabytes per year
- Genomics: exabytes per
- Climate modeling: 400 petabytes per year
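The figures above use different units and time bases. A small sketch that puts the three fully specified ones on a common petabytes-per-year scale, assuming decimal units (1 PB = 1,000 TB, 1 EB = 1,000 PB) and continuous operation over a 365-day year:

```python
HOURS_PER_YEAR = 24 * 365  # assumption: instrument runs all year round

def tb_per_hour_to_pb_per_year(tb_per_hour):
    """Convert a terabytes-per-hour rate into petabytes per year."""
    return tb_per_hour * HOURS_PER_YEAR / 1000.0

volumes_pb_per_year = {
    "Advanced Light Source": tb_per_hour_to_pb_per_year(7),
    "Climate modeling": 400.0,
    "Square Kilometer Array": 400.0 * 1000.0,
}

for name, volume in sorted(volumes_pb_per_year.items(), key=lambda item: item[1]):
    print(f"{name:24s} {volume:12,.1f} PB/year")
```

The continuous-operation assumption gives an upper bound for the Advanced Light Source. Real duty cycles would lower it.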



Data volumes are growing far beyond





The needs of leading-edge HPC applications are far broader than ever



# A new era of cooperation in the compute and data continuum





# **3** What's at stake?





# Tackle research, economic, industry, and societal challenges

- Anticipate climate change
- Care for an aging population
- Study earthquakes
- Innovate without limit
- Control epidemics
- Secure energy resources






# Who knows what HPC and numerical simulation are?







# Keeping HPC so confidential is a risk











# **4** The threats





Gen Z motto: You Only Live Once





## Make scientific education and jobs attractive again







# Foreseen mismatch between electricity supply and demand

[Figure: electricity outlook, 2020 versus 2030. Labels: impact of energy on global CO<sub>2</sub> emissions; increase in demand for electricity; energy that is green electricity; electricity weight of Information & Communication Technology. Values shown: 91%, 20%, +50%.]



## Energy, a sensitive subject

# The paper that forced Timnit Gebru out of Google in 2020

Common carbon footprint benchmarks

| Benchmark                                                       | lbs of CO2 equivalent |
|-----------------------------------------------------------------|-----------------------|
| Roundtrip flight b/w NY and SF (1 passenger)                    | 1,984                 |
| Human life (avg. 1 year)                                        | 11,023                |
| American life (avg. 1 year)                                     | 36,156                |
| US car including fuel (avg. 1 lifetime)                         | 126,000               |
| Transformer (213M parameters) w/ neural architecture search     | 626,155               |

Chart: MIT Technology Review • Source: Strubell et al. • Created with Datawrapper

Strubell's study found that one language model with a particular type of "neural architecture search" (NAS) method would have produced the equivalent of 626,155 pounds (284 metric tons) of carbon dioxide, about the lifetime output of five average American cars. A version of Google's language model, BERT, which underpins the company's search engine, produced 1,438 pounds of CO2 equivalent in Strubell's estimate, nearly the same as a roundtrip flight between New York City and San Francisco.


The estimated costs of training a model

| Model                                                       | Date of original paper | Energy consumption (kWh) | Carbon footprint (lbs of CO2e) | Cloud compute cost (USD) |
|-------------------------------------------------------------|------------------------|--------------------------|--------------------------------|--------------------------|
| Transformer (65M parameters)                                | Jun, 2017              | 27                       | 26                             | \$41-\$140               |
| Transformer (213M parameters)                               | Jun, 2017              | 201                      | 192                            | \$289-\$981              |
| ELMo                                                        | Feb, 2018              | 275                      | 262                            | \$433-\$1,472            |
| BERT (110M parameters)                                      | Oct, 2018              | 1,507                    | 1,438                          | \$3,751-\$12,571         |
| Transformer (213M parameters) w/ neural architecture search | Jan, 2019              | 656,347                  | 626,155                        | \$942,973-\$3,201,722    |
| GPT-2                                                       | Feb, 2019              |                          |                                | \$12,902-                |

Note: Because of a lack of power draw data on GPT-2's training hardware, the researchers weren't able to calculate its

Table: MIT Technology Review • Source: Strubell et al. • Created with Datawrapper
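A quick consistency check on the figures quoted above, as a sketch. It derives the emission factor implied by each (kWh, lbs of CO2e) pair and the "five cars" comparison. All numbers are the ones quoted on this slide; nothing here reproduces the study's own methodology.

```python
# (energy in kWh, carbon footprint in lbs of CO2e), as quoted on the slide
training_runs = {
    "Transformer (65M)": (27, 26),
    "Transformer (213M)": (201, 192),
    "ELMo": (275, 262),
    "BERT (110M)": (1_507, 1_438),
    "Transformer (213M) w/ NAS": (656_347, 626_155),
}
US_CAR_LIFETIME_LBS = 126_000

implied_factor = {name: lbs / kwh for name, (kwh, lbs) in training_runs.items()}
for name, factor in implied_factor.items():
    print(f"{name:28s} {factor:.3f} lbs CO2e per kWh")

cars = training_runs["Transformer (213M) w/ NAS"][1] / US_CAR_LIFETIME_LBS
print(f"NAS run is about {cars:.2f} car lifetimes")
```

The implied factor is nearly constant across rows, which suggests the carbon column is derived from the energy column with a single conversion factor.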





Example: Amazon data center (960 MW) directly connected to Susquehanna Steam Electric Station (2.5 GW)







# **5** The weaknesses





The most secret number of every HPC datacenter

# Percentage of data generated, archived and never exploited

- Compute time is often free of charge for users
- More application runs than time and tooling to explore the results
- Applications not behaving well and not stopped early enough
- Lack of garbage collectors for old data





# From Flops to Bytes per Flop

| Solver                 | Matrix        | Complexity         |
|------------------------|---------------|--------------------|
| Direct solver          | Dense matrix  | O(N<sup>3</sup>)   |
| Direct solver          | Banded matrix | O(N<sup>2.7</sup>) |
| Iterative solver (CG)  | Sparse matrix | O(N<sup>1.5</sup>) |
| Iterative solver (AMG) | Sparse matrix | O(N.log(N))        |
| Spectral solver (FFT)  | Regular grid  | O(N.log(N))        |

Linpack benchmark: dense blocks, n<sup>2</sup> memory accesses for n<sup>3</sup> computations. **Core intensive**

**HPCG** benchmark: sparse data, n memory accesses for n computations. **Memory intensive**
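The contrast between the two regimes can be made concrete with a rough count of floating-point operations per memory access. This is a simplified sketch: it assumes each matrix or vector entry is moved exactly once, counts a column index as one access, and uses 7 nonzeros per row (a 3D 7-point stencil) as an illustrative sparse case.

```python
def dense_matmul_intensity(n):
    """Flops per memory access for an n x n dense matrix product."""
    flops = 2 * n**3          # one multiply and one add per inner-product term
    accesses = 3 * n**2       # read A and B, write C (ideal reuse)
    return flops / accesses

def sparse_matvec_intensity(n, nnz_per_row=7):
    """Flops per memory access for a sparse matrix-vector product."""
    nnz = n * nnz_per_row
    flops = 2 * nnz
    accesses = 2 * nnz + 2 * n    # values and column indices, plus input and output vectors
    return flops / accesses

for n in (100, 1_000, 10_000):
    print(n, round(dense_matmul_intensity(n), 1), round(sparse_matvec_intensity(n), 3))
```

In this model the dense kernel performs more work per access as n grows, while the sparse kernel stays below one flop per access whatever the problem size, so it is bound by memory throughput rather than by the cores.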





# Lack of memory throughput

| Top 5 (June 2023) | Peak (Tflop/s) | Linpack efficiency | HPCG efficiency |
|-------------------|----------------|--------------------|-----------------|
| Frontier          | 1,679,819      | 71%                | 0.8%            |
| Fugaku            | 537,212        | 82%                | 3.0%            |
| LUMI              | 428,704        | 72%                | 0.8%            |
| Leonardo          | 304,466        | 78%                | 1.0%            |
| Summit            | 200,795        | 74%                | 1.5%            |

|                 | FP64 (Tflop/s) | Bandwidth (TW*/s) | BW/FP64 |
|-----------------|----------------|-------------------|---------|
| NVIDIA A100     | 9.70           | 0.485             | 5%      |
| Milan 64C 2 GHz | 2.05           | 0.051             | 2.5%    |
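One way to read the BW/FP64 column is as a roofline-style bound: attainable performance is the smaller of peak compute and memory bandwidth times the number of flops done per word moved. The sketch below uses the two rows of the table. The intensity values are illustrative assumptions.

```python
def attainable_tflops(peak_tflops, bandwidth_twords, flops_per_word):
    """Roofline-style bound: limited by either compute or memory throughput."""
    return min(peak_tflops, bandwidth_twords * flops_per_word)

devices = {
    "NVIDIA A100": (9.70, 0.485),
    "Milan 64C 2 GHz": (2.05, 0.051),
}

for name, (peak, bandwidth) in devices.items():
    bound = attainable_tflops(peak, bandwidth, 1.0)   # one flop per word moved
    ridge = peak / bandwidth                          # flops per word needed to reach peak
    print(f"{name:16s} {100 * bound / peak:4.1f}% of peak at 1 flop/word, "
          f"peak reached above {ridge:.0f} flops/word")
```

A kernel doing about one flop per word moved is capped at a few percent of peak, which is the same order of magnitude as the HPCG efficiencies listed above.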





# So energy expensive memory throughput

| Operation       |         | Picojoules per operation, 45 nm | Picojoules per operation, 7 nm | Ratio |
|-----------------|---------|---------------------------------|--------------------------------|-------|
| Add. (IEEE)     | FP 16   | 0.4                             | 0.16                           | 2.5   |
|                 | FP 32   | 0.9                             | 0.38                           | 2.4   |
| Mult. (IEEE)    | FP 16   | 1.1                             | 0.34                           | 3.2   |
|                 | FP 32   | 3.7                             | 1.31                           | 2.8   |
| SRAM 64b access | 32 KB   | 20                              | 8.5                            | 2.4   |
|                 | 1 MB    | 100                             | 14                             | 7.1   |
| DRAM 64b access |         | Circa 45 nm                     | Circa 7 nm                     |       |
|                 | DDR 3/4 | 1,300                           | 1,300                          | 1.0   |
|                 | HBM2    | 250-450                         |                                |       |

7nm - Picojoules per operation (log<sub>10</sub> scale)



Source: Ten Lessons From Three Generations Shaped Google's TPUv4i
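To put the table in perspective, the sketch below compares the 7 nm energy of one FP32 multiply with the energy of fetching a 64-bit operand from each memory level. The figures are copied from the table. The comparison itself is only an illustration.

```python
# Picojoules per operation at 7 nm, as listed in the table
FP32_MULT_PJ = 1.31
ACCESS_PJ = {
    "SRAM 32 KB": 8.5,
    "SRAM 1 MB": 14.0,
    "DRAM DDR 3/4": 1300.0,
}

ratios = {level: energy / FP32_MULT_PJ for level, energy in ACCESS_PJ.items()}
for level, ratio in ratios.items():
    print(f"{level:13s} costs about {ratio:6.1f} FP32 multiplies per 64b access")
```

On these numbers a single DDR access costs roughly a thousand multiplies, and the DRAM figure did not improve between the two process nodes.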





# The uncertain future of monolithic chips



# The irresistible rise in energy consumption



Source: medium.com (AI and Memory Wall)



## Gate-expensive FP64 computation is not so useful for AI

NVIDIA GPU performance

Will Linpack Eflops be more expensive than ever in the coming years?



# Flops are isolated; I/Os are not, as they are polluted by workflow concurrency



# **6** Cloud convergence





#### Four flavors of Cloud Computing

| Usage   | Service model                      | Scope                     |
|---------|------------------------------------|---------------------------|
| Process | Business Process Outsourcing (BPO) | Process                   |
| Consume | Software as a Service (SaaS)       | Application, data         |
| Build   | Platform as a Service (PaaS)       | Middleware, OS            |
| Host    | Infrastructure as a Service (IaaS) | Compute, storage, network |




#### One Cloud doesn't fit all






#### **Cloud computing tradeoffs**

#### Benefits

- Pay as you use (CAPEX to OPEX)
- Easily scale resources up or down
- Faster time to market
- Access resources and applications from anywhere
- Advanced security
- Data loss prevention
- Seamless collaboration & data sharing

#### Limitations

- Risk of vendor lock-in
- Unforeseen costs, might be more expensive
- Relies on an internet connection
- Less control over underlying cloud infrastructure
- Concerns about security risks like data privacy and online threats
- Integration complexity with existing systems



HPC has started to jump into the cloud arena





# ECLAIRION








#### Some signals to move one step further







#### Fiscal year 2022, R&D investments (\$Billions)

Sources: statista.com, HPE Fiscal Year 2022 Form 10-K



#### Ultra domination of top Cloud providers











#### Parallel file system as a service, a cloud opportunity



#### Application workflow pressure on file system






#### Dataflow nodes in action: write elapsed time divided by 2





# **7** Making HPC future a reality





#### The basic ingredients







#### I had a dream





#### End to end co-design

- Performance simulation
- HW & SW architecture abstraction
- Whole ecosystem alliances
- New algorithms & programming models
- Societal & business challenges
- Proxy apps

Co-design to:

- Accelerate time-to-market
- Mitigate technical risk
- Optimize design efficiency
- Leverage rapidly evolving technologies



#### From free lunch to responsible computing and data generation





Data life cycle management







#### **Relieve FP64 sensitivity**

# So useful generated data?

### Which computations can move from FP64 to FP32, FP16, ...?



# FP code profiler
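A toy illustration of the kind of question such a profiler has to answer. The sketch repeats the same accumulation while rounding every intermediate result to FP32 or FP16 and compares each with the FP64 result. The harmonic sum and the rounding helpers are assumptions chosen for brevity, not a description of any actual profiling tool.

```python
import struct

def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def to_fp16(x):
    """Round a Python float (FP64) to the nearest FP16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def harmonic_sum(n, rounding=lambda x: x):
    """Sum 1/k for k = 1..n, rounding every intermediate result."""
    total = 0.0
    for k in range(1, n + 1):
        total = rounding(total + rounding(1.0 / k))
    return total

N = 10_000
reference = harmonic_sum(N)                      # FP64
err32 = abs(harmonic_sum(N, to_fp32) - reference)
err16 = abs(harmonic_sum(N, to_fp16) - reference)
print(f"FP64 result {reference:.6f}, FP32 error {err32:.2e}, FP16 error {err16:.2e}")
```

Here FP32 stays close to the reference, while in FP16 the small terms are eventually lost against the running total. Whether a given computation tolerates the lower precision depends on the algorithm, which is what needs to be measured case by case.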





#### Work around the Tower of Babel of languages





#### Gain the freedom to choose your accelerator



#### 2024

- Still challenging to program GPUs
- Numerous codes not GPU compatible yet
- Strong dependency on the NVIDIA CUDA API

# "Standardized" programming API for accelerators



#### Unleash memory throughput for efficient computing

|             | Package BW (GB/s) | Power efficiency (pJ/b) |
|-------------|-------------------|-------------------------|
| HBM2        | 410               | 5                       |
| DARPA PIPES | 100,000 (244x)    | 1 (5x)                  |
| AYAR LABS   | 32,000 (78x)      | 1 (5x)                  |
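The two columns can be combined into the power needed to sustain the full package bandwidth, which is bandwidth in bits per second times energy per bit. A short sketch using the table's figures, assuming 1 GB = 10<sup>9</sup> bytes:

```python
def link_power_watts(bandwidth_gb_per_s, picojoules_per_bit):
    """Power drawn when the link runs at full bandwidth."""
    bits_per_second = bandwidth_gb_per_s * 1e9 * 8
    return bits_per_second * picojoules_per_bit * 1e-12

links = {
    "HBM2": (410, 5),
    "DARPA PIPES": (100_000, 1),
    "AYAR LABS": (32_000, 1),
}

hbm2_bandwidth = links["HBM2"][0]
for name, (bandwidth, energy) in links.items():
    print(f"{name:12s} {bandwidth / hbm2_bandwidth:6.1f}x bandwidth, "
          f"{link_power_watts(bandwidth, energy):6.1f} W at full rate")
```

The energy per bit drops fivefold, but at full rate the total power still grows with the bandwidth delivered.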




#### Make tailor-made HPC HW affordable

Common chiplet platform architecture

- Mechanical, electrical, protocols
- Published interface specifications
- Test suite for validation/certification
- Likely foundry process agnostic
- Market place

[Figure labels: fiber array, photonic interposer, package substrate]







Superconducting drastically downsizes power consumption

Goal of Imec SCD (Super Conducting Digital) Project: 4 PFLOPs/cm<sup>2</sup> @ 120mW (cold) / 38W (wall plug)

|               | CMOS 7nm                      | Imec SCD 28nm |
|---------------|-------------------------------|---------------|
| Speed         | 2 GHz                         | 15x           |
| Memory        | 500 MB/cm<sup>2</sup> (SRAM)  | 0.01x         |
| Interconnects | 1.6 GB/line @ 1 pJ/b          | 1000x         |
| Power         | 1 TOPs/W                      | 100x          |



Data center in a shoe box:

- 100 blades
- 150x320x100mm



Sources: Imec lecture, SC23 superconducting-digital-technology-revolutionize-ai-and-machine-learning-roadmap (Imec)




#### Classical/Quantum computing convergence



Datacenter on a wafer

Affordable cryocooler










# **8** The European initiatives



#### Achieve autonomy in strategic processing technologies



#### Two pillars to make European ambition a reality





Empowering Europe's Semiconductor Future, uniting innovation and driving Progress



Leading the Way in European Supercomputing, developing a World Class Supercomputing Ecosystem



#### EU back in the race with EuroHPC JU

**Deploy**: Develop, deploy, extend & maintain a world-leading supercomputing, quantum computing, service & data infrastructure ecosystem in Europe

**Innovate**: Support the development of innovative supercomputing components, technologies, knowledge & applications to underpin a competitive European supply chain

**Value**: Widen the use of HPC & quantum infrastructures to a large number of public & private users wherever they are located in Europe, and support the development of key HPC skills for European science and industry



#### Fuel European ambition (2021-2027)



\*Member states to match this with national contributions





#### Supercomputer deployment




| Nov 2023      | TOP500 | Green500 |
|---------------|--------|----------|
| LUMI          | #5     | #7       |
| LEONARDO      | #6     | #18      |
| MARENOSTRUM 5 | #8     | #6       |
| MELUXINA      | #71    | #27      |
| KAROLINA      | #113   | #25      |
| DISCOVERER    | #166   | #216     |
| VEGA          | #198   | #253     |

| Underway    | Year | Performance |
|-------------|------|-------------|
| JUPITER     | 2024 | Exascale    |
| DAEDALUS    | 2024 | Mid-range   |
| JULES VERNE | 2025 | Exascale    |



#### Quantum deployment

- Agreements with six hosting entities
- 2 quantum simulators (100+ qubits) in:
  - Joliot-Curie (GENCI / France)
  - JUWELS (FZJ / Germany)
- Two procurements in progress:
  - EuroQCS-Poland (PSNC / Poland)
  - Euro-Q-Exa (LRZ / Germany)
- Call in progress for 2 quantum Excellence Centers




#### Federate EuroHPC systems (2023+)

- Authentication, Authorization and Identification services (AAI)
- Computing services
  - Interactive computing
  - Cloud access: virtual machines, containers
- Data services
  - Archival services and data repositories
  - Data mover / transport services
- User and resource management







#### Infrastructure roadmap



#### **Areas of Strategic Research & Innovation**





#### (Partial) system technology roadmap



#### CPU and accelerator, a 400M€ Roadmap

- European silicon proven IP
- Flexible chiplets
- Ready for state of the art AI
- Deployment-ready through pilots

First IPs (2024-2026):

- Build on EPI efforts on ARM-based processor
- From test chips to TRL 9
- EuroHPC exascale systems as first customer

New architectures (2026-2028):

- Stand-alone competitive processors & accelerators
- Building on EU R&D in low power, security, ...
- EuroHPC post-exascale system as first customer

Post-exascale (2028-):

- ARM/RISC-V systems based on EU R&D and IPs



#### Special attention to education and training





PRACE

- One-stop shop for HPC training offers of 33 National Competence Centers
- Pan-European Master (2 years) for HPC
- 14 Training Centres with ~100 training events each year

#### Coming: virtual HPC academy








# **9** Conclusion





#### "They did not know it was impossible... So they did it" (Mark Twain)








#### We are still moving data to compute!









#### We tend to shy away from simple and obvious solutions



















eric.monchalin@eviden.com

Confidential information owned by the EPI consortium, to be used by the recipient only. This document, or any part of it, may not be reproduced, copied, circulated and/or distributed nor quoted without prior written approval from the EPI consortium.

