

#### **Abstract**

The most advanced computing concepts developed for mainframes and PCs over the past six decades, mind-blowing multimedia and graphics capabilities, and incredibly high-speed wireless links are all coming to the small-form-factor smartphone, which runs on scarce energy resources.

The keynote speech will discuss the technical challenges and paradigm shift the industry has to face to drive this fantastic revolution.



#### The Irresistible Rise of Smartphones

Last November, Morgan Stanley Research predicted:





MPSoC '11 Beaune, July 6, 2011

# Smartphones already passed PCs ...

Top Five Smartphone Vendors, Shipments, Market Share, and Year-Over-Year Growth, First Quarter 2011 (Preliminary Results) (shipments in millions of units)

| Vendor             | 1Q11<br>Shipments | 1Q11 Market<br>Share | 1Q10<br>Shipments | 1Q10 Market<br>Share | 1Q11/1Q10<br>Change |
|--------------------|-------------------|----------------------|-------------------|----------------------|---------------------|
| Nokia              | 24.2              | 24.3%                | 21.5              | 38.8%                | 12.6%               |
| Apple              | 18.7              | 18.7%                | 8.7               | 15.7%                | 114.4%              |
| Research In Motion | 13.9              | 14.0%                | 10.6              | 19.1%                | 31.1%               |
| Samsung            | 10.8              | 10.8%                | 2.4               | 4.3%                 | 350.0%              |
| HTC                | 8.9               | 8.9%                 | 2.7               | 4.9%                 | 229.6%              |
| Others             | 23.2              | 23.2%                | 9.5               | 17.1%                | 143.7%              |
| Total              | 99.6              | 100.0%               | 55.4              | 100.0%               | 79.7%               |

Source: IDC Worldwide Quarterly Mobile Phone Tracker, May 5, 2011

Top 5 Vendors, Worldwide PC Shipments, First Quarter 2011 (Preliminary) (Units Shipments are in thousands)

| Rank | Vendor      | 1Q11 Shipments | 1Q11 Market Share | 1Q10 Shipments | 1Q10 Market Share | 1Q11/1Q10 Growth |
|------|-------------|----------------|-------------------|----------------|-------------------|------------------|
| 1    | HP          | 15,191    | 18.9%  | 15,624    | 18.8%  | -2.8%     |
| 2    | Dell        | 10,284    | 12.8%  | 10,469    | 12.6%  | -1.8%     |
| 3    | Acer Group  | 9,039     | 11.2%  | 10,733    | 12.9%  | -15.8%    |
| 4    | Lenovo      | 8,172     | 10.1%  | 7,028     | 8.4%   | 16.3%     |
| 5    | Toshiba     | 4,809     | 6.0%   | 4,634     | 5.6%   | 3.8%      |
|      | Others      | 33,062    | 41.0%  | 34,712    | 41.7%  | -4.8%     |
|      | All Vendors | 80,557    | 100.0% | 83,200    | 100.0% | -3.2%     |

Source: IDC Worldwide Quarterly PC Tracker, April 13, 2011
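The two IDC tables use different units (millions versus thousands), so the comparison behind the slide title is easiest to see once both totals are normalized. A minimal sketch, using only the two totals quoted above; the variable names are illustrative:

```python
# Totals taken from the two IDC tables above.
smartphones_1q11_m = 99.6          # smartphone shipments, millions of units
pcs_1q11_k = 80_557                # PC shipments, thousands of units

pcs_1q11_m = pcs_1q11_k / 1_000    # convert thousands -> millions
lead_m = smartphones_1q11_m - pcs_1q11_m
lead_pct = 100 * lead_m / pcs_1q11_m

print(f"Smartphones: {smartphones_1q11_m:.1f}M, PCs: {pcs_1q11_m:.1f}M")
print(f"Smartphone lead: {lead_m:.1f}M units ({lead_pct:.0f}% more than PCs)")
```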



#### From Feature Phones to Smartphones



| TAM Value (\$B)   | 2011 | 2016 | 5-Year CAGR |
|-------------------|------|------|-------------|
| Handset           | 24   | 29   | 4%      |
| Smartphone        | 12   | 23   | 14%     |
| Connected Devices | 3    | 7    | 22%     |

#### Smartphone unit shipment

| 2010 | 2015 | CAGR |
|------|------|------|
| 291M | 718M | 20%  |

Source: Strategy Analytics, April 2011

Source: ST-Ericsson Q211

Note 1: Core Wireless Handset TAM (Total Available Market) is cellular semiconductor content excluding imaging sensors, memory, optoelectronics, discretes, analog & standard logic

Note 2: Connected Devices include wireless game consoles, USB dongles, tablets, embedded modules in laptops/netbooks, M2M

Smartphones represent more than 60% of the Core Wireless Semiconductor TAM in 2016
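The CAGR column follows the standard compound-growth formula, CAGR = (end / start)^(1 / years) - 1. A short sketch that recomputes two of the quoted rates from their rounded endpoints; the function name `cagr` is illustrative, and recomputed values can differ slightly from the slide because the inputs are rounded:

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.20 == 20%)."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1

# Smartphone unit shipments quoted above: 291M (2010) -> 718M (2015).
units_cagr = cagr(291, 718, 5)
print(f"Unit shipment CAGR: {units_cagr:.1%}")

# Smartphone TAM quoted above: $12B (2011) -> $23B (2016).
tam_cagr = cagr(12, 23, 5)
print(f"Smartphone TAM CAGR: {tam_cagr:.1%}")
```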




#### From Components to Platforms

#### **Component Approach**

OEM designs and/or chooses and assembles various SC components to build mobile devices

#### **Platform Approach**

OEM uses pre-integrated chipset platform solutions to build mobile devices



Source: ST-Ericsson internal estimates based on market and analyst reports

Very few SC companies can sustain platform R&D
... and fewer still know how to optimize wireless platforms



## Convergence






#### Convergence is about ...

- Convergence is not just about the capability to run PC applications and take pictures on a mobile phone ...
- It is actually a whole new way of interacting with the device and with the environment, thanks to mobility and the convergence of high-bandwidth communication, positioning, sensors, and new interfaces
  - "This is the first computer that is context aware, situation aware" Jen-Hsun Huang, NVIDIA CEO, GTC 2010 Keynote









Image: Raygun Studio



#### Convergence Dream ...... and Its Limit









Image: Raygun Studio

- The combination of sensor-rich mobile devices always connected to the Cloud is going to revolutionize our lives!
  - Efficiently connected to the infinite data storage & processing power of the cloud
  - Mobile and user-friendly client
- No limits to imagination...
- Well, except for one: POWER...
  - Energy is scarce in a smartphone
  - Low power is THE technology driver
  - CMOS options, HW, SW: all are important

Figure: typical Li-Ion battery energy capacity (device labels: Tablet, Smartphone; values: 50-100 Wh, 4 Wh)


#### Efficiently Connected ...





- 100X rate increase during the 2010-2020 decade
- Delivering 100X more GMAC in almost steady power is the challenge

Figure: timeline with axis marks at 1990, 2000, 2010, and 2020
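To put the 100X figure in perspective, the sketch below converts it into a constant yearly growth factor and into the energy-per-operation reduction it implies. It assumes, hypothetically, that growth is evenly spread over the decade and that the power budget stays exactly flat:

```python
growth_total = 100.0   # 100X over the decade, from the slide
years = 10

# Constant yearly factor g such that g ** years == growth_total.
yearly_factor = growth_total ** (1 / years)
print(f"Required yearly growth: x{yearly_factor:.2f}")

# Hypothetical assumption: the power budget stays exactly flat.
# Energy per operation = power / throughput, so it must shrink 100X too.
energy_per_op_ratio = 1.0 / growth_total
print(f"Energy per MAC at the end of the decade: {energy_per_op_ratio:.0%} of the starting value")
```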



## **CMOS Technology for Mobile Computing**

- Mobile computing is driving a large part of the innovation in new CMOS processes
- The need for high performance within a scarce energy budget demands new trade-offs
- "Only comprehensive teamwork between Process Engineers, Circuit Designers, Chip Architects, and Systems Designers can make major advances." (ISSCC 2007 Keynote)



# **CMOS Technology Alliances**







#### ST participates in these alliances:

- PreT0: advanced CMOS technology research
- ISDA: bulk/low-power (LP) technology development
  - 32/28nm
  - 20nm

#### **STMicroelectronics**




#### **Transistor Architecture Trends**



Figure: transistor architecture roadmap by technology node. Labels recovered from the slide (grouping by node is approximate):

- 32/28nm: 28nm Low Power (ST/IBM IEDM 2009); bulk enhanced with stressors; Gate First Metal Gate High-K
- 22/20nm: 25nm FDSOI (ST/IBM VLSI 2010); Fully Depleted SOI with Hybrid Bulk (hybrid UTBB/bulk); 2nd Generation MGHK (Gate Last); improved junctions; stressors
- 16/14nm: 15nm FDSOI (ST/LETI VLSI 2010); Fully Depleted SOI with Ultra Thin BOX and Stressors
- 11/10nm: FinFET (ST/IMEC VLSI 2006)

To continue Moore's Law, three main disruptions are expected at 14nm:

- Lithography: moving from 193nm immersion to EUV
- Device: moving from bulk CMOS to thin-silicon devices
- 3D integration




# 3D Integration



#### DB8500 Power Management



- Power management drives the IC design complexity
- Multiple voltage domains and power islands
  - 4 voltage domains
  - 17 switched power islands
- DVFS and AVS
- Forward and Reverse Body Biasing
- Dual Rails SRAM
- Thermal sensor
- Process Monitoring Box
- On chip 'decap' for power integrity
- Small Always On domain



# **Power Integrity**

- Power integrity analysis at platform level
  - SMPS Regulator
  - Board
  - Package





- More passive components embedded into die & package to achieve high clock speed for CPU
- Every mV counts



## **Energy Efficiency Sweet Spot**



#### Mobile Application CPU catch up:



- Clock speed gap is narrowing too:
  - Cortex-A9 above 1.5GHz is now common
  - Cortex-A15 in the 2GHz-3GHz range is coming

Source: Steven Swanson and Michael Bedford Taylor, University of California

ARM CPUs now offer the same features as desktop CPUs:

| Feature         | Benefit                                               |
|-----------------|-------------------------------------------------------|
| Pipelined       | Higher clock rate                                     |
| Superscalar     | Parallelization through multiple instruction dispatch |
| SMP Multi Core  | Parallelization through multiple thread dispatch      |
| Media Extension | Parallelization through vectorization                 |
| Out-of-order    | Reduce idle time                                      |
| Prefetch        | Reduce idle time                                      |




#### Amdahl Law Wall: Quad vs Dual Computing Efficiency



Figure: quad-core vs dual-core comparison chart (axis ticks 0.00 to 2.50 and 2,500 to 15,000; curve labeled "Web browsing: reality"). Source: ST-Ericsson Benchmark

- Ideal:
  - Quad Core delivers 60% more DMIPS
  - Quad Core is 25% more power efficient
- Best case:
  - Quad Core delivers 10% more DMIPS
  - Quad Core is marginally more power efficient
- Reality (web browsing):
  - Dual Core delivers 15% more DMIPS
  - Dual Core has better power efficiency
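The shrinking quad-core advantage is the behavior Amdahl's law predicts as the serial share of a workload grows: speedup on N cores is 1 / ((1 - p) + p / N) for a parallel fraction p. A minimal sketch follows; the parallel fractions are hypothetical values chosen only to show the trend, not figures from the ST-Ericsson benchmark:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only `parallel_fraction` of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Hypothetical parallel fractions, chosen only to illustrate the trend.
for p in (1.0, 0.9, 0.5, 0.2):
    dual = amdahl_speedup(p, 2)
    quad = amdahl_speedup(p, 4)
    print(f"p={p:.1f}: dual x{dual:.2f}, quad x{quad:.2f}, "
          f"quad/dual gain {quad / dual - 1:+.0%}")
```

Amdahl's law on its own never makes the quad-core slower than the dual-core. A case where the dual-core comes out ahead presumably also reflects clock-frequency or power constraints, which this sketch does not model.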




#### **SMP SW Exploitation**

- System-level
  - High-level OSs expose significant levels of concurrency, easily exploited by SMP:
    - Networking, multimedia frameworks, drivers, etc.
  - Multi-tasking increasingly used in Mobile
    - Multiple widgets, multiple concurrent applications
- Multimedia
  - SMP-friendly by nature: data partitioning, function partitioning and pipelining easily achievable
- Web Browsing
  - Today's browser cores are not SMP-friendly, but some important browser evolutions are very SMP-friendly, e.g.
    - Multi-tab, Multi-task, Out-of-Process Plugins
    - HTML5 Web Workers
- In addition, particularly critical applications may benefit from explicit SMP parallelization
  - Tough, but well-known techniques
  - Low-level SW and multimedia already optimized for SMP
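As an illustration of the data partitioning mentioned for multimedia, the sketch below splits a frame into horizontal stripes and processes each stripe in its own worker. The filter, the helper names, and the frame layout are all hypothetical. In CPython, threads running pure-Python code do not execute in parallel, so this shows the structure of the technique, not a speedup measurement:

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(rows, delta=10):
    """Toy per-pixel filter: add `delta` and clamp to the 8-bit range."""
    return [[min(255, px + delta) for px in row] for row in rows]

def split_rows(frame, parts):
    """Split a frame (list of rows) into at most `parts` contiguous stripes."""
    step = max(1, -(-len(frame) // parts))  # ceiling division, never zero
    return [frame[i:i + step] for i in range(0, len(frame), step)]

def brighten_partitioned(frame, workers=4):
    """Data partitioning: each worker filters one independent stripe."""
    stripes = split_rows(frame, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves stripe order, so stripes re-assemble directly.
        processed = list(pool.map(brighten, stripes))
    return [row for stripe in processed for row in stripe]

# Small synthetic 6x8 "frame" of 8-bit pixel values.
frame = [[(40 * y + x) % 256 for x in range(8)] for y in range(6)]
result = brighten_partitioned(frame)
print(result == brighten(frame))
```

Because the stripes share no state, no locking is needed. This independence is what makes this class of workload scale well on SMP.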




## WEB Browsing SMP-Friendly Evolution

- HTML5
  - Multimedia: native support for Canvas and Video makes SMP-optimized multimedia integration straightforward
  - WebWorkers: ability to spawn independent JavaScript threads
    - To avoid freezing the browser UI while processing, but also ideal for SMP
- Multi-processing
  - Multi-tab Multi-task initially on Chrome, Mozilla follows, now WebKit2 (see later)
  - Mozilla Out-of-Process plugins
- Parallelization
  - Initial efforts towards SMP optimization started e.g. on WebKit, Mozilla, Google...
    - https://lists.webkit.org/pipermail/webkit-dev/2010-February/011637.html
  - Advanced research programs ongoing, e.g. Berkeley parallel browser project and others (http://parlab.eecs.berkeley.edu/research/80)

"Our goal is to make every website developer a parallel programmer"







#### **SMP**: Scalability Limiting Factors

- Ideally, N CPUs -> N times speedup and lower power than 1 CPU @ N\*Freq
- Scaling is limited by shared HW and SW serialization points
- HW
  - Cache Coherency
  - Resource Sharing
  - Optimized CPU and System HW implementation
- SW Serialization Points
  - Application and/or system SW serialization points have strong impact on achievable scaling
    - Amdahl law
  - ⇒SW
- Extensive system characterization, benchmarking and optimizations required to define the right architecture
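The power claim in the first bullet rests on the usual CMOS switching-power approximation, P ≈ C·V²·f. A single core pushed to N times the clock generally needs a higher supply voltage, while N cores can stay at the lower one. A rough sketch, with every number hypothetical and leakage ignored:

```python
def dynamic_power(c_eff: float, vdd: float, freq_hz: float) -> float:
    """Switching-power approximation P = C_eff * Vdd^2 * f (leakage ignored)."""
    return c_eff * vdd ** 2 * freq_hz

# All values below are hypothetical, for illustration only.
C_EFF = 1e-9      # effective switched capacitance per core, in farads
N = 2             # number of cores
F_BASE = 0.5e9    # per-core clock in the multi-core case, in Hz
V_BASE = 0.9      # supply voltage that sustains F_BASE, in volts
V_FAST = 1.2      # assumed higher voltage needed to reach N * F_BASE

p_multi = N * dynamic_power(C_EFF, V_BASE, F_BASE)    # N cores @ F_BASE
p_single = dynamic_power(C_EFF, V_FAST, N * F_BASE)   # 1 core @ N * F_BASE

print(f"{N} cores @ {F_BASE / 1e9:.1f} GHz: {p_multi:.2f} W")
print(f"1 core  @ {N * F_BASE / 1e9:.1f} GHz: {p_single:.2f} W")
```

Both configurations offer the same peak cycle rate. The multi-core one only wins in practice if the software scales, which is where the serialization points listed above come in.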








#### From CPU to APU

- Moore's Law enablers for CPU performance are now somewhat limited

| Enabler          | Outlook                                        |
|------------------|------------------------------------------------|
| CMOS Process     | Expect a modest 20% per node                   |
| SMP Multi Core   | Nice scalability but Amdahl Law Wall           |
| CPU Architecture | Catch up done, not much breakthrough to expect |
- Accelerated Processing Unit
  - Expose to programmers the full platform processing power
    - SMP CPU
    - GP-GPU
    - Fixed HW functions
  - Need framework for Heterogeneous APU programming
    - OpenCL



#### **Legacy Architectures**





Computing driven by general purpose programming + 3D gaming

Computing driven by multimedia, and low-power / limited resources

> ST ERICSSON


#### Convergence Architecture





#### ST-Ericsson/PGI Cooperation on GPGPU

- Portland Group Inc (http://www.pgroup.com)
  - Well known in HPC
  - Part of STMicroelectronics since 2000
- PGI GPGPU
  - CUDA-x86, OpenCL-ARM, Accelerator
- PGI Accelerator (http://www.pgroup.com/resources/accel.htm)
  - Pragma-based set of directives (OpenMP-like) for C and Fortran
  - Automatic analysis of whole program structures and data; splits portions of the application between CPU and GPU as specified by user directives
  - Optimized mapping of loops to automatically use the parallel cores, hardware threading capabilities and SIMD vector capabilities of modern GPUs
  - Directives for fine-grained control over the mapping of loops, allocation of memory, and optimization for the GPU memory hierarchy
  - Automatic generation of CPU/GPU data movement




#### Programming Challenge

- Rewriting existing applications in CUDA or OpenCL can be a lot of work
  1. Split code between Host and GPU (mechanical)
  2. Manage data allocation & movement between Host and GPU memories (mechanical)
  3. Tune GPU kernel schedules and memory usage





#### More HW Features to ease SW Programming

- System MMU
  - Virtualization of the address space for multimedia clients, no more constraining physical space reservation
- Hardware Virtualization
  - Multiple OSs and services efficiently and safely sharing resources without the burden of "para-virtualization"
- Hardware Coherency
  - No more explicit cache management




## Video & Display Resolution Driving Memory BW
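The link between resolution and memory bandwidth can be estimated with simple arithmetic: each full-frame read or write moves width × height × bytes-per-pixel bytes, once per frame. The sketch below is a back-of-the-envelope estimate only; the resolutions, the 4 bytes per pixel, and the 60 frames per second are hypothetical inputs, not figures from the slide:

```python
def stream_bandwidth_mb_s(width: int, height: int,
                          bytes_per_pixel: int, fps: int) -> float:
    """Bandwidth of one full-frame read or write stream, in MB/s (1 MB = 1e6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6

# Hypothetical display formats, 32-bit pixels, 60 frames per second.
formats = {
    "WVGA (800x480)": (800, 480),
    "720p (1280x720)": (1280, 720),
    "1080p (1920x1080)": (1920, 1080),
}

for name, (w, h) in formats.items():
    bw = stream_bandwidth_mb_s(w, h, bytes_per_pixel=4, fps=60)
    print(f"{name}: {bw:.1f} MB/s per full-frame pass")
```

A real pipeline touches each frame several times (decode, composition, display refresh), so total traffic is a multiple of the single-pass figure.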





#### After LPDDR2: LPDDR3 and Wide IO...




# 3D Partitioning Strategies for Application Processors

## 1. Conventional die stacking

- Application board form factor and package cost reduction
- Move test ownership from application (board) to chip supplier (package)

## 2. Wide IO / Silicon Interposer

- Massive die interconnect
- Increase memory access bandwidth
- Focus on power and performance

Figure labels: Wide IO memory, Host

## 3. Split by process technology

- Implement functions in best suited silicon process technology
- Focus on cost and time to market





#### **Open Standards**

- Smartphone platforms rely heavily on industry standards
- With its parent companies, ST-Ericsson has a strong position and is leading standardization in many areas










#### ST-Ericsson High-Performance Application Processor




#### Conclusion

- A unique combination of sensors and high-performance, low-power computing always connected to the Cloud is going to revolutionize our lives. Smartphones effectively become an extension of our senses and brain.
- Very few companies can drive and combine all the necessary technologies
  - Cellular Modem, APU & Computing, Sensors, IC Packaging, Memories, HLOS, CMOS Process
- Low power is the fundamental driver for innovation
- Smartphones are the new big driver for the semiconductor industry, and ST-Ericsson, with its parent companies, is in a unique position to take a leading position in this market


