

This project at ARM is funded in part by ICT-eMuCo, a European project supported under the Seventh Framework Programme (7FP) for research and technological development.




Low-Power Leadership





## **MPSoC 2009**

## Targeted execution enabling increased power efficiency

Anirban Lahiri, Adya Shrotriya, Nicolas Zea

**Technology Researchers**

John Goodacre, Director, Program Management, ARM Processor Division

August 2009

The Architecture for the Digital World®



#### Interesting System Configurations...




## Background



Fig 1a : Power at Peak performance per Operating Voltage



Fig 1b : Power-Performance Diversity of Single Task Workloads

- Smooth transition between energy and performance levels
- Reduced loss due to leakage power as cores can be switched off
- Addresses the application performance diversity



Fig 1c : Diversity of Multitask Workloads

Disclaimer : The plots are indicative of practical architectures and systems.



## **Analyzing Diversity**

- Code compatibility (due to uniform ISA) ensures easy dynamic task migration (Fig 2a)
- Task migration for power efficiency based on required performance (Fig 2b). The example shows a set of tasks T<sub>1</sub> to T<sub>5</sub>.





Fig 2a : Single task migrating across cores over time

Fig 2b : Task migrations over time based on performance requirement in a Multitask Workload

- Prevents smaller tasks from corrupting high performance task execution. E.g. Task T<sub>1</sub> in Fig 2b.
- Important to further analyse temporal effects of SoC power
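The performance-driven migration described above can be illustrated with a minimal Python sketch. It is not the authors' scheduler: the function names `choose_core` and `migration_trace` are hypothetical, the thresholds are assumed from the 0.8 V peak-MIPS figures quoted under Summary Expectations, and migration cost is ignored.

```python
# Assumed capacities: peak MIPS at 0.8 V, as quoted under Summary Expectations.
LITTLE_PEAK_MIPS = 350
BIG_PEAK_MIPS = 700


def choose_core(required_mips):
    """Pick the smallest core that satisfies the demand (hypothetical policy)."""
    if required_mips <= LITTLE_PEAK_MIPS:
        return "little"
    if required_mips <= BIG_PEAK_MIPS:
        return "big"
    raise ValueError("demand exceeds the assumed big-core capacity")


def migration_trace(demand_over_time):
    """Return the per-interval placement of one task and the migration count."""
    placements = [choose_core(d) for d in demand_over_time]
    # A migration happens whenever two consecutive intervals use different cores.
    migrations = sum(
        1 for prev, cur in zip(placements, placements[1:]) if prev != cur
    )
    return placements, migrations


# One task whose demand rises and falls over four intervals (as in Fig 2a).
placements, migrations = migration_trace([100, 300, 600, 200])
print(placements, migrations)
```

Because both cores share one ISA, the placement can change between intervals without recompiling the task. A real policy would also have to weigh the cost of each migration.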


## **Methodologies Being Utilized**





### **Software Model Considerations**

|                                                 | Power Aware SMP                                                   | Big-Switch                                                           |
|-------------------------------------------------|-------------------------------------------------------------------|----------------------------------------------------------------------|
| Level of OS modification                        | Requires affinity to be driven by performance requirement         | Potentially no changes required                                      |
| Maximum power save                              | Can operate as big-switch too                                     | Little and big core need performance continuum                       |
| Level of task diversity<br>and peak performance | Enable better scalability                                         | Limited to performance of single CPU                                 |
| Implementation complexity                       | OS needs a speculative<br>understanding of<br>performance demands | Invisible to OS, operates<br>similar to interrupt<br>service routine |
| Management<br>Responsibility                    | OS performance monitor                                            | Application dependent                                                |
| Flexibility                                     | SMP / AMP designs                                                 | Single CPU only                                                      |
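As a rough illustration of the Big-Switch column, the sketch below selects a single active core and voltage for a given demand. It assumes the peak-frequency operating points quoted under Summary Expectations; the name `big_switch_point` is hypothetical, and leakage in the inactive core and switching cost are ignored.

```python
# (core, voltage in V, peak MIPS, power in mW) at peak frequency,
# taken from the operating-point figures under Summary Expectations.
OPERATING_POINTS = [
    ("little", 0.8, 350, 200),
    ("little", 0.9, 400, 250),
    ("little", 1.0, 450, 300),
    ("little", 1.1, 500, 350),
    ("big", 0.8, 700, 500),
    ("big", 0.9, 800, 575),
    ("big", 1.0, 950, 600),
    ("big", 1.1, 1100, 750),
]


def big_switch_point(required_mips):
    """Return the lowest-power single-core point meeting the demand, or None."""
    candidates = [p for p in OPERATING_POINTS if p[2] >= required_mips]
    if not candidates:
        return None  # no single CPU can deliver this much performance
    return min(candidates, key=lambda p: p[3])


for demand in (350, 600, 700, 1100, 1400):
    print(demand, big_switch_point(demand))
```

With these figures, a 600 MIPS demand falls in the gap between the little core's 500 MIPS ceiling and the big core's 700 MIPS floor, so the big core has to be used. This is one way to read the "performance continuum" requirement. A 1400 MIPS demand returns `None`, reflecting the single-CPU performance limit.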


### **Summary Expectations**

| Application Scenario                        | Power-Aware SMP Scheduled                                           | Big-Switch                 | Big-Core Only              | Little-Core Only       |
|---------------------------------------------|---------------------------------------------------------------------|----------------------------|----------------------------|------------------------|
| Big-Task<br>(700MIPS)                       | 520mW<br>Big-Core (500mW @ 0.8V)<br>Little-Core (20mW Leakage)      | Big-Core<br>(500mW @ 0.8V) | Big-Core<br>(500mW @ 0.8V) | -                      |
| Small-Task<br>(350MIPS)                     | 250mW<br>Big-Core (50mW Leakage)<br>Little-Core (200mW @ 0.8V)      | Little-Core<br>(200mW)     | Big-Core<br>(500mW)        | Little-Core<br>(200mW) |
| 1 Big-Task<br>+ 3 Small Tasks<br>(1100MIPS) | 700mW<br>Big-Core (500mW @ 0.8V)<br>Little-Core (200mW @ 0.8V)      | Big-Core<br>(750mW @ 1.1V) | Big-Core<br>(750mW @ 1.1V) | -                      |
| 3 Big-Tasks<br>+ 5 Small Tasks<br>(1400MIPS) | 950mW<br>Big-Core (750mW @ 1.1V)<br>+ Little-Core (200mW @ 0.8V)   | -                          |                            | -                      |

| Operating Voltage (Volts)                 | 0.8 | 0.9 | 1.0 | 1.1  |
|-------------------------------------------|-----|-----|-----|------|
| Big-Core MIPS at Peak Frequency           | 700 | 800 | 950 | 1100 |
| Little-Core MIPS at Peak Frequency        | 350 | 400 | 450 | 500  |
| Big-Core Power at Peak Frequency (mW)     | 500 | 575 | 600 | 750  |
| Little-Core Power at Peak Frequency (mW)  | 200 | 250 | 300 | 350  |

- Possible power savings up to 50%
- Performance enhancements up to 30% seen by reducing corruption of high performance tasks
- Key to still understand the costs of migration
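The Power-Aware SMP totals and the quoted saving of up to 50% can be checked with simple arithmetic. The sketch below uses only figures from this slide; the dictionary layout and the name `smp_savings` are illustrative assumptions. The 1400 MIPS scenario is left out because the slide gives no Big-Core Only baseline for it.

```python
# Per scenario: (big-core mW, little-core mW) under Power-Aware SMP,
# and the Big-Core Only power in mW, all read from the slide.
SCENARIOS = {
    "Big-Task (700MIPS)": ((500, 20), 500),
    "Small-Task (350MIPS)": ((50, 200), 500),
    "1 Big-Task + 3 Small Tasks (1100MIPS)": ((500, 200), 750),
}


def smp_savings(scenarios):
    """Return {scenario: (SMP total mW, % saving versus Big-Core Only)}."""
    result = {}
    for name, ((big_mw, little_mw), big_only_mw) in scenarios.items():
        total = big_mw + little_mw
        saving_pct = 100 * (big_only_mw - total) / big_only_mw
        result[name] = (total, saving_pct)
    return result


for name, (total, saving) in smp_savings(SCENARIOS).items():
    print(f"{name}: {total} mW, {saving:.1f}% saving vs Big-Core Only")
```

On these numbers the small-task case gives the 50% saving (250 mW against 500 mW). The single big-task case costs slightly more than Big-Core Only, because the idle little core still leaks 20 mW.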


# Thank you



