

## A 460MHz@397mV – 2.6GHz@1.3V 32bit **VLIW DSP Embedding F<sub>MAX</sub> Tracking**

E. Beigné, A. Valentian, I. Miro-Panades, R. Wilson, P. Flatresse

CEA-LETI, MINATEC, Grenoble, France; STMicroelectronics, Crolles, France



Ref: R. Wilson et al., 'A 460MHz at 397mV, 2.6GHz at 1.3V, 32b VLIW **DSP, Embedding FMAX Tracking**', **ISSCC** 2014




# **Application Context**: Wide Operating Range

- Today's wireless internet applications:
  - Multimedia, gaming, GPS...
  - High speed > 1.5GHz
  - Large range voltage operation
  - Low standby power



- Challenges :
  - Ultra Wide Voltage Range (UWVR) : Versatility
  - Very high speed at nominal and low supply voltages
  - Low energy



# **DSP** architecture



# **Chip Micrograph**



| Technology    | STMicro UTBB FDSOI 28 nm |
|---------------|--------------------------|
| Transistors   | Flip-Well (LVT), L=24nm  |
| Core area     | 1 mm²                    |
| DSP benchmark | FFT 1024                 |
| VDD range     | 0.397V-1.3V              |
| VBB range     | 0V/±2V                   |

- UTBB FDSOI 28 nm technology
  - Well-type methodology
  - Body Biasing techniques
- Optimized library sub-set
  - Asymmetric standard cells
  - High performance flip-flops
- F<sub>MAX</sub> tracking techniques
  - Replica path and Timing fault detection
- DSP performance silicon results




## **VT flavor choice - Well type**

- UTBB FDSOI Low-VT transistors chosen for maximum speed at low voltage: Flip-Well FDSOI



⇒ DSP core $V_{DDMIN}$ ≥ 397mV

⇒ Clock frequency increase: +400% @ 500mV and +100% @ 1.3V

# **Power distribution – Body Biasing**



Including all biasing voltages

- Thin VBB power stripes
- Body-Biasing (± 2V)
  - Supplied through


# **Optimized library sub-set**

- Use of available standard cells optimized for low voltage:
  - Extend the subset to the Wide Voltage Range
- Gate length modulation done through Poly Biasing
- SPICE simulations and silicon measurements of qFO4 ring oscillators
- Keep the power-delay optimal cells in UWVR
- Cells are characterized over a [0.275V-1.4V] V<sub>DD</sub> range and a [0V to ±2V] V<sub>BB</sub> range



# **Optimized Pulsed-latch FF**

- Transmission-Gate Pulsed Latches
  - A latch and a pulse generator
  - Same behavior as a conventional Master-Slave FF
  - Proved to be the most energy-efficient in a large portion of the design space

|             | D-to-Q (ps)          | Energy/cycle (fJ)   | Area (µm²)          |
|-------------|----------------------|---------------------|---------------------|
| ST C2MOS    | 117.5                | 6                   | 4.41                |
| ST SA       | 46.5 (- 60 %)        | 12.5 (+ 208 %)      | 6.85 (+ 55 %)       |
| TGPLMuxScan | 30.5 <b>(- 74 %)</b> | 7.2 <b>(+ 20 %)</b> | 4.73 <b>(+ 7 %)</b> |
| TGPLMuxClk  | 26 (- 78 %)          | 9.1 (+ 56 %)        | 5.06 (+ 14 %)       |
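
One way to read the table is through the energy-delay product (EDP). This is a minimal sketch, assuming EDP is an acceptable figure of merit and that D-to-Q can stand in for delay. The slides do not name a metric, and the function names are illustrative only.

```python
# Values copied from the flip-flop comparison table:
# (D-to-Q delay in ps, energy per cycle in fJ).
FLIP_FLOPS = {
    "ST C2MOS": (117.5, 6.0),
    "ST SA": (46.5, 12.5),
    "TGPLMuxScan": (30.5, 7.2),
    "TGPLMuxClk": (26.0, 9.1),
}


def energy_delay_product(d_to_q_ps: float, energy_fj: float) -> float:
    """Energy-delay product in fJ*ps (lower is better)."""
    return d_to_q_ps * energy_fj


def rank_by_edp(cells: dict) -> list:
    """Return (name, edp) pairs sorted from lowest to highest EDP."""
    scored = [(name, energy_delay_product(d, e)) for name, (d, e) in cells.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    for name, edp in rank_by_edp(FLIP_FLOPS):
        print(f"{name:12s} EDP = {edp:7.1f} fJ*ps")
```

Under this assumed metric both pulsed-latch variants rank ahead of the two reference flip-flops, which is in line with the energy-efficiency claim on the slide.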




# Timing margins reduction in UWVR: F<sub>MAX</sub> tracking

- Design in *typical* corner case instead of worst case approach
  - Need to compensate for PVT variations in UWVR
  - Reduce energy per operation ... or increase clock frequency
- Track the optimal Frequency/Voltage/Biasing point
  - Real maximum frequency is **not** available
  - Functional in the UWVR / Low area and power overhead
- Two complementary solutions proposed on-chip
  - Replica path based ⇒ CODA (ClOning DAta paths)
  - Timing slack sensor based ⇒ TMFLT (TiMing FauLT)
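
The tracking idea can be sketched as a simple search loop. This is a hedged illustration, not the on-chip controller: `is_safe` is a hypothetical callback standing in for whatever the monitors report, and the step and limit parameters are assumptions.

```python
from typing import Callable


def track_frequency(
    is_safe: Callable[[float], bool],
    start_mhz: float,
    step_mhz: float,
    limit_mhz: float,
) -> float:
    """Step the clock up while the (hypothetical) monitor reports it safe.

    Returns the highest stepped frequency, not above limit_mhz, for which
    is_safe() held. Raises ValueError if the starting point is unsafe.
    """
    if step_mhz <= 0:
        raise ValueError("step_mhz must be positive")
    if not is_safe(start_mhz):
        raise ValueError("start frequency is already unsafe")
    freq = start_mhz
    # Only commit to the next step once the monitor has accepted it.
    while freq + step_mhz <= limit_mhz and is_safe(freq + step_mhz):
        freq += step_mhz
    return freq


if __name__ == "__main__":
    # Toy monitor: pretend the real limit sits at 1234 MHz.
    print(track_frequency(lambda f: f <= 1234.0, 1000.0, 50.0, 2000.0))
```

The same loop could be run over voltage or body bias instead of frequency; only the quantity being stepped changes.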

# **F**<sub>MAX</sub> tracking : CODA

- 16 pairs of clones of Representative Critical Paths (RCP)
  - One RCP is used as a canary path (delay monitor)
  - Another RCP used as a loop (oscillation frequency monitor)
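
The two monitor roles can be modelled roughly as follows. This is a sketch under stated assumptions: the loop is taken to be inverting (one oscillation equals two passes through the path), and the relative margin and all function names are illustrative rather than taken from the design.

```python
def loop_delay_ps(osc_count: int, window_ns: float) -> float:
    """Estimate the one-pass delay of the looped replica path.

    Assumes an inverting loop, so one oscillation period is two passes.
    osc_count is the number of oscillations counted in window_ns.
    """
    if osc_count <= 0:
        raise ValueError("osc_count must be positive")
    period_ps = window_ns * 1000.0 / osc_count
    return period_ps / 2.0


def predicted_fmax_mhz(path_delay_ps: float, margin: float = 0.05) -> float:
    """Frequency whose period covers the replica delay plus a relative margin."""
    if path_delay_ps <= 0:
        raise ValueError("path_delay_ps must be positive")
    return 1e6 / (path_delay_ps * (1.0 + margin))


def canary_passes(path_delay_ps: float, clock_period_ps: float) -> bool:
    """Canary check: the replica must settle within one clock period."""
    return path_delay_ps < clock_period_ps


if __name__ == "__main__":
    delay = loop_delay_ps(osc_count=500, window_ns=1000.0)
    print(delay, predicted_fmax_mhz(delay), canary_passes(delay, 1100.0))
```

The loop gives a continuous frequency estimate, while the canary gives a pass/fail answer at the current clock period, which is why the two are paired.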






# **F<sub>MAX</sub> tracking : CODA**

- Correlation between F<sub>MAX</sub> and frequency predictions
  - 5 slots into 21 dies @ 1V & 25°C (+4/-3% accuracy)
  - 1 slot into 21 dies @ [0.8V-1.3V] & 25°C (6% accuracy)
  - Not intrusive / Need complementary solution in UWVR



# F<sub>MAX</sub> tracking in UWVR : TMFLT

128 TMFLT-Sensors, 11 TMFLT-Ring

- Analyze the registers slack time
- 300ps detection window

#### Programmable replica path

- Time-to-digital measurement
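
A slack sensor with a detection window can be sketched as below. Only the 300ps figure comes from the slide. Placing the window immediately before the capturing clock edge, the three category names and the function names are assumptions made for illustration.

```python
DETECTION_WINDOW_PS = 300.0  # detection window quoted on the slide


def classify_arrival(
    arrival_ps: float,
    clock_period_ps: float,
    window_ps: float = DETECTION_WINDOW_PS,
) -> str:
    """Classify a data arrival time against the capturing clock edge.

    Assumption: the window sits just before the edge, so an arrival with
    less than window_ps of slack raises a warning before any real failure.
    """
    slack_ps = clock_period_ps - arrival_ps
    if slack_ps < 0:
        return "violation"  # data arrived after the edge
    if slack_ps < window_ps:
        return "warning"    # data arrived inside the detection window
    return "safe"


def any_flag(arrivals_ps, clock_period_ps: float) -> bool:
    """True if any monitored register reports a warning or a violation."""
    return any(classify_arrival(a, clock_period_ps) != "safe" for a in arrivals_ps)


if __name__ == "__main__":
    print(classify_arrival(800.0, 1000.0))       # slack of 200 ps
    print(any_flag([500.0, 650.0], 1000.0))      # all slacks above 300 ps
```

Because the flag is raised while slack is still positive, the clock or supply can be adjusted before data is actually corrupted.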



# F<sub>MAX</sub> tracking in UWVR : TMFLT

- Instrumented data paths are **not** necessarily critical

#### Programmed with user IR drop margins



# **TMFLT results**

- Frequency estimation error compared to F<sub>MAX</sub>
  - Error of ±4% at 1V



- Compared to worst case PVT corner (3σ) approach without F<sub>MAX</sub> tracking:
  - Energy gain of 40.6% @ 0.6V
  - Frequency gain of 24% @1V
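
As a worked reading of the 24% figure, assuming the gain is expressed relative to the worst-case-corner frequency (the symbols $f_{wc}$, $f_{track}$, $T_{wc}$ and $T_{track}$ are illustrative and do not appear on the slides):

$$
f_{track} = 1.24\, f_{wc} \;\Rightarrow\; T_{track} = \frac{T_{wc}}{1.24} \approx 0.806\, T_{wc}
$$

Under that assumption, roughly 19% of the worst-case clock period at 1V is margin that tracking recovers.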




## **DSP speed performance measurements**



- FBB up to +2V and F<sub>MAX</sub> tracking :
  - Frequency increased up to 460MHz at minimum voltage

# **DSP energy performance measurements**



Lowest energy at 460MHz


Power consumption : 370mW at 1V




## **Comparison with State of the Art WVR**



# Conclusion

- Very high speed and low energy 28 nm DSP achieved by the efficient use of F<sub>MAX</sub> tracking techniques combined with optimized libraries in UTBB FDSOI technology
- Convergence between High Performance and Low Power is possible by using innovative Ultra Wide Voltage Range design techniques
- New features for increased Versatility in future integrated circuits