

## **Smart Sensor Systems**

The Next Multiprocessor Challenge

### MPSoC2013, Otsu, Japan

Dr. Yankin Tanurhan, VP Engineering, Solutions Group, Synopsys, Inc.

16/7/2013

## Agenda

Introduction

Smart Sensor Systems

Components

Performance

Conclusions

## Introduction

#### Analog sensors meet digital and software

© Synopsys 2013 3

SYNOPSYS<sup>®</sup> Accelerating Innovation

## Introduction

- Smart Sensor: adding a (digital) CPU core to an analog sensor

- Digital processing
  - Digital filtering
  - Calibration, linearization
  - Digital interfaces
  - Sensor fusion
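
As a rough illustration of the calibration and linearization steps listed above, the sketch below applies an offset/gain correction to a raw ADC code and then linearizes the result with a piecewise-linear lookup table. The function names, the offset, the gain, and the table values are all hypothetical and chosen only for illustration; a real sensor would take them from its own calibration data.

```python
from bisect import bisect_right

def calibrate(raw, offset, gain):
    """Remove a zero offset and apply a gain correction to a raw ADC code."""
    return (raw - offset) * gain

def linearize(value, table):
    """Piecewise-linear interpolation through sorted (input, output) points."""
    xs = [x for x, _ in table]
    # Clamp to the table range so out-of-range readings stay bounded.
    if value <= xs[0]:
        return table[0][1]
    if value >= xs[-1]:
        return table[-1][1]
    i = bisect_right(xs, value)  # index of the first breakpoint above value
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (value - x0) / (x1 - x0)

# Hypothetical table: corrected ADC code -> physical quantity.
TABLE = [(0.0, 0.0), (100.0, 20.0), (200.0, 50.0), (400.0, 100.0)]

def read_sensor(raw):
    """Calibrate, then linearize, one raw sample (illustrative constants)."""
    return linearize(calibrate(raw, offset=12, gain=0.5), TABLE)
```

Both steps are a handful of multiplies, adds, and compares per sample, which is the kind of work a small CPU core next to the analog front end can absorb.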






## **Significant Growth in Sensors**

## Introduction

Topics addressed in this presentation

- Smart sensor systems
  - Analog sensor becomes intelligent sensor
  - Multitude of sensors (analog and digital) combined for more precise data
  - Sensor fusion host off-loading
- Components that are part of a smart sensor system
- Performance requirements
  - Area, power, footprint

## Smart Sensor Systems

Digital processing

Scalable solution

Sensor system partitioning


## **Smart Sensor System**

Evolution: more digital processing

[Figure: sensor component evolution with increasing complexity and performance, from an analog-in/digital-out sensor with a digital filter and peripheral, through an ADC-based design with signal processing, to a CPU-based SoC with analog and digital inputs, signal processing, and peripherals.]


## **Sensor Systems Getting More Complex**



- Increased processing drives higher performance processor
- Sensor fusion with multiple analog & digital sensors requires higher level of integration
- Transitioning from discrete ICs to integrated sensor control within larger, more complex SoC


| From           | To               |
|----------------|------------------|
| 8-bit          | 32-bit           |
| Single sensors | Multiple sensors |
| Discrete       | Integrated       |


## **Smart Sensor System**

Multiple ways to get sensor data to the application processor



## **Integrated Sensor IP Subsystem**



## **Components**

Closely coupled memories

Tightly coupled IO functions

DSP extensions/Hardware accelerators

Software view

## **Typical CPU based Sensor System**

bus-based design



## **Typical CPU based Sensor System**

bus-based design using ARC EM processor



## **Typical CPU based Sensor System**

optimization: use closely-coupled memories



## **Optimized CPU based Sensor System**

ARC EM Processor with IO and DSP extensions



## IO functionality

Overview of sensor interfaces

| Function          | Range            | Interface                                                    |
|-------------------|------------------|--------------------------------------------------------------|
| Motion            | Accelerometers   | Analog/PWM/SPI/I<sup>2</sup>C                                |
|                   | Gyroscopes       | Analog/SPI/I<sup>2</sup>C                                    |
| Magnetic and Hall | Angle            | Analog/SPI/I<sup>2</sup>C                                    |
|                   | Rotational speed | Analog [current/voltage]                                     |
|                   | Hall effect      | Analog [current]/SPC/PWM/SENT                                |
| Proximity         | Proximity        | I<sup>2</sup>C/SPI                                           |
| Optical           | Ambient          | Analog [current]/I<sup>2</sup>C/PWM                          |
|                   | Distance         | Analog [voltage]                                             |
|                   | CCD image        | Serial Digital + GPIO                                        |
|                   | CMOS image       | Parallel Digital + GPIO/SPI/I<sup>2</sup>C                   |
| Pressure          | Compensated      | Analog [voltage]                                             |
|                   | Uncompensated    | Analog [voltage]                                             |
|                   | Integrated       | Analog [voltage]                                             |
| Temperature       | Analog output    | Analog [voltage]/Switch [on/off/interrupt]                   |
|                   | Digital output   | I<sup>2</sup>C/SMBus/SPI/1-wire/PWM/Switch [on/off]/QSPI/SST |
|                   | Silicon          | Analog [resistance]                                          |

Source: Silica, Sparkfun

- Sensor IF: ADC IF, I2C, SPI, PWM
- Host IF: I2C, SPI, UART, (on-chip) bus


## IO functionality

Tightly integrated IO



- "Tightly integrated"
  - Start from an existing bus-based IO peripheral
  - Eliminate the interface to the on-chip bus
  - Replace *load/store* instructions to the IO peripheral with *register move* instructions


## **HW** support for DSP Functions

#### Accelerate application code

- Reduce memory footprint
  - HW accelerator instruction(s) replace multiple SW statements or functions
- Reduce cycle count
  - more performance,
  - or less power
- Some silicon area increase
- Selection of HW accelerators tailored to (sensor) application


- DSP functions
  - Math
  - Complex Math
  - Filtering
    - FIR, IIR, correlation, ...
  - Matrix/vector
  - Interpolation



## Software



## Performance

Area benefits

Cycle count and memory footprint benefits

System power benefits

**Embedded Non-Volatile Memories** 




## **Significant Area & Power Benefits**

- Typical Sensor Implementation
  - 40-60% larger
- Synopsys' Integrated Sensor Subsystem
  - Smaller and significantly more power efficient

# Functions Implemented in HW or SW for Area & Power Trade-offs

| Software Only Functions  | Hardware Functions        |
|--------------------------|---------------------------|
| Cos, Sin, Sqrt           | Multiply/Accumulate (MAC) |
| Conjugate                | Accumulate/Multiply (ACM) |
| DOT                      | Sine/Cosine (SIN/COS)     |
| Magnitude                | Square Root (SQRT)        |
| MagSquared               | Absolute Value (ABS)      |
| Multiply                 | Add (ADD)                 |
| Convolute                | Subtract (SUB)            |
| Correlate                | Multiply (MULT)           |
| LMS, FIR, IIR            | Negate (NEGATE)           |
| Add, multiply, transpose | Shift (SHIFT)             |
| Linear, Bilinear         | Scale (SCALE)             |

- FIR filtering: 6x reduction in cycle count
- Square root: 14x reduction in cycle count
- CRC check: 25x reduction in cycle count

#### Fewer Cycles Equals Lower Power, Higher Performance


## HW accelerator Cyclic Redundancy Check



#### Host communication check

CRC polynomial: $1 + x + x^{2} + x^{4} + x^{5} + x^{7} + x^{8} + x^{10} + x^{11} + x^{12} + x^{16} + x^{22} + x^{23} + x^{26} + x^{32}$

| Metrics @ 180nm  |       | Software<br>(cycle count optimized) | CRC accelerator<br>(hardware) |
|------------------|-------|-----------------------------------|-------------------------------|
| Footprint ROM    | Bytes | 1420                              | 12                            |
| Cycles           | #     | 76                                | 3                             |
| Accelerator      | Gates | 0                                 | 475                           |
| Memory footprint | Gates | 3140                              | 25                            |
| Total area       | Gates | 3140                              | 500                           |

- ✓ Cycle count reduced by factor 25
- ✓ Area reduces (!) by 2640 ND2 equivalent gates (85%)
- ✓ CRC Code footprint reduces from 104 instructions to 3 instructions only
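
For reference, a bit-serial software CRC over the polynomial above can be sketched as follows. The slide gives only the polynomial, so the initial value, bit reflection, and final XOR used here are an assumption (the common CRC-32 conventions), and this is not the cycle-optimized code that was measured in the table.

```python
# Exponents of the polynomial as listed on the slide.
EXPONENTS = [0, 1, 2, 4, 5, 7, 8, 10, 11, 12, 16, 22, 23, 26, 32]
POLY = sum(1 << e for e in EXPONENTS if e < 32)  # the x^32 term is implicit
assert POLY == 0x04C11DB7

def reflect32(value: int) -> int:
    """Reverse the bit order of a 32-bit value."""
    return int(f"{value:032b}"[::-1], 2)

def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32, assuming init 0xFFFFFFFF, reflected bits, final XOR."""
    poly_reflected = reflect32(POLY)  # 0xEDB88320
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):  # one shift and conditional XOR per message bit
            if crc & 1:
                crc = (crc >> 1) ^ poly_reflected
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF
```

With these conventions `crc32_bitwise(b"123456789")` returns `0xCBF43926`, the standard CRC-32 check value. The eight-iteration inner loop per byte indicates why a dedicated shift/XOR datapath can cut the cycle count so sharply.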

## HW accelerator Square root

#### $Y = \sqrt{X}$

| Metrics @ 180nm<br>48bit input<br>24bit output |       | Software<br>Cycle count<br>optimized | Hardware<br>accelerator<br>(cycle count opt) | Hardware<br>accelerator<br>(pipelined opt) | Hardware<br>accelerator<br>(area opt) |
|------------------------------------------------|-------|--------------------------------------|----------------------------------------------|--------------------------------------------|---------------------------------------|
| Footprint ROM                                  | Bytes | 272                                  | 4                                            | 4                                          | 4                                     |
| Cycles <sup>a</sup>                            | #     | 353*n                                | n                                            | 6+n                                        | 24*n                                  |
| Accelerator                                    | Gates | 0                                    | 3740                                         | 5490                                       | 1075                                  |
| Memory footprint                               | Gates | 600                                  | 10                                           | 10                                         | 10                                    |
| Total area                                     | Gates | 600                                  | 3750 <sup>b</sup>                            | 5500                                       | 1085                                  |

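- ✓ Cycle count reduced by factor 14 (area-optimized accelerator: 353*n vs. 24*n cycles)
- Area increases by only 485 ND2 equivalent gates (area-optimized accelerator: 1085 vs. 600 gates)
- ✓ SQRT code footprint reduces from 136 instructions to 1 instruction only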
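
A minimal software sketch of a 48-bit-in, 24-bit-out integer square root is shown below. It uses the classic bit-by-bit method, which produces one result bit per iteration; whether the measured software routine or any of the accelerator variants works this way is an assumption, and the function name is hypothetical.

```python
def isqrt48(x: int) -> int:
    """Floor square root of a 48-bit unsigned integer, one result bit per step."""
    if not 0 <= x < (1 << 48):
        raise ValueError("input must fit in 48 bits")
    root = 0
    remainder = x
    bit = 1 << 46  # highest power of four that fits in 48 bits
    for _ in range(24):  # 24 iterations give a 24-bit result
        trial = root + bit
        if remainder >= trial:
            remainder -= trial
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root
```

Each iteration needs only a compare, a subtract, and shifts, so the loop maps naturally onto a small iterative datapath, while unrolling it trades gates for cycles.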


## HW accelerator Finite Impulse Response

N-tap asymmetric FIR filter: $\mathrm{FIR}[i] = \sum_{k=0}^{N-1} C[k] \cdot X[i-k]$

| Metrics @ 180nm<br>30 taps    |       | Software<br>(cycle count optimized) | MAC accelerator<br>(hardware) |
|-------------------------------|-------|-----------------------------------|-------------------------------|
| Footprint ROM                 | Bytes | 254                               | 146                           |
| Cycles                        | #     | 2673                              | 430                           |
| Accelerator                   | Gates | 0                                 | 1520                          |
| Memory footprint              | Gates | 560                               | 300                           |
| Area                          | Gates | 560                               | 1820                          |

- ✓ Cycle count reduces by factor 6
- Area increases only 1260 ND2 equivalent gates
- ✓ FIR Code footprint reduces from 86 instructions to 58 instructions only
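
The FIR sum on this slide can be written as a direct-form loop, sketched below. Treating samples before index 0 as zero is an assumption of this sketch, and the function name is hypothetical.

```python
def fir(coeffs, samples):
    """Direct-form FIR: y[i] = sum over k in [0, N) of coeffs[k] * samples[i - k].

    Samples before index 0 are treated as zero (an assumption of this sketch).
    """
    n_taps = len(coeffs)
    output = []
    for i in range(len(samples)):
        acc = 0
        for k in range(n_taps):
            if i - k >= 0:
                acc += coeffs[k] * samples[i - k]  # one multiply-accumulate (MAC)
        output.append(acc)
    return output
```

Feeding a unit impulse returns the coefficients, for example `fir([1, 2, 3], [1, 0, 0, 0])` gives `[1, 2, 3, 0]`. With the 30 taps used in the table, the inner loop performs 30 multiply-accumulates per output sample, which is the operation a MAC accelerator collapses.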

## **Power Benefits**

Sensor fusion application

- 9-D sensor fusion
  - Combo sensor based design
  - Design optimized using:
    - tightly coupled IO functions
    - addition of fusion specific HW accelerators

| Metrics @ 90nm<br>CPU + IO + Accelerators |          | Bus-based design | Optimized design |
|-------------------------------------------|----------|------------------|------------------|
| Cycles                                    | [#]      | 100%             | 10%              |
| Dynamic Power                             | [uW/MHz] | 100%             | 107%             |
| Standard cell Area                        | Gates    | 100%             | 121%             |

## Overall power reduction of 9.3x
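
The 9.3x figure is consistent with the table if the energy for a fixed workload is taken as proportional to cycle count times dynamic power per MHz. That proportionality is an assumption of this sketch; the slide does not spell out the calculation.

```python
def relative_energy(cycles_ratio, power_per_mhz_ratio):
    """Energy for a fixed workload relative to the baseline.

    Assumes energy is proportional to cycles x (uW/MHz), i.e. energy per cycle.
    """
    return cycles_ratio * power_per_mhz_ratio

# Optimized design from the table: 10% of the cycles at 107% of the uW/MHz.
optimized = relative_energy(0.10, 1.07)
reduction = 1.0 / optimized
print(f"relative energy: {optimized:.3f}, reduction: {reduction:.2f}x")
```

Under that assumption the optimized design needs about 10.7% of the baseline energy, a reduction of roughly 9.3x, at the cost of the 21% larger standard cell area listed in the table.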


## Conclusions

- Components
- Performance
- Measurements



## Conclusions

Components for Smart Sensor System

- Identified Smart Sensor System variants
  - Intelligent sensor and Combo sensor (focus of presentation)
  - Sensor hub
- Addressed components that are part of a Smart Sensor System
  - Optimized CPU based solution
  - IO peripheral (tightly coupled) + driver SW
  - HW accelerators for DSP functions (tightly coupled) + DSP SW
- Mix of tightly coupled IO and bus-based IO peripherals possible (esp. when area is not a high priority)


## Conclusions

Performance

- Performance improvements by
  - Tightly coupled IO peripherals
  - Tightly coupled HW accelerators
- Footprint
  - Silicon area, Software memory footprint (reducing ROM size)
- Power
  - Lower CPU core frequency
- Predictability
  - Direct access to tightly coupled IO and HW accelerators


## Conclusions

Measurements

- All measurements and studies are based on the Synopsys product portfolio
  - ARC EM core
  - Tool suite for customizing ARC EM core
    - development of tightly coupled EIA extensions/instructions
  - Complete SW development tool suite
  - EM Starter Kit development system






