

## On-Chip Interconnects: Circuits and Signaling from an MPSoC Perspective

Wayne Burleson Associate Professor, Dept ECE University of Massachusetts Amherst <u>burleson@ecs.umass.edu</u>

# My Perspective

- VLSI Signal Processing, BS/MS MIT 1983, PhD Colorado 1989
- Worked as a VLSI designer (Fairchild, VTI); now teach VLSI design
- Research in VLSI Circuits
  - Low-power (NSF, SRC)
  - Interconnects (SRC, Intel)
  - Wave-pipelining (NSF)
  - SRAM (Intel, CRL)
  - Soft-errors (Intel, MMDC)
- Research in VLSI Architecture
  - Adaptive SOC (NSF)
  - VLSI Signal Processing (NSF)
  - Video, 3D Graphics (NSF)
  - Embedded Security (HSARPA)










## Objectives

- Provide insight into on-chip interconnects from a circuit designer's perspective
- Survey recent research in on-chip interconnects
- Present MPSoC interconnect requirements and metrics
- Show how to compare circuit and signaling solutions
- Discuss the impact of **uncertainties** (process variation, noise, temperature)
- Show CAD support for interconnect design and estimation
- Some examples

## **On-Chip Interconnect:** Levels of Abstraction

- Network level
  - CDMA
  - TDMA
- System level
  - Communication Links
  - Adaptive supply voltage links
- Architecture level
  - AMBA<sup>™</sup>
  - CoreConnect<sup>™</sup>
- Circuit level
  - Low Swing
  - Coding
  - Single / Differential



#### UMassAmherst MPSoC Interconnect Requirements

- Intra-core vs. inter-core
- Bus width (1, 8, ..., 32, 64, ...)
- Adjacent-core vs. long-haul
- Repeated vs. unrepeated
- Single-cycle vs. pipelined
- Bus vs. point-to-point
- Synchronous vs. asynchronous
- Metrics:
  - Latency
  - Bandwidth
  - Noise
  - Area
  - Power/energy



Source : A. Jerraya, W. Wolf, **Multiprocessor Systems-on-Chips**, Elsevier 2005

#### MPSoC NoC Interconnect issues

- Granularity of cores, i.e., wire length
- Bus width
- Network topology
- Link layer (fault tolerance, etc.)
- Synchronous vs. asynchronous



Source : Ahmed Jerraya, Wayne Wolf, **Multiprocessor Systems-on-Chips**, Elsevier 2005

#### MPSoC "Bus" Alternatives

- Fixed Bus [Bergamaschi, DAC, 2000]
  - Point to point communication
  - Signals between cores transferred by dedicated wires
- FPGA-like Bus [Cherepacha, FPGA Sym, 1994]
  - Programmable interconnects
  - Employ a static network
- Arbitrated Bus [IDT Inc., 2000]
  - Time-shared multiple core connectivity
  - Use an arbiter
- Hierarchical Bus [AMBA, ARM Inc]
  - Combine multiple buses using bus bridges
  - Separate buses for cores and I/O
- NoC Bus [Dally, DAC, 2000]
  - Resources communicate with data packets
  - Use switch fabric



Source : Jian Liang, **Development and Verification of System-on-a-Chip Communication Architecture**, *Ph.D. thesis, Department of Electrical and Computer Engineering*, University of Massachusetts, Amherst, May 2004

#### SoC Bus Standards: CoreConnect<sup>™</sup> and AMBA<sup>™</sup>

|                   | IBM CoreConnect<br>Processor Local Bus (PLB)                                                                                           | ARM AMBA 2.0<br>AMBA High-performance Bus (AHB)                                               |
|-------------------|----------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
| Bus architecture  | 32-, 64-, and 128-bit<br>Extendable to 256-bit                                                                                         | 32-, 64-, and 128-bit                                                                         |
| Data buses        | Separate read and write                                                                                                                | Separate read and write                                                                       |
| Key capabilities  | Multiple bus masters<br>4-deep read pipelining<br>2-deep write pipelining<br>Split transactions<br>Burst transfers<br>Line transfers | Multiple bus masters<br>Pipelining<br>Split transactions<br>Burst transfers<br>Line transfers |
|                   | **On-Chip Peripheral Bus (OPB)**                                                                                                       | **AMBA Advanced Peripheral Bus (APB)**                                                        |
| Masters supported | Supports multiple masters                                                                                                              | Single master: the APB bridge                                                                 |
| Bridge function   | Master on PLB or OPB                                                                                                                   | APB master only                                                                               |
| Data buses        | Separate read and write                                                                                                                | Separate or 3-state                                                                           |

#### Interconnect Geometry Scaling

65nm Intel<sup>©</sup> Technology

| Layer                | Pitch (nm) | Thickness (nm) | Aspect ratio |
|----------------------|------------|----------------|--------------|
| Isolation            | 220        | 320            | -            |
| Polysilicon          | 220        | 90             | -            |
| Contacted gate pitch | 220        | -              | -            |
| Metal 1              | 210        | 170            | 1.6          |
| Metal 2              | 210        | 190            | 1.8          |
| Metal 3              | 220        | 200            | 1.8          |
| Metal 4              | 280        | 250            | 1.8          |
| Metal 5              | 330        | 300            | 1.8          |
| Metal 6              | 480        | 430            | 1.8          |
| Metal 7              | 720        | 650            | 1.8          |
| Metal 8              | 1080       | 975            | 1.8          |

90nm Intel<sup>©</sup> Technology

| Layer     | Pitch (nm) | Thickness (nm) | Aspect ratio |
|-----------|------------|----------------|--------------|
| Isolation | 240        | 400            | -            |
| Poly-Si   | 260        | 140            | -            |
| Metal 1   | 220        | 150            | 1.4          |
| Metal 2,3 | 320        | 256            | 1.6          |
| Metal 4   | 400        | 320            | 1.6          |
| Metal 5   | 480        | 384            | 1.6          |
| Metal 6   | 720        | 576            | 1.6          |
| Metal 7   | 1080       | 972            | 1.8          |

- Weak scaling of the vertical dimension compared to the horizontal dimension
- High height/width aspect ratios (1.6-1.8 at 65nm)
- Taller wires limit the increase in interconnect resistance
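To make the aspect-ratio point concrete, the sketch below estimates resistance per unit length for the 65nm metal layers. It assumes (this is not on the slide) that wire width is half the pitch, which reproduces the listed aspect ratios, and an effective copper resistivity of 2.2 µΩ·cm; both are illustrative values.

```python
# Sketch: resistance per unit length of the 65 nm metal layers listed above.
# Assumptions (not from the slide): width = pitch / 2 (equal line and space)
# and an effective copper resistivity of 2.2 uOhm*cm.

RHO_CU = 2.2e-8  # ohm*m, assumed effective resistivity of Cu wires

# (layer, pitch_nm, thickness_nm) from the 65 nm table
LAYERS_65NM = [
    ("Metal 1", 210, 170),
    ("Metal 2", 210, 190),
    ("Metal 3", 220, 200),
    ("Metal 4", 280, 250),
    ("Metal 5", 330, 300),
    ("Metal 6", 480, 430),
    ("Metal 7", 720, 650),
    ("Metal 8", 1080, 975),
]


def aspect_ratio(pitch_nm: float, thick_nm: float) -> float:
    """Height/width ratio, taking width as half the pitch."""
    return thick_nm / (pitch_nm / 2)


def r_per_um(pitch_nm: float, thick_nm: float, rho: float = RHO_CU) -> float:
    """Resistance per micrometre (ohm/um) of a rectangular wire."""
    width_m = (pitch_nm / 2) * 1e-9
    thick_m = thick_nm * 1e-9
    return rho * 1e-6 / (width_m * thick_m)  # R = rho * length / area, length = 1 um


if __name__ == "__main__":
    for name, pitch, thick in LAYERS_65NM:
        print(f"{name}: AR = {aspect_ratio(pitch, thick):.1f}, "
              f"R = {r_per_um(pitch, thick):.3f} ohm/um")
```

For Metal 1 this gives roughly 1.2 Ω/µm; a wire of the same width that was only as thick as it is wide would have about 1.6 times that resistance, which is what the weak vertical scaling buys.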

### Interconnect RC Trend



- RC/µm increases 40-60% per generation
- Copper and low-k dielectrics provide only a modest benefit

#### Interconnect Distribution Trend



The RC/µm scaling trend is only one side of the story: average wire lengths do not scale down with the technology.
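Why both trends matter together shows up in the delay of an unrepeated wire. A common approximation for the 50% delay of a distributed RC line, ignoring driver resistance and load capacitance, is $t_{50\%} \approx 0.38\,rcL^2$, with $r$ and $c$ per unit length. The sketch below uses hypothetical values (r = 0.4 Ω/µm, c = 0.2 fF/µm) and a 50% per-generation RC/µm increase taken from the middle of the 40-60% range.

```python
def rc_delay_ps(r_ohm_per_um: float, c_ff_per_um: float, length_um: float) -> float:
    """50% delay (ps) of an unrepeated distributed RC wire: ~0.38 * R_total * C_total.
    Driver resistance and receiver load are ignored."""
    r_total = r_ohm_per_um * length_um           # ohm
    c_total = c_ff_per_um * 1e-15 * length_um    # F
    return 0.38 * r_total * c_total * 1e12       # s -> ps


def rc_per_um_after(rc0: float, growth: float, generations: int) -> float:
    """RC per unit length after compounding a per-generation increase."""
    return rc0 * (1 + growth) ** generations


if __name__ == "__main__":
    # Hypothetical wire parameters, not taken from any specific process.
    for length in (1000, 2000, 5000):
        print(f"{length} um: {rc_delay_ps(0.4, 0.2, length):.1f} ps")
    # Delay is quadratic in length: 2x longer -> 4x slower.
    # Three generations at +50% each: RC/um grows 1.5**3 = 3.375x.
    print(rc_per_um_after(1.0, 0.5, 3))
```

Because delay grows with L², a wire whose length does not shrink keeps paying the full RC/µm increase: after three generations at +50% it is about 3.4 times slower in absolute terms, while gate delays have shrunk.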

## **Interconnect Power Consumption**

- Reduce interconnect power using Vdd programmability:
  - High Vdd for devices on critical paths
  - Low Vdd for devices on non-critical paths
  - Vdd off for inactive paths
- (a) Baseline fabric; (b) fabric with Vdd-configurable interconnect



This work builds on a similar idea for FPGAs described in:

Fei Li, Yan Lin and Lei He. Vdd Programmability to Reduce FPGA Interconnect Power, IEEE/ACM International Conference on Computer-Aided Design, Nov. 2004
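The benefit of Vdd programmability follows from dynamic power scaling with the square of the supply, $P = \alpha C V_{dd}^2 f$. The sketch below estimates fabric interconnect power relative to an all-high-Vdd baseline. It assumes every segment has the same capacitance and activity and ignores leakage and level-conversion overhead; the 1.3 V/0.8 V supplies and the 30%/50%/20% split are hypothetical.

```python
def dynamic_power(alpha: float, c_farads: float, vdd: float, freq_hz: float) -> float:
    """Switching power P = alpha * C * Vdd^2 * f."""
    return alpha * c_farads * vdd ** 2 * freq_hz


def fabric_power_ratio(frac_high: float, frac_low: float,
                       v_high: float, v_low: float) -> float:
    """Dynamic interconnect power relative to an all-high-Vdd fabric.

    frac_high: share of segments on critical paths (kept at v_high)
    frac_low:  share on non-critical paths (moved to v_low)
    The remainder is inactive and powered off (contributes nothing here).
    Assumes equal capacitance and activity per segment.
    """
    if frac_high < 0 or frac_low < 0 or frac_high + frac_low > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    return frac_high + frac_low * (v_low / v_high) ** 2


if __name__ == "__main__":
    # Hypothetical split: 30% critical, 50% non-critical, 20% inactive.
    print(f"{fabric_power_ratio(0.3, 0.5, 1.3, 0.8):.2f} of baseline")  # ~0.49
```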

## **Interconnect Modeling**



(a) Capacitive model, (b) RC model, (c) RLC model



Massoud, *IEEE Circuits and Devices Magazine*, 2001



#### Interconnect Issues – Signal Integrity



Ismail, TVLSI, 2002

- Inductance becoming important
- Self inductance results in ringing
- Mutual inductance results in crosstalk
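A widely used figure of merit for when inductance matters comes from Ismail, Friedman and Neves: with $r$, $l$, $c$ the per-unit-length resistance, inductance and capacitance and $t_r$ the signal rise time, inductance effects are significant for wire lengths $\ell$ in the window $\frac{t_r}{2\sqrt{lc}} < \ell < \frac{2}{r}\sqrt{\frac{l}{c}}$. Shorter wires are too short relative to the rise time, and longer wires are damped by their resistance. The sketch checks that window; the wire parameters are hypothetical values for a wide global wire.

```python
import math


def inductance_window(r_pul: float, l_pul: float, c_pul: float,
                      t_rise: float) -> tuple[float, float]:
    """Length range (m) in which on-chip inductance matters.

    r_pul, l_pul, c_pul: per-unit-length R (ohm/m), L (H/m), C (F/m)
    t_rise: signal rise time (s)
    """
    lower = t_rise / (2 * math.sqrt(l_pul * c_pul))  # shorter: edge too slow for LC effects
    upper = (2 / r_pul) * math.sqrt(l_pul / c_pul)   # longer: resistance damps the response
    return lower, upper


def inductance_matters(length: float, r_pul: float, l_pul: float,
                       c_pul: float, t_rise: float) -> bool:
    lower, upper = inductance_window(r_pul, l_pul, c_pul, t_rise)
    return lower < length < upper


if __name__ == "__main__":
    # Hypothetical wide global wire: 10 ohm/mm, 0.5 nH/mm, 0.2 pF/mm, 50 ps edges.
    r, ind, cap, tr = 1e4, 5e-7, 2e-10, 50e-12
    print(inductance_window(r, ind, cap, tr))  # roughly (0.0025, 0.01) metres
    for mm in (1, 5, 20):
        print(mm, "mm:", inductance_matters(mm * 1e-3, r, ind, cap, tr))
```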

# **Circuit and Signaling Solutions**

- Conventional Circuit techniques
  - Repeater insertion
  - Booster insertion
- Low Swing techniques
  - Pseudo differential interconnect
  - Differential Current sensing
- Bus encoding techniques
  - Transition aware encoding
  - Low Power encoding for crosstalk reduction
- Signaling techniques
  - Multi-level signaling
  - Near speed of light signaling




- Optimal repeater insertion reduces interconnect delay.
- Energy-delay tradeoffs are optimized to satisfy design constraints.
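The classic closed-form optimum is Bakoglu's. For a wire with total resistance $R$ and capacitance $C$, driven by $k$ repeaters of size $h$ whose minimum-size output resistance is $R_0$ and input capacitance is $C_0$, the delay model is $T = k\left[0.7\frac{R_0}{h}\left(\frac{C}{k} + hC_0\right) + \frac{R}{k}\left(0.4\frac{C}{k} + 0.7hC_0\right)\right]$, minimized by $k_{opt} = \sqrt{0.4RC/(0.7R_0C_0)}$ and $h_{opt} = \sqrt{R_0C/(RC_0)}$. The sketch evaluates this model; the wire and repeater values are hypothetical.

```python
import math


def repeated_wire_delay(k: float, h: float, R: float, C: float,
                        R0: float, C0: float) -> float:
    """Bakoglu-style delay of a wire split into k segments, each driven by a size-h repeater.

    R, C: total wire resistance (ohm) and capacitance (F)
    R0, C0: output resistance and input capacitance of a minimum-size repeater
    """
    stage = 0.7 * (R0 / h) * (C / k + h * C0) + (R / k) * (0.4 * C / k + 0.7 * h * C0)
    return k * stage


def bakoglu_optimum(R: float, C: float, R0: float, C0: float) -> tuple[float, float]:
    """Delay-optimal repeater count k and size h (continuous values)."""
    k_opt = math.sqrt(0.4 * R * C / (0.7 * R0 * C0))
    h_opt = math.sqrt(R0 * C / (R * C0))
    return k_opt, h_opt


if __name__ == "__main__":
    # Hypothetical 10 mm wire (0.2 ohm/um, 0.2 fF/um) and a 10 kohm / 1 fF minimum repeater.
    R, C, R0, C0 = 2000.0, 2e-12, 1e4, 1e-15
    k, h = bakoglu_optimum(R, C, R0, C0)
    print(f"k = {k:.1f}, h = {h:.0f}")
    print(f"unrepeated, min driver: {repeated_wire_delay(1, 1, R, C, R0, C0) * 1e12:.0f} ps")
    print(f"optimally repeated:     {repeated_wire_delay(k, h, R, C, R0, C0) * 1e12:.0f} ps")
```

With these numbers the unrepeated wire (minimum-size driver) takes roughly 15.6 ns, while about 15 repeaters of size 100 bring it under 0.5 ns. In practice $k$ is rounded to an integer.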



#### Interconnect Circuits - Repeaters



#### Interconnect Circuits – Boosters



## Interconnect Circuits - Low swing

- Reduce the swing on the interconnect
- Use a PMOS/NMOS device to provide a resistive path
- Reduces dynamic power, since the interconnect is not charged to full rail
- Lower noise immunity at the output due to the reduced swing
- Static power dissipation through the low-impedance path



- Current mode, voltage mode
- Single ended, differential
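The dynamic-power saving of low swing can be estimated from the charge drawn from the supply. Charging a wire of capacitance $C$ to a reduced swing $V_{swing}$ draws charge $CV_{swing}$; if that charge comes from the full $V_{dd}$ rail the energy is $CV_{swing}V_{dd}$, and with a separate low supply at $V_{swing}$ it is $CV_{swing}^2$. The sketch compares both with full-swing $CV_{dd}^2$, using the 2.0 V and 0.5 V swings that appear in the pseudo-differential comparison and a hypothetical 1 pF wire; receiver and static power are ignored.

```python
def full_swing_energy(c_wire: float, vdd: float) -> float:
    """Energy drawn from the supply per rising transition at full swing: C * Vdd^2."""
    return c_wire * vdd ** 2


def low_swing_energy(c_wire: float, v_swing: float, v_supply: float) -> float:
    """Energy per rising transition when the wire swings only v_swing,
    with the charge C * v_swing drawn from a supply at v_supply."""
    return c_wire * v_swing * v_supply


if __name__ == "__main__":
    c = 1e-12  # hypothetical 1 pF wire
    print(full_swing_energy(c, 2.0))       # ~4e-12 J: full swing
    print(low_swing_energy(c, 0.5, 2.0))   # ~1e-12 J: swing limited, charge from Vdd
    print(low_swing_energy(c, 0.5, 0.5))   # ~2.5e-13 J: separate low-swing supply
```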

#### Interconnect Circuits – Low Swing

- Voltage mode
- Single ended
- One wire per bit
- Receiver not sensitive to supply variation



#### **Pseudo Differential Interconnect**

| Scheme | Energy: driver/wire (pJ) | Energy: receiver (pJ) | Energy: total (pJ) | Delay: driver/wire (ns) | Delay: receiver (ns) | Delay: total (ns) | E·D (pJ·ns) | Swing (V) | Complexity          |
|--------|--------------------------|-----------------------|--------------------|-------------------------|-----------------------|-------------------|-------------|-----------|---------------------|
| CMOS   | 11.45                    | 0.15                  | 11.6               | 1.64                    | 0.47                  | 2.11              | 24.5        | 2.0       | Least area overhead |
| PDIFF  | 1.32                     | 0.60                  | 1.92               | 1.65                    | 0.75                  | 2.40              | 4.6         | 0.5       | Timing, 1 REF       |
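The E·D column is the product of the total-energy and total-delay columns. A quick check using the table's values, which also shows where the improvement comes from:

```python
# Total energy (pJ) and total delay (ns) from the pseudo-differential table.
SCHEMES = {
    "CMOS": (11.6, 2.11),
    "PDIFF": (1.92, 2.40),
}


def energy_delay(energy_pj: float, delay_ns: float) -> float:
    """Energy-delay product in pJ*ns."""
    return energy_pj * delay_ns


if __name__ == "__main__":
    for name, (e, d) in SCHEMES.items():
        print(f"{name}: E*D = {energy_delay(e, d):.1f} pJ*ns")
    (e_cmos, d_cmos), (e_pd, d_pd) = SCHEMES["CMOS"], SCHEMES["PDIFF"]
    print(f"energy reduction: {e_cmos / e_pd:.1f}x")              # 6.0x
    print(f"delay penalty:    {100 * (d_pd / d_cmos - 1):.0f}%")  # 14%
    print(f"E*D improvement:  "
          f"{energy_delay(e_cmos, d_cmos) / energy_delay(e_pd, d_pd):.1f}x")  # 5.3x
```

Pseudo-differential signaling trades about 14% more delay for roughly six times less energy.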


#### Interconnect Circuits – Low Swing



- Current mode, Differential
- Avoids charging and discharging wire capacitance
- No repeaters along the wire, which avoids repeater placement constraints
- Suffers from static power dissipation (paths shown by dashed lines)



## **Delay-Power tradeoffs**





## Delay vs. wirelength



Intel 90nm (wires with 2x min. width)

## % chip coverage in n cycles




#### Hybrid Repeaters & Current-sensing



#### Eliminating bus static power dissipation



- Send current only when there is a transition
- Hold the bus at GND otherwise
- Encoder and decoder overhead

## Interconnect Solutions - Bus Encoding

- Reduce dynamic power due to switching activity on a bus
  - Transition encoding, spatial encoding, invert encoding, pattern encoding
- Different encodings target different aspects of the interconnect
  - Delay, power, energy, crosstalk, area
- Cost of encoding/decoding
  - Power, area, latency, additional wires
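Comparing encodings requires a common metric. The sketch below counts, over a sequence of bus words, the self transitions (lines that toggle) and a rough crosstalk measure: adjacent line pairs that toggle in opposite directions, which see the largest effective coupling capacitance. Counting only opposite-direction pairs as the crosstalk cost is a simplification chosen for illustration.

```python
from typing import Sequence


def self_transitions(prev: int, curr: int) -> int:
    """Number of bus lines that toggle between two consecutive words."""
    return bin(prev ^ curr).count("1")


def opposite_toggles(prev: int, curr: int, width: int) -> int:
    """Adjacent line pairs switching in opposite directions (worst-case coupling)."""
    count = 0
    for i in range(width - 1):
        da = ((curr >> i) & 1) - ((prev >> i) & 1)              # +1 rise, -1 fall, 0 none
        db = ((curr >> (i + 1)) & 1) - ((prev >> (i + 1)) & 1)
        if da * db == -1:
            count += 1
    return count


def bus_activity(words: Sequence[int], width: int) -> tuple[int, int]:
    """Total (self transitions, opposite-direction adjacent toggles) over a word sequence."""
    toggles = coupling = 0
    for prev, curr in zip(words, words[1:]):
        toggles += self_transitions(prev, curr)
        coupling += opposite_toggles(prev, curr, width)
    return toggles, coupling


if __name__ == "__main__":
    print(bus_activity([0b0000, 0b1111, 0b1111], 4))  # (4, 0): all lines rise together
    print(bus_activity([0b0101, 0b1010], 4))          # (4, 3): every neighbour opposite
```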

#### Interconnect Solutions – Bus encoding



- Uses a dynamic bus configuration
- The encoder translates input transition activity into an output logic state
- The decoder reconstructs the original input from the encoded signal, using its stored state to distinguish between the two input transitions (rising and falling)
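A minimal sketch of the encode/decode logic, assuming the common XOR formulation (an assumption; the slide's circuit may differ): each encoded bit is 1 only when the corresponding input bit changed, and the decoder recovers the data by toggling its stored copy of the previous word. That stored state is what tells the decoder whether an encoded 1 was a rising or a falling transition.

```python
from typing import List


def transition_encode(words: List[int], initial: int = 0) -> List[int]:
    """Encoded word has a 1 wherever the input bit changed since the previous word."""
    prev = initial
    encoded = []
    for w in words:
        encoded.append(w ^ prev)
        prev = w
    return encoded


def transition_decode(encoded: List[int], initial: int = 0) -> List[int]:
    """Rebuild the data by toggling the stored previous word wherever a 1 arrives."""
    state = initial
    decoded = []
    for e in encoded:
        state ^= e          # stored state resolves rise vs. fall
        decoded.append(state)
    return decoded


if __name__ == "__main__":
    data = [0b1111, 0b1111, 0b1110, 0b0000]
    enc = transition_encode(data)
    print([format(e, "04b") for e in enc])   # ['1111', '0000', '0001', '1110']
    assert transition_decode(enc) == data
```

Assuming a dynamic bus line is discharged only for an encoded 1, a repeated word (the second `1111` above) causes no wire activity at all.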



## **Interconnect Solutions - Bus Encoding**

- Bus invert encoding
  - Each cycle, checks whether sending the data unchanged would cause transitions on more than 50% of the bus lines
  - Decides whether to send the true or complement form of the signals
  - Reduces the switching activity
  - Requires one additional wire to inform receiver whether the bus is true or complement
  - Numerous extensions and improvements for different statistical assumptions and metrics

Stan/Burleson, TVLSI, 1997
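A minimal sketch of bus-invert encoding for an n-bit bus, following the basic rule above: if sending the word as-is would toggle more than n/2 data lines relative to the current bus value, the complement is sent and the extra invert line is raised. For simplicity the decision ignores the transition on the invert line itself.

```python
from typing import List, Tuple


def bus_invert_encode(words: List[int], width: int) -> List[Tuple[int, int]]:
    """Return (bus value, invert flag) per cycle."""
    mask = (1 << width) - 1
    bus = 0                                   # value currently on the data lines
    out = []
    for w in words:
        w &= mask
        if bin(w ^ bus).count("1") > width // 2:
            bus, inv = (~w) & mask, 1         # complement needs fewer transitions
        else:
            bus, inv = w, 0
        out.append((bus, inv))
    return out


def bus_invert_decode(coded: List[Tuple[int, int]], width: int) -> List[int]:
    """Undo the inversion wherever the invert flag is set."""
    mask = (1 << width) - 1
    return [((~bus) & mask) if inv else bus for bus, inv in coded]


if __name__ == "__main__":
    coded = bus_invert_encode([0x00, 0xFF, 0x00], 8)
    print(coded)                         # [(0, 0), (0, 1), (0, 0)]
    print(bus_invert_decode(coded, 8))   # [0, 255, 0]
```

The data lines never toggle more than n/2 times per cycle: if the true word would flip more than half of them, its complement flips fewer than half.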

# Multi-level Current Signaling

Figure: multi-level signaling, with data bits b1Tx and b2Tx feeding a current driver that drives the interconnect.

- Encode two or more data bits and transmit them on a single interconnect.
- The data bits are encoded into four or more current levels (n bits need 2^n levels). Current provides more headroom than voltage.
- Sense the current levels and decode the original signals.
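The level mapping can be sketched as follows: n bits select one of 2^n current levels that are integer multiples of a unit current, and the receiver decodes by picking the nearest level. The 50 µA unit current is hypothetical, and a real receiver compares against reference currents rather than rounding. Each added bit doubles the number of levels, so the ±I_unit/2 noise margin must fit more times into the available headroom.

```python
from typing import List

I_UNIT = 50e-6  # hypothetical unit current, A


def encode_current(bits: List[int], i_unit: float = I_UNIT) -> float:
    """Map bits (MSB first) to a current level: level * i_unit."""
    level = 0
    for b in bits:
        level = (level << 1) | (b & 1)
    return level * i_unit


def decode_current(i_sensed: float, n_bits: int, i_unit: float = I_UNIT) -> List[int]:
    """Pick the nearest level (margin +/- i_unit/2) and return its bits, MSB first."""
    level = round(i_sensed / i_unit)
    level = min(max(level, 0), 2 ** n_bits - 1)   # clamp to the valid range
    return [(level >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]


if __name__ == "__main__":
    for bits in ([0, 0], [0, 1], [1, 0], [1, 1]):
        i = encode_current(bits)
        # add a disturbance of 0.3 * I_UNIT, inside the +/- 0.5 * I_UNIT margin
        print(bits, f"{i * 1e6:.0f} uA ->", decode_current(i + 0.3 * I_UNIT, 2))
```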





## Phase Coding



- Actually phase modulation
- Transmitting multiple bits in one transition
  - Significant power and area savings
  - Increased bandwidth
- Phase coding: the phase (timing) of the transition determines the data
- How to deal with timing uncertainty?
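A toy model of phase coding: with n bits per transition, the encode window is split into 2^n time slots, the transition is placed at the start of the slot selected by the data, and the decoder maps the arrival time back to the nearest slot. The 500 ps window (half of a 1 GHz cycle) is an assumption for illustration. The sketch shows the timing-uncertainty problem directly: the tolerable timing error is half a slot, which halves with every extra bit.

```python
def phase_encode(symbol: int, n_bits: int, window_ps: float) -> float:
    """Transition time (ps) within the window for a given n-bit symbol."""
    slots = 2 ** n_bits
    if not 0 <= symbol < slots:
        raise ValueError("symbol out of range")
    return symbol * window_ps / slots


def phase_decode(t_ps: float, n_bits: int, window_ps: float) -> int:
    """Nearest slot to the measured transition time."""
    slots = 2 ** n_bits
    slot = round(t_ps / (window_ps / slots))
    return min(max(slot, 0), slots - 1)


if __name__ == "__main__":
    window = 500.0  # ps, assumed
    for n in (2, 3, 4):
        print(f"{n} bits/wire: slot = {window / 2 ** n:.1f} ps, "
              f"tolerance = +/-{window / 2 ** (n + 1):.1f} ps")
    # The same 20 ps of timing error is harmless at 2 bits but corrupts a 4-bit symbol.
    print(phase_decode(phase_encode(1, 2, window) + 20, 2, window))  # 1
    print(phase_decode(phase_encode(5, 4, window) + 20, 4, window))  # 6, not 5
```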

#### Open Loop Phase Coding



- Delay elements can be shared across wires
- Supply noise, Process variation etc. can result in errors

## Measured Results: Closed Loop

- 16-bit, 5 mm long bus; 0.27 µm wide, 0.27 µm spacing; shielded; 1 GHz
- Repeater insertion and transition encoding used
- Encode in  $\frac{1}{2}$  cycle and use  $\frac{1}{2}$  cycle for decode

| Encoding levels (bits/wire) | Encoder overhead (mW) | Decoder overhead (mW) | Phase coding power (mW) | Repeater bus (mW) |
|-----------------------------|-----------------------|-----------------------|-------------------------|-------------------|
| 2                           | 0.33                  | 1.00                  | 1.00 5.61               |                   |
| 3                           | 0.47                  | 1.33                  | 5.01                    | 8.56              |
| 4                           | 0.62                  | 1.52                  | 4.28                    | 8.56              |



## Near Speed of light Signaling



TABLE II: Performance of Different Approaches to On-Chip Signaling

| Signaling                      | Propagation medium                         | Time of flight for 20 mm (ps) | Delay for 20 mm (ps) | Power (mW) | Comments                       |
|--------------------------------|--------------------------------------------|-------------------------------|----------------------|------------|--------------------------------|
| Modulation<br>(this work)      | Wide metal wires ($\varepsilon_r = 4$)     | 133                           | 300                  | 16         | Large metal area               |
| Repeaters                      | Min. sized metal wires ($\varepsilon_r = 4$) | 133                         | 1400                 | 30         | Slow                           |
|                                | Wide metal wires ($\varepsilon_r = 4$)     | 133                           | 400                  | 50         | Large metal area, high power   |
| Optics (edge-emitting) [11]    | Air ($\varepsilon_r = 1$)                  | 66                            | 300                  | 80         | Packaging, integration issues  |
|                                | On-chip waveguide ($\varepsilon_r = 11.7$) | 228                           | 500                  | 80         | Integration issues             |
| Optics (VCSEL) [11]            | Air ($\varepsilon_r = 1$)                  | 66                            | 400                  | 60         | Packaging, integration issues  |
|                                | On-chip waveguide ($\varepsilon_r = 11.7$) | 228                           | 600                  | 60         | Integration issues             |

- 283 ps over 20 mm on a 16 µm wide Al wire in 0.18 µm CMOS technology
- Very wide wire, so resistance is negligible (R ≈ 0)
- Uses frequency modulation
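The time-of-flight column follows from the propagation speed in the medium, $c/\sqrt{\varepsilon_r}$ (assuming a non-magnetic medium). A quick check for 20 mm:

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def time_of_flight_ps(length_m: float, eps_r: float) -> float:
    """Lower bound on wire delay: length / (c / sqrt(eps_r)), in ps."""
    return length_m * math.sqrt(eps_r) / SPEED_OF_LIGHT * 1e12


if __name__ == "__main__":
    for medium, eps in (("air", 1.0), ("oxide-like dielectric", 4.0), ("silicon", 11.7)):
        print(f"{medium:22s} eps_r={eps:4}: {time_of_flight_ps(20e-3, eps):6.1f} ps")
```

This gives about 67, 133 and 228 ps, matching the table. The modulated wide-wire scheme's 300 ps delay is thus a little over twice its 133 ps flight time, whereas minimum-size repeated wires need roughly ten times it.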

#### **Uncertainty - Process Variations**



Nassif, ISQED, 2000

#### Impact of Uncertainty on Delay and Power



- 100nm technology, 1000 Monte Carlo runs
- Power variability: 43.64%
- Delay variability: 28.95%
- Yield by bin:
  - Bin 1 (high performance): 36.1%
  - Bin 2 (low delay): 27.3%
  - Bin 3 (low power): 25.1%
  - Bin 4 (low performance): 11.5%
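The binning step can be sketched as a small Monte Carlo experiment. Everything in it is an assumed toy model, not the setup behind the numbers above: delay and power are normalized to 1, each varies with an assumed Gaussian spread, an optional shared process term couples them, and a die is binned by whether its delay and power fall below nominal limits (the bin meanings are also an assumption). Variability is reported as σ/µ, which may not be the definition used on the slide.

```python
import math
import random
import statistics


def simulate(n_runs: int = 1000, seed: int = 0, corr: float = 0.0,
             sigma_delay: float = 0.10, sigma_power: float = 0.15):
    """Normalized (delay, power) samples. corr couples both to one shared process
    term (sign chosen so that faster dies burn more power); all values assumed."""
    rng = random.Random(seed)
    indep = math.sqrt(1 - corr ** 2)
    samples = []
    for _ in range(n_runs):
        shared = rng.gauss(0.0, 1.0)
        delay = 1.0 + sigma_delay * (corr * shared + indep * rng.gauss(0.0, 1.0))
        power = 1.0 + sigma_power * (-corr * shared + indep * rng.gauss(0.0, 1.0))
        samples.append((delay, power))
    return samples


def bin_yields(samples, delay_limit: float = 1.0, power_limit: float = 1.0):
    """Fraction of dies in each of four bins."""
    counts = {"high performance": 0, "low delay": 0, "low power": 0, "low performance": 0}
    for d, p in samples:
        if d <= delay_limit and p <= power_limit:
            counts["high performance"] += 1
        elif d <= delay_limit:
            counts["low delay"] += 1
        elif p <= power_limit:
            counts["low power"] += 1
        else:
            counts["low performance"] += 1
    return {name: c / len(samples) for name, c in counts.items()}


def variability(values) -> float:
    """Coefficient of variation sigma/mu (one possible definition)."""
    return statistics.pstdev(values) / statistics.fmean(values)


if __name__ == "__main__":
    runs = simulate(corr=0.5)
    print(f"delay sigma/mu = {variability([d for d, _ in runs]):.3f}")
    print(f"power sigma/mu = {variability([p for _, p in runs]):.3f}")
    for name, y in bin_yields(runs).items():
        print(f"{name:16s} {100 * y:5.1f}%")
```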





#### Uncertainty - Temperature Variations



- Leakage power increases significantly with temperature at the 45 nm node.
  - The dependence on temperature is roughly parabolic ($y = Cx^2$).
- Delay, power, and leakage show higher temperature sensitivity at 45 nm.
- Accurate RLC modeling is needed to avoid underestimation.

#### Uncertainty due to Power Supply Noise





#### Uncertainty – Variation-tolerant Design



- Razor methodology
  - A voltage-scaling methodology based on real-time detection and correction of circuit timing errors
  - Allows energy tuning of a microprocessor pipeline
  - Applying the Razor methodology results in up to 64% energy savings with less than 3% delay penalty for error recovery
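The Razor idea can be illustrated with a toy model; this is an assumption-laden sketch, not the published design. Each capture is checked against a shadow latch clocked slightly later: a value that misses the main clock edge but meets the shadow latch is flagged and corrected by replay, while a value that misses both would be an undetected error, which the controller avoids by never lowering the supply that far. The supply is lowered step by step while the corrected-error rate stays under a target. Path delays follow an assumed alpha-power-law model with a random data-dependent fraction of the worst case; all constants are hypothetical.

```python
import random


def stage_delay(vdd: float, vt: float = 0.3, alpha: float = 1.3) -> float:
    """Worst-case stage delay (arbitrary units), alpha-power-law model (assumed)."""
    return vdd / (vdd - vt) ** alpha


def razor_capture(path_delay: float, t_clk: float, shadow_skew: float) -> str:
    if path_delay <= t_clk:
        return "ok"          # main flip-flop captured the correct value
    if path_delay <= t_clk + shadow_skew:
        return "corrected"   # shadow latch disagrees: error flagged, replayed
    return "missed"          # too slow even for the shadow latch


def error_rates(vdd: float, t_clk: float, shadow_skew: float,
                n: int = 2000, seed: int = 0) -> tuple[float, float]:
    """(corrected, missed) rates; each cycle exercises a random fraction of the worst case."""
    rng = random.Random(seed)
    worst = stage_delay(vdd)
    outcomes = [razor_capture(worst * rng.uniform(0.6, 1.0), t_clk, shadow_skew)
                for _ in range(n)]
    return outcomes.count("corrected") / n, outcomes.count("missed") / n


def tune_vdd(t_clk: float, shadow_skew: float, target: float,
             v_start: float = 1.2, v_min: float = 0.6, step: float = 0.05) -> float:
    """Lower Vdd while corrected errors stay at or below target and none are missed."""
    v = v_start
    while v - step >= v_min:
        corrected, missed = error_rates(v - step, t_clk, shadow_skew)
        if corrected > target or missed > 0:
            break
        v -= step
    return v


if __name__ == "__main__":
    v = tune_vdd(t_clk=1.4, shadow_skew=0.5, target=0.10)
    print(f"chosen Vdd = {v:.2f} V, rates = {error_rates(v, 1.4, 0.5)}")
```

Raising the target error rate lets the loop trade more replay cycles for a lower supply, the same energy-versus-recovery trade-off the slide quantifies.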





#### Interconnect test chips





## CAD Support

- GTX (SRC-MARCO)
  - A GSRC Technology Extrapolation System
- NoCIC (UMASS)
  - SPICE-based interconnect calculator for aggressive circuit techniques (current sensing, multi-bit sensing, boosters, etc.)

## Repeater Optimization using GTX

- Most commonly cited optimal buffer sizing expression (Bakoglu)
- In GTX:
  - Sweep repeater size for single stage in the chain
  - Examine both delay and energy-delay product
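A sketch of this kind of sweep: hold the number of repeaters fixed, sweep a uniform repeater size h (a simplification; the slide describes sweeping a single stage), and compare the size that minimizes delay with the size that minimizes the energy-delay product. Delay uses Bakoglu's model; energy counts the wire capacitance plus an assumed $(1+\gamma)hC_0$ per repeater, where $\gamma$ is an assumed ratio of output parasitic to input capacitance. All numbers are hypothetical.

```python
def chain_delay(k: int, h: float, R: float, C: float, R0: float, C0: float) -> float:
    """Bakoglu delay model for k equal stages of size-h repeaters."""
    return k * (0.7 * (R0 / h) * (C / k + h * C0) + (R / k) * (0.4 * C / k + 0.7 * h * C0))


def chain_energy(k: int, h: float, C: float, C0: float, vdd: float,
                 gamma: float = 1.0) -> float:
    """Energy per transition: wire cap plus input and (assumed) parasitic cap of k repeaters."""
    return (C + k * h * C0 * (1 + gamma)) * vdd ** 2


def sweep_sizes(k, sizes, R, C, R0, C0, vdd):
    """Return (delay-optimal size, EDP-optimal size) over the candidate sizes."""
    results = []
    for h in sizes:
        d = chain_delay(k, h, R, C, R0, C0)
        e = chain_energy(k, h, C, C0, vdd)
        results.append((h, d, e))
    best_delay = min(results, key=lambda row: row[1])
    best_edp = min(results, key=lambda row: row[1] * row[2])
    return best_delay[0], best_edp[0]


if __name__ == "__main__":
    # Hypothetical 10 mm wire (R = 2 kohm, C = 2 pF), 10 kohm / 1 fF minimum repeater, 15 stages.
    h_delay, h_edp = sweep_sizes(15, range(10, 151, 5), 2000.0, 2e-12, 1e4, 1e-15, 1.0)
    print("delay-optimal size:", h_delay, " EDP-optimal size:", h_edp)
```

Near the delay optimum the delay curve is flat while repeater capacitance keeps adding energy linearly in h, so the energy-delay optimum lands at a noticeably smaller size.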



#### Inductance analysis using GTX

- Five different models implemented in GTX
  - Bakoglu's model (RC\_B)
  - [Alpert, Devgan and Kashyap, ISPD 2000] (RC\_ADK)
  - [Ismail, Friedman and Neves, TCAD 19(1), 2000] (RLC\_IFN)
  - [Kahng and Muddu, TCAD 1997] (RLC\_KM)
  - Extension of [Alpert, Devgan and Kashyap, ISPD 2000] (RLC\_ADK)



#### A snapshot of NoCIC


#### NoCIC : Network-on-Chip Interconnect Calculator



#### Case Study : MPSoC with NOC



Venkatraman, SLIP, 2003

\* J. Liu et al., "System level interconnect design for network-on-chip interconnect IPs," *Proceedings of the International Workshop on System Level Interconnect Prediction (SLIP)*, 2003.

## Case-Study: On-Chip Security

(Burleson, Tessier, Gong, Wolf, Gogniat, 2005)

- On-chip monitoring and security bus
- Latency-critical for fast detection and mitigation of attacks
- Improved power, performance and security over software-based defenses



CM = Configurable Monitor; OCIN = On-Chip Intelligence Network



## **Conclusions & Challenges**

- Interconnects are a critical enabling abstraction in MPSoC
- Interconnects play a very large and increasing role in delay, energy, and design effort.
- Interconnect problems can be addressed simultaneously at the micro-architectural, circuit, and process levels
- Aggressive circuit and signaling techniques show promise with minimal architectural impact
- CAD support needed, especially
  - early estimation for architecture and floorplanning
  - final verification in the presence of uncertainties

#### VLSI Interconnects: A Design Perspective

W. Burleson and A. Maheshwari, Morgan Kaufmann, 2005(6)

- 400-page textbook with HW problems, covering:
  - History (both off-chip and on-chip)
  - Process (metallization, dielectrics, etc.)
  - Architecture (processor, ASIC, FPGA, memory)
  - Theoretical models (graph, information-theoretic)
  - Wire models (R,C,L,M,...)
  - Circuits (repeaters, boosters, sense-amps, etc.)
  - CAD (estimation, synthesis, optimization)
  - Case Studies (buses, memories, ASIC, FPGA)
  - Future (nano, optical, wireless, etc.)

#### UMASS Interconnect Circuit Design Group

- Students:
  - Vishak Venkatraman (internship at CRL, PhD 06)
  - Jinwook Jang (MS 05, PhD 08)
  - Sheng Xu (new Sept 04)
  - Ibis Benito (new Jan 05)
  - Atul Maheshwari (now at Intel)
  - Matt Heath (now at Intel)
  - Aiyappan Natarajan (now at AMD)
  - Vijay Shankar (now at Qualcomm)
  - Anki Nalamalpu (now at Intel)
- Collaborators:
  - Sandip Kundu (UMASS/Intel/IBM)
  - Russ Tessier, Israel Koren, Aura Ganz (UMASS)
  - Shubu Mukherjee, Rich Watson, (Intel MMDC)
  - SRC liaisons (Intel CRL, Freescale)

- Alums
  - Manoj Sinha (now at Micron)
  - Chris Cowell (now at Intel)
  - Sriram Srinivasan (now at AMD)
  - Andrew Laffely (now Prof at USAFA)
  - Srividya Srinivasaragavan (now at Intel)