

# **Communication Platforms for System-on-Chip and Network-on-Chip**

Prof. Hannu Tenhunen & Dr. Li-Rong Zheng Royal Institute of Technology IT-Universitet, Electronic System Design 16440 Kista, Sweden hannu@ele.kth.se

## Outline

- Trends at large in electronic integration
- Interconnect strategies
- Optimisation of global interconnects for SoC/NoC

# Algorithm on a Chip

Work in the 1980s on VLSI, DSP-ASIC, silicon compilation, layout generators, design libraries. Transistor/gate centric.

# System on a Chip

Work in the 1990s. Synthesis-centric research, core processors, buses, reusability, low power. Interconnect centric.

# Network on a Chip

Communication centric.

Hardwired Computation

Hardwired Communication

Programmable Computation

Hardwired Communication

Programmable Computation

Programmable Communication

Next step: Autonomic chips: mitigates variability in technology

## Platform based designs

- A single-chip Platform is defined by
  - (A) the on-chip communication infrastructure from the physical level to the application level;
  - (B) the provided on-chip services
     (e.g. load balancing, power management, fault-tolerance, resource allocation, task scheduling, external I/O, etc.);
  - (C) the VLSI design methodology mapping and implementing applications onto the platform.



Fixed interconnection infrastructure

- Time-share the resources
- Bus based platform is not scalable

#### Interconnect Length Distribution and Wiring Hierarchy



#### State-of-the-Art ASIC Design Flow



- Traditional ASIC design with a strict partitioning of logical and physical design phases.
- State-of-the-art ASIC design flow that includes noise analysis in post-layout verification.

#### Concept of Interconnect-Centric Design Flow



- Start from geometry and global wires; the current design flow is largely kept.
- Recursively applied to a large design (Network → System → Module → Gates).

#### Interconnect strategies

### Interconnect Delay



#### Interconnect Requirements in Network-Based Systems



Demands HDI substrates with a large amount of wiring resource, low thermal mismatch, and low cost that are easy to process and environmentally friendly.

KTH/ESDlab/HT

7/22/2005

How far can a signal travel before it has to be refreshed?





(a) no refresh; (b) refreshed. For the same total signal delay: (a) 1.3 mm, (b) 3 mm.


### In the future, even a single chip can not be synchronized!



In the future, a chip will have to be synchronized by different clock domains due to interconnect limitations.

#### What is the largest size for each synchronous resource?



At reduced performance, larger resource size

#### Noise: A Key Stopper in Mixed Signal Systems



### Wire Plan: Plan for Noise Immunity

#### Interconnectivity of Submicron VLSI Interconnects under Noise Margin Constraint



Saturated crosstalk as a function of interconnect capacity per layer for ICs using deep submicron technology. (a) CD = $0.05\,\mu$m, $t = 0.21\,\mu$m, $h = 0.21\,\mu$m; (b) CD = $0.10\,\mu$m, $t = 0.32\,\mu$m, $h = 0.31\,\mu$m; (c) CD = $0.18\,\mu$m, $t = 0.5\,\mu$m, $h = 0.42\,\mu$m; (d) CD = $0.25\,\mu$m, $t = 0.65\,\mu$m, $h = 0.58\,\mu$m; (e) CD = $1.0\,\mu$m, $t = 1.0\,\mu$m, $h = 1.1\,\mu$m; (f) CD = $2\,\mu$m, $t = 2\,\mu$m, $h = 2\,\mu$m. CD: minimum metal critical dimension. For the shielding wire case, w = 1 CD, and the pitch width is 2(d+w).

#### Wire Plan: Plan for Noise Immunity and Performance

Interconnectivity of High Speed Global Interconnects under Noise Margin Constraint



Saturated crosstalk as a function of interconnect capacity for 1 cm long global wires for deep submicron VLSI/ULSI circuits. The wires are designed as LC lines, so that very high data rates with good signal integrity can be achieved. Shielding (A): $t = 1\,\mu$m, $H = 4\,\mu$m; Shielding (B): $t = 0.5\,\mu$m, $H = 8\,\mu$m; Shielding (C): $t = 2\,\mu$m, $H = 12\,\mu$m.

## **Optimize Global Interconnections for SoC/NoC**

- Optimal Signaling Over Parallel Wires
  - Delay is not the only concern associated with the interconnection.
  - Another major issue is the bandwidth supported by the interconnect under certain constraints -- such as limited area, limited power consumption and limited freedom in choosing repeater insertion strategy.
  - Study how delay and bandwidth are related and derive an optimal bandwidth under different constraints

#### Delay Reduction With Optimized Repeater Insertion

- The configuration of victim line and aggressor lines



- Six distinct switching patterns can be identified.
  - 1) Both aggressors switch from 1 to 0.
  - 2) One switches from 1 to 0, the other is quiet
  - 3) Both are quiet
  - 4) One switches from 1 to 0, the other switches from 0 to 1
  - 5) One switches from 0 to 1, the other is quiet
  - 6) Both switch from 0 to 1

Patterns 1) and 2) have a slow-down effect; 3) and 4) have no effect; 5) and 6) have a speed-up effect.
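The count of six patterns and their grouping can be checked by enumeration. The sketch below is illustrative only: it assumes the victim line switches from 0 to 1 (the slide does not state the victim's direction), so that a falling aggressor opposes it and a rising aggressor helps it. The names `TRANSITIONS` and `classify` are hypothetical.

```python
from itertools import combinations_with_replacement

# Each aggressor either falls (1 -> 0), stays quiet, or rises (0 -> 1).
# Assumption (not stated on the slide): the victim rises from 0 to 1, so a
# falling aggressor counts as -1 and a rising aggressor as +1.
TRANSITIONS = {"1->0": -1, "quiet": 0, "0->1": +1}


def classify(pair):
    """Return the qualitative effect of an unordered aggressor pair on the victim."""
    net = sum(TRANSITIONS[t] for t in pair)
    if net < 0:
        return "slow-down"
    if net > 0:
        return "speed-up"
    return "no effect"


# The two aggressors are interchangeable, so unordered pairs are enough.
patterns = list(combinations_with_replacement(TRANSITIONS, 2))
effects = [classify(pair) for pair in patterns]

for pair, effect in zip(patterns, effects):
    print(pair, "->", effect)
```

Three transitions per aggressor give six unordered pairs, two in each group. The enumeration order produced here need not match the slide's numbering.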

- Based on the previously presented delay model

$$t_{distr} = 0.4 R C$$

- The delay equation of the victim line is given below

$$t_{vic} = 0.4RC_s + \lambda_i RC_c$$

 $\checkmark$ The coefficient $\lambda_i$ models the crosstalk effect caused by the different switching patterns of the aggressors.

| i | Switching pattern | $\lambda_i$ | $\mu_i$ |
|---|-------------------|-------------|------|
| 1 | (a)               | 1.51        | 2.20 |
| 2 | (b)               | 1.13        | 1.50 |
| 3 | (c)               | 0.57        | 0.65 |
| 4 | (d)               | 0.57        | 0.65 |
| 5 | (e)               | na          | na   |
| 6 | (f)               | 0           | 0    |

Table 1. Coefficients of the heuristic delay model for a distributed line with different switching patterns.

The coefficient  $\mu_i$  in Table 1 is an empirical constant to model the Miller effect.
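As a worked illustration of the victim-line delay model, the sketch below evaluates $t_{vic} = 0.4RC_s + \lambda_i RC_c$ for each row of Table 1 that has a numeric $\lambda_i$. The values of `R`, `C_s` and `C_c` are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the slides.

```python
# Coefficients lambda_i from Table 1 (row 5 is listed as "na" and is skipped).
LAMBDA = {1: 1.51, 2: 1.13, 3: 0.57, 4: 0.57, 6: 0.0}


def victim_delay(R, C_s, C_c, lam):
    """t_vic = 0.4*R*C_s + lambda_i*R*C_c for a distributed line."""
    return 0.4 * R * C_s + lam * R * C_c


# Hypothetical totals for one line (illustrative values, not from the slides).
R = 1.0e3      # total wire resistance, ohm
C_s = 100e-15  # total self capacitance, F
C_c = 80e-15   # total coupling capacitance, F

delays = {i: victim_delay(R, C_s, C_c, lam) for i, lam in LAMBDA.items()}
for i, t in delays.items():
    print(f"row {i}: t_vic = {t * 1e12:.1f} ps")
```

With $\lambda_i = 0$ the expression falls back to the uncoupled distributed-line delay $0.4RC_s$, and the largest $\lambda_i$ gives the worst-case delay.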

#### **Delay reduction with optimized repeater insertion**

• To reduce delay, the long lines are broken up into shorter sections, with a repeater driving each section as shown in the following figure

- Let the number of repeaters including the original driver be K, and the size of each repeater be H times a minimum sized repeater.
- Then the delay equation for the line segmented into K sections is given by

$$t_{uneq} = \sum_{i=1}^{K} [A_i + B_i] + \frac{t_r}{2}$$

$$A_i = 0.7\left(\frac{R_{drv,m}}{H_i} + R_{via}\right)(C_s l_i + H_i C_{drv,m} + \mu_i \times 2C_c l_i)$$

$$B_i = r l_i (0.4 C_s l_i + \lambda_i C_c l_i + 0.7 H_i C_{drv,m})$$

- $R_{drv,m}$ is the output impedance of a minimum sized repeater
- $C_{drv,m}$ is the output capacitance of a minimum sized repeater
- $t_r$ is the rise time of the signal

• When the repeaters are equalized over the line, the delay expression of the whole line reduces to the equation below

$$t_{eq} = K[A_i' + B_i'] + \frac{t_r}{2}$$

$$A_i' = 0.7 \frac{R_{drv,m}}{H} (C_s / K + HC_{drv,m} + \mu_i \times 2C_c / K)$$

$$B_i' = \frac{R}{K} (0.4C_s / K + \lambda_i C_c / K + 0.7 HC_{drv,m})$$

• To find the optimum H and K for minimum delay, the partial derivatives of the above equation with respect to K and H are set to zero. The resulting equations are listed below

$$K_{i,opt} = \sqrt{\frac{0.4RC_s + \lambda_i RC_c}{0.7R_{drv,m}C_{drv,m}}} \qquad H_{i,opt} = \sqrt{\frac{0.7R_{drv,m}C_s + 1.4\mu_i R_{drv,m}C_c}{0.7RC_{drv,m}}}$$
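The closed-form optimum can be sanity-checked numerically. The following is a minimal sketch, not the authors' code: it evaluates the equalized-repeater delay $t_{eq}$ and the expressions for $K_{i,opt}$ and $H_{i,opt}$, then confirms that small perturbations of K or H increase the delay. The line parameters `R`, `C_s`, `C_c` and `C_drv` are hypothetical; `R_drv` uses the 7 kΩ minimum-inverter output impedance quoted in the simulation assumptions, and `lam`, `mu` are the row 1 values from Table 1.

```python
import math


def t_eq(K, H, R, C_s, C_c, R_drv, C_drv, lam, mu, t_r=0.0):
    """Delay of a line split into K equal sections driven by repeaters of size H."""
    A = 0.7 * (R_drv / H) * (C_s / K + H * C_drv + mu * 2 * C_c / K)
    B = (R / K) * (0.4 * C_s / K + lam * C_c / K + 0.7 * H * C_drv)
    return K * (A + B) + t_r / 2


def optimum(R, C_s, C_c, R_drv, C_drv, lam, mu):
    """Closed-form K_opt and H_opt from setting the partial derivatives to zero."""
    K_opt = math.sqrt((0.4 * R * C_s + lam * R * C_c) / (0.7 * R_drv * C_drv))
    H_opt = math.sqrt(
        (0.7 * R_drv * C_s + 1.4 * mu * R_drv * C_c) / (0.7 * R * C_drv)
    )
    return K_opt, H_opt


# Hypothetical line and driver parameters (illustrative only).
params = dict(R=2.0e3, C_s=1.0e-12, C_c=1.0e-12, R_drv=7.0e3, C_drv=1.0e-15,
              lam=1.51, mu=2.20)

K_opt, H_opt = optimum(**params)
t_min = t_eq(K_opt, H_opt, **params)
print(f"K_opt = {K_opt:.1f}, H_opt = {H_opt:.1f}, t_eq = {t_min * 1e12:.1f} ps")

# Moving away from the optimum in either variable should increase the delay.
for scale in (0.9, 1.1):
    assert t_eq(K_opt * scale, H_opt, **params) > t_min
    assert t_eq(K_opt, H_opt * scale, **params) > t_min
```

K is treated as a continuous variable here. In practice the number of repeaters is an integer, so one would compare the neighbouring integer values of K.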

### Optimise global interconnections

◆ Optimal Signaling Over Parallel Wires



- The worst-case delay always corresponds to switching pattern 1), in which both aggressors switch from 1 to 0.
- Any calculation of bandwidth has to consider the worst-case delay as the minimum delay over a wire.
- By allowing a sufficient margin of safety, the wire delay is

Where, twc can be calculated by equation

• The bandwidth BW in terms of bits per second is given as below

where N is the number of signal wires that can be fitted into a given area and T is the line delay derived on the previous slide.

- The number of signal wires N that can be fitted into a given area depends on whether shielding is carried out or not.
  - If shielding is not carried out, the width of N parallel wires WT is
  - If shielding is carried out, the width of N parallel wires WT is

S is the space between wires

W is the width of a conductor (wire)

Shielding wires are added between two signal wires.

- Now the problem definition can be stated as follows: for a constant width WT, what are the N (number of wires), S (spacing between wires) and W (width of a wire) values that give the optimum bandwidth?
- The optimal arrangement depends very much on the resources allocated for repeaters, and is investigated by simulations first.
- Then approximate analytic equations are developed that give close to optimal solutions.
- Simulations are used to find the optimal N and S.

- ◆ The simulations make the following assumptions:
  - » The minimum feature size is 50 nm
  - » The technology dependent constant $\beta$ for copper wire is 1.65
  - » The wire height above the substrate, h, is $0.2\,\mu$m
  - » The wire thickness t is $0.21\,\mu$m
  - » The minimum wire width and spacing are each assumed to be $0.1\,\mu$m
  - » The output impedance of a minimum sized inverter is assumed to be $7\,\text{k}\Omega$
  - » In all cases the total width of all wires, WT, is constrained to $15\,\mu$m
- Of the three variables N, S and W, only two are independent. We choose to vary N and S, and assume that W and S vary in multiples of the minimum pitch.

- Optimal Signaling Over Parallel Wires
  - Simulation 1: Ideally Driven Line
    - » Although ideal sources are never present in practice, it serves as a point of comparison for later simulation results.

» The following figure shows how the bandwidth varies with N and S



There is a clear optimum at N = 16 and S = 0.4 µm.

- Simulation 2: Unshielded Lines with Optimal Buffering
  - The repeaters used in this simulation are optimally sized in terms of wire delay.
  - The optimized values of H and K are 52 and 7 respectively

Figure: Unshielded Lines with Optimal Buffering. Bandwidth (bits/sec) versus spacing S (m) and number of conductors N.

The maximum bandwidth is 345.5 Gbits/s, which is obtained when W = S = 0.1 µm.

This means that the optimal configuration equates to the maximum number of wires.

#### Simulation 3: Unshielded Lines with Constant Buffering

- Optimal repeater insertion results in a large number of huge buffers
- The size can be reduced with little increase in delay
- Instead of optimal repeater insertion, a constraint is imposed on the number and size of repeaters for each line



The buffering is held constant at K = 1 and H = 20.

The maximum bandwidth is 171.1Gbits/s when N=42, S=0.2µm.
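The reported figure can be turned into an implied per-wire delay. This is only a sketch under a stated assumption: it takes the bandwidth relation to be BW = N / T, that is, each of the N wires carries one bit per line delay T. That form is an assumption made here for illustration, and the function names are hypothetical.

```python
def bandwidth(n_wires, line_delay):
    """Aggregate bandwidth in bit/s, ASSUMING BW = N / T (one bit per wire per T)."""
    return n_wires / line_delay


def implied_delay(n_wires, bw):
    """Invert the assumed relation to recover T from a reported bandwidth."""
    return n_wires / bw


# Reported for Simulation 3: N = 42 wires, BW = 171.1 Gbit/s.
T = implied_delay(42, 171.1e9)
print(f"implied line delay T = {T * 1e12:.1f} ps")
print(f"per-wire rate = {bandwidth(1, T) / 1e9:.2f} Gbit/s")
```

Under that assumption each wire would carry roughly 4 Gbit/s, corresponding to a line delay of about 245 ps.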


- Simulation 4: Unshielded Lines with Constrained Buffering
  - Typically, the constraint would be on the total area occupied by the buffers.
  - Hence, K and H would be affected by N.
  - The constraint on the total buffer area can be expressed by
  - When Amax = 500, K = 1 is the optimal configuration.

The figure on the left is a plot of the bandwidth where K=1 and H changes according to N.

The optimal configuration turns out to be W = 0.26 µm, S = 0.4 µm and N = 23.




#### Simulation 5: Shielded Lines with Optimal Buffering

- Generally, shielding each signal wire results in a drop in the bandwidth.
- This is because the number of signal lines is greatly reduced, although the shielding reduces the delay at the same time.

The figure on the left is a plot of the bandwidth where every other wire is a minimum sized shielding wire.

The maximum bandwidth is 261.3Gbits/s which is less than the unshielded case.

Shielding can be considered as an option to reduce area and power consumption for repeaters.

