# Inter/Intra-Chip Optical Networks: Opportunities and Challenges



THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY


#### **Subsystems of MPSoC**

- Computation: process data, implement major functions
- Control: coordinate all the subsystems
- Memory: temporarily store data, instructions, and system status
- Communication: transfer data, instructions, and other information inside, into, and out of an MPSoC
- Support: maintain appropriate operating conditions, such as power supply, clock, temperature, etc.
- Can physically overlap
  - E.g. computation and control
- There are grey areas
  - E.g. communication interfaces



## Communication Subsystem: Road System of MPSoC

- Performance
  - Cooperation among functional units and chips
  - Communication latency
- Power consumption
  - Significant dynamic power used by communications
  - Considerable leakage of interconnect drivers and buffers
- Cost and yield
  - Metal layers for interconnects
  - Device layer for drivers and buffers
  - Chip I/O





### Performance and Power Wall of Electrical Interconnects

- More communications
  - Hundreds of cores on a single chip
  - Cisco QuantumFlow (40), Intel Phi (61), Tilera Tile (72), Cisco SPP (188), PicoChip (300) ...
- Higher power consumption
  - Dynamic and leakage power of drivers and buffers
  - Kilowatts of power by 2020\*
- Larger latency
  - Multiple clock cycles are required to cross a chip
- Tighter chip I/O bandwidth
  - High pin count, packaging cost, and expensive PCB design

| Year                 | 2015 | 2020 | 2025 |
|----------------------|------|------|------|
| MP process (nm)      | 25   | 13   | 7    |
| Clock (GHz)          | 4.4  | 5.3  | 6.5  |
| Transistor (billion) | 3.1  | 12   | 49   |



\*R.G. Beausoleil, et al., "Nanoelectronic and Nanophotonic Interconnect," Proceedings of the IEEE, Feb. 2008.

\*\* Based on ITRS2012 update

### **Optical Interconnects**

- Photonic technologies have been successfully used in WAN, LAN, and board level
  - Showed strengths in multicomputer systems and Internet core routers
- Based on silicon waveguides and microresonators (MR)
  - Silicon based and CMOS compatible
  - Demonstrated by IBM, Intel, NEC, HP, Fujitsu, Oracle, NTT, STMicro ...
  - Commercialized by startups, Luxtera, Lightwire/Cisco, Kotura/Mellanox ...
  - MR is as small as 3µm in diameter
  - 30ps switching time has been demonstrated



#### Silicon-based Waveguide and MR

R. G. Beausoleil *et al.*, "A Nanophotonic Interconnect for High-Performance Many-Core Computation", IEEE LEOS June 2008



#### Integrated EO and OE Interfaces

G. Masini, *et al.*, "A 1550nm 10Gbps monolithic optical receiver in 130nm CMOS with integrated Ge waveguide photodetector", IEEE International Conference on Group IV Photonics, 2007



#### On-Chip Optical Routers

R. Ji, J. Xu, L. Yang, "Five-Port Optical Router Based on Microring Switches for Photonic Networks-on-Chip", IEEE Photonics Technology Letters, March 2013

### Fundamentally Different "Building Material"

#### Advantages

- Ultra-high bandwidth
- Low propagation delay
- Low propagation loss
- Low sensitivity to environmental EMI

#### Disadvantages

- Difficult to "buffer" optical signals
- Electrical/optical conversion overheads
- Thermal sensitivity
- Crosstalk noise
- Specialties
- Such differences bring new opportunities and challenges



Solkan Bridge, Slovenia (stone, 1906)



Cold Spring Bridge, USA (steel, 1963)



Tsing Ma Bridge, Hong Kong (steel, 1997)

#### Outline

- Introduction
- Unified Inter/Intra-Chip Optical Network
- Thermal Modeling
- Optical Crosstalk Noise Analysis
- Communication Traffic Patterns
- Summary

### Intra-Chip Communication Architecture (CMA)

- Ad hoc interconnects
  - Dedicated point-to-point interconnections
  - Intuitive, but not cost-effective for complex systems

- Bus
  - Shared media communication architectures
  - Mature, but limited throughput and high power consumption
- Network-on-Chip (NoC)
  - Based on switching and routing techniques
  - High-throughput, scalable, and energy-efficient
  - But complex to design
- Many intra-chip CMAs originate from inter-chip or multicomputer CMAs
  - Such as bus and NoC



# **Codesign Inter- and Intra-Chip Communication Architectures**

- Inter- and intra-chip CMAs based on electrical interconnects are often separately designed
  - Different on-chip and on-board constraints
  - Limited chip I/Os create a sharp chip boundary
  - Maximize design flexibility and allow third-party system integration
- Jointly designing inter- and intra-chip CMAs could take full advantage of optical interconnects
  - Ultra-high bandwidth
  - Reduce buffering and large E/O conversion overheads

# UNION: Unified Inter/Intra-Chip Optical Network



- Hierarchical optical network for multiple chips
- Optical network-on-chip (ONoC) is the intra-chip network
- Inter-chip optical network (ICON) collaborates with ONoC to handle inter-chip traffic
- Payload and control packets share the same optical network

\* X. Wu, Y. Ye, W. Zhang, W. Liu, M. Nikdast, X. Wang, J. Xu, "UNION: A Unified Inter/Intra-Chip Optical Network for Chip Multiprocessors", NanoArch, June 2010

### Fat Tree based ONoC

- A $k$-core chip uses an $l$-level tree
  - Electronic concentrator connects four cores
  - Routers are grouped into router clusters
  - $l = \log_2(k / 4)$  network levels
  - $(k / 8) \log_2(k / 4)$  optical routers
  - k/4 crossbar-based concentrators
- Use turnaround routing algorithm
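The component counts above follow directly from the core count. As a minimal sketch (the function name and the returned dictionary keys are illustrative, not from the original), they can be evaluated for a given $k$ as follows, assuming $k/4$ is a power of two:

```python
def fat_tree_resources(k: int) -> dict:
    """Evaluate the fat-tree ONoC formulas for a k-core chip.

    Assumes k/4 is a power of two so that l = log2(k/4) is an integer.
    """
    concentrators = k // 4                      # k/4 crossbar-based concentrators
    if k < 8 or k % 4 != 0 or concentrators & (concentrators - 1):
        raise ValueError("k/4 must be a power of two and k must be at least 8")
    levels = concentrators.bit_length() - 1     # l = log2(k/4), exact for powers of two
    optical_routers = (k // 8) * levels         # (k/8) * log2(k/4)
    return {
        "levels": levels,
        "optical_routers": optical_routers,
        "concentrators": concentrators,
    }


if __name__ == "__main__":
    print(fat_tree_resources(64))
```

For the 64-core configuration used later in the simulations, this gives 4 network levels, 32 optical routers, and 16 concentrators.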



- Centralized control decision but distributed execution
- Dynamic optical power control
  - Adjust EO interface power based on different optical paths

## Basic Optical Switching Element (BOSE)



- Two types of 1x2 BOSEs
  - Crossing element and parallel element
  - Both composed of an MR and two optical waveguides
- Crossing element has a waveguide crossing, and hence additional crossing insertion loss
- Optical components are still in the range of microns
  - Minimizing the number of MRs is necessary

# **Optical Turnaround Router** (OTAR)

Figure: OTAR structure (ports: UP-left, UP-right, DOWN-left, DOWN-right)

- 4x4 switching function
- Strictly non-blocking for turnaround routing algorithm
- Use  $\lambda_0$  for payload packets
  - $\lambda_0$  is the on-state resonance wavelength
- Better than optical crossbars
  - Six MRs, four waveguides, no terminators
  - Minimized waveguide crossing insertion loss
  - Ports are aligned to their intended directions
- Passively route packets
  - Between UP-left and DOWN-left as well as UP-right and DOWN-right
  - Save power and reduce MR insertion loss in 40% of cases




#### **Router Clusters**

- Each cluster is controlled by a single control unit
- Control signals are from the network controller
- Use  $\lambda_1$  for control packets
  - The network is designed to passively transfer control packets
- One extra O-to-E interface and MR with off-state resonance wavelength  $\lambda_1$



#### **Network Controller**



- Connected to the top level router cluster
- Deterministic turnaround routing algorithm
  - Complexity is  $O(n \log n)$
- Cost is 0.11mm<sup>2</sup> and 31µW/MHz for 64-core 1.25GHz multicore processor in 45nm
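Turnaround routing sends a packet up the tree until it reaches a router that is a common ancestor of source and destination, then turns it around and sends it down. The sketch below illustrates only that idea. It assumes a binary tree over the concentrators and an index numbering in which concentrators under the same level-$h$ ancestor share their upper index bits; the function names and this numbering are hypothetical, not taken from the UNION design.

```python
def turnaround_level(src_core: int, dst_core: int, cores_per_concentrator: int = 4) -> int:
    """Return the tree level at which a packet turns around.

    Level 0 means both cores share one electronic concentrator, so no
    optical router is needed. The binary numbering is an assumption.
    """
    src_leaf = src_core // cores_per_concentrator
    dst_leaf = dst_core // cores_per_concentrator
    level = 0
    # Climb one level at a time until both leaves fall under the same subtree.
    while src_leaf != dst_leaf:
        src_leaf >>= 1
        dst_leaf >>= 1
        level += 1
    return level


def routers_on_path(src_core: int, dst_core: int) -> int:
    """Count routers visited: (level - 1) going up, 1 at the turn, (level - 1) going down."""
    level = turnaround_level(src_core, dst_core)
    return 0 if level == 0 else 2 * level - 1


if __name__ == "__main__":
    for pair in [(0, 3), (0, 5), (0, 63)]:
        print(pair, turnaround_level(*pair), routers_on_path(*pair))
```

Because the path is fully determined by the source and destination indices, the route is deterministic and minimal under these assumptions.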

### Floorplan of Fat Tree Based ONoC

- Cluster the routers
- The optimized floorplan minimizes the number of waveguide crossings
  - Waveguide crossing is the major cause of optical power loss
  - 87% reduction compared to the H-tree floorplan



\* Z. Wang, J. Xu, et al., "Floorplan Optimization of Fat-Tree Based Networks-on-Chip for Chip Multiprocessors", IEEE Transactions on Computers 2012

### **Inter-Chip Optical Network**

- Multichannel data bus
  - One channel per top-level router
  - Each half-duplex bidirectional channel uses one waveguide
  - Each channel can be dynamically divided to carry multiple point-to-point communications simultaneously
  - Interface switches are controlled by network controllers
- Control bus uses a single waveguide





#### Protocols

- Connection-oriented circuit switching for payload packets
- Deterministic turnaround routing algorithm for ONoC
  - Minimal path routing
  - Deadlock-free and livelock-free
- For inter-chip traffic, optical paths in source multicore, destination multicore and ICON are reserved simultaneously
- Three control packets are used to maintain optical paths
  - REQUEST, GRANT, TEARDOWN
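The reservation behavior described above can be sketched as a small bookkeeping table. This is an illustrative model only: the class, method names, and link labels are hypothetical, and the real controller logic is not given in the slides. A REQUEST succeeds (a GRANT would be sent) only if every link on the path is free, which mirrors reserving the source chip, ICON, and destination chip segments simultaneously. TEARDOWN releases them.

```python
from typing import Dict, Hashable, Iterable, List


class PathReservationTable:
    """Toy model of connection-oriented optical path reservation."""

    def __init__(self) -> None:
        self._owner: Dict[Hashable, int] = {}        # link -> connection id holding it
        self._paths: Dict[int, List[Hashable]] = {}  # connection id -> reserved links

    def request(self, conn_id: int, links: Iterable[Hashable]) -> bool:
        """Handle a REQUEST. Return True if the whole path was reserved (GRANT)."""
        path = list(links)
        if conn_id in self._paths:
            return False
        # All-or-nothing: refuse if any link on the path is already in use.
        if any(link in self._owner for link in path):
            return False
        for link in path:
            self._owner[link] = conn_id
        self._paths[conn_id] = path
        return True

    def teardown(self, conn_id: int) -> None:
        """Handle a TEARDOWN. Release every link held by the connection."""
        for link in self._paths.pop(conn_id, []):
            self._owner.pop(link, None)


if __name__ == "__main__":
    table = PathReservationTable()
    print(table.request(1, ["chipA:up0", "icon:ch0", "chipB:down0"]))
    print(table.request(2, ["chipC:up1", "icon:ch0"]))
    table.teardown(1)
    print(table.request(2, ["chipC:up1", "icon:ch0"]))
```

The all-or-nothing check is a design choice of this sketch: a partially reserved path would hold links without carrying any payload.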

### **Simulation and Comparison**

- Systems with one to eight 64-core multicore processors
  - 40Gb/s bandwidth for  $\lambda_0$  and  $\lambda_1$
  - Electronic units work at 1.25GHz in 45nm
- Electrical counterpart
  - Fat tree based NoC and inter-chip bus
  - 40Gb/s per interconnection with extra control lines
- Under eight real applications
  - 200~3000 tasks per application
  - Mapped and scheduled offline for maximum performance
- SystemC-based cycle-accurate architecture-level simulation environment

#### **Network Performance**



Lower packet delay and better scalability

#### **Energy Efficiency**



Higher energy efficiency and better scalability

#### Performance



 On average more than 3X improvement compared to the electrical counterpart


### **Thermal Modeling**

- Chip temperatures vary a lot over both time and space
- Thermal effects can cause
  - Laser power efficiency degradation
  - Temperature-dependent wavelength shifting
  - Optical power loss caused by wavelength mismatch
- System-level thermal model needs to consider
  - VCSEL temperature-dependent wavelength shifting and power efficiency
  - Microresonator temperature-dependent wavelength shifting and optical power loss
  - Waveguide propagation loss variation
  - Photodetector sensitivity and dark current
  - Chip temperature distribution

<sup>\*</sup> Y. Ye, J. Xu, X. Wu, W. Zhang, X. Wang, M. Nikdast, Z. Wang, W. Liu, "Modeling and Analysis of Thermal Effects in Optical Networks-on-Chip", ISVLSI 2011

### Thermal Model of Optical Interconnects

- Necessary condition for an optical link to function properly
  - Optical power reaching the receiver must be larger than the receiver sensitivity

Figure: optical link model (optical transmitter with VCSEL, optical path through MR-based switching stages 1 and 2, optical receiver)
$$P_{TX} - L_{SW} - L_{WG} \ge S_{RX}$$

$$P_{TX} = (I - \alpha - \beta (T_{VCSEL} - T_{th})^2)(\varepsilon - \gamma \cdot T_{VCSEL})$$

$$L_{SW} = \sum_{i=1}^{N} 10 \log\left(\left(\frac{2\kappa^2 + \kappa_p^2}{2\kappa^2}\right)^2 \cdot \left(1 + \frac{(\lambda_{VCSEL\_min} + \rho_{VCSEL}(T_{VCSEL} - T_{min}) - \rho_{MR}(T_{MR_i} - T_{min}) - \lambda_{MR\_min})^2}{\delta^2}\right)\right)$$

$$10 \log\left((I - \alpha - \beta (T_{VCSEL} - T_{th})^2)(\varepsilon - \gamma \cdot T_{VCSEL})\right) - \sum_{i=1}^{N} 10 \log\left(\left(\frac{2\kappa^2 + \kappa_p^2}{2\kappa^2}\right)^2 \cdot \left(1 + \frac{(\lambda_{VCSEL\_min} + \rho_{VCSEL}(T_{VCSEL} - T_{min}) - \rho_{MR}(T_{MR_i} - T_{min}) - \lambda_{MR\_min})^2}{\delta^2}\right)\right) - L_{WG} \ge S_{RX}$$
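To make the link condition concrete, the sketch below evaluates it numerically. It is a direct transcription of the formulas above under stated assumptions: the field names, the units, and every parameter value in the example are hypothetical placeholders rather than measured device data, and the logarithm is taken as base 10.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinkParams:
    """Symbols from the thermal model. Units and values are illustrative assumptions."""
    i_bias: float         # I
    alpha: float          # alpha
    beta: float           # beta
    t_th: float           # T_th
    eps: float            # epsilon
    gamma: float          # gamma
    kappa: float          # kappa
    kappa_p: float        # kappa_p
    delta: float          # delta
    lam_vcsel_min: float  # lambda_VCSEL_min
    lam_mr_min: float     # lambda_MR_min
    rho_vcsel: float      # rho_VCSEL
    rho_mr: float         # rho_MR
    t_min: float          # T_min
    l_wg_db: float        # L_WG
    s_rx_dbm: float       # S_RX


def tx_power(p: LinkParams, t_vcsel: float) -> float:
    """P_TX in linear units."""
    return (p.i_bias - p.alpha - p.beta * (t_vcsel - p.t_th) ** 2) * (p.eps - p.gamma * t_vcsel)


def switching_loss_db(p: LinkParams, t_vcsel: float, t_mrs: Sequence[float]) -> float:
    """L_SW: sum of per-stage losses, one microresonator temperature per stage."""
    coupling = ((2 * p.kappa ** 2 + p.kappa_p ** 2) / (2 * p.kappa ** 2)) ** 2
    lam_vcsel = p.lam_vcsel_min + p.rho_vcsel * (t_vcsel - p.t_min)
    total = 0.0
    for t_mr in t_mrs:
        lam_mr = p.lam_mr_min + p.rho_mr * (t_mr - p.t_min)
        mismatch = lam_vcsel - lam_mr
        total += 10 * math.log10(coupling * (1 + mismatch ** 2 / p.delta ** 2))
    return total


def link_margin_db(p: LinkParams, t_vcsel: float, t_mrs: Sequence[float]) -> float:
    """10*log10(P_TX) - L_SW - L_WG - S_RX. The link works when this is >= 0."""
    p_tx = tx_power(p, t_vcsel)
    if p_tx <= 0:
        return -math.inf  # the VCSEL emits no usable power at this operating point
    return 10 * math.log10(p_tx) - switching_loss_db(p, t_vcsel, t_mrs) - p.l_wg_db - p.s_rx_dbm


def link_ok(p: LinkParams, t_vcsel: float, t_mrs: Sequence[float]) -> bool:
    return link_margin_db(p, t_vcsel, t_mrs) >= 0


if __name__ == "__main__":
    params = LinkParams(i_bias=10.0, alpha=1.0, beta=0.0005, t_th=40.0, eps=0.4,
                        gamma=0.002, kappa=0.5, kappa_p=0.0, delta=1.0,
                        lam_vcsel_min=1550.0, lam_mr_min=1550.0, rho_vcsel=0.1,
                        rho_mr=0.1, t_min=25.0, l_wg_db=1.0, s_rx_dbm=-15.0)
    print(link_margin_db(params, 35.0, [35.0, 25.0]))
```

In this sketch, the margin shrinks as the temperature difference between the VCSEL and a microresonator grows, because the wavelength mismatch term enters the switching loss quadratically.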


#### **Optimal Initial Device Settings**

#### Total power consumption

 $P_{total} = P_{VCSEL} + P_{driver} + P_{MRs} + P_{SerDes} + P_{receiver}$ 

- Optimal initial device settings exist
- They can be decided by the following formula

#### **Worst-Case Power Consumption**

- Thermal tuning can compensate for wavelength deviation from resonance
- Default setting:  $\lambda_{MR_0} = \lambda_{VCSEL_0} = 1550$ nm at room temperature
  - 5 pJ/bit when temperature reaches 85°C
- Optimal settings can improve the power efficiency by 29% to 3.7 pJ/bit



Worst-case power consumption, 3-dB bandwidth is 3.1nm, three switching stages

### **Optical Crosstalk Noise**

- Crosstalk noise is an intrinsic characteristic of optical components
  - Very small at device level: 0.01% to 0.1% of the signal power
  - Was ignored at router and network levels
- We modeled it in the power domain
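A toy calculation shows why a per-device crosstalk of 0.01% to 0.1% can matter at network level. The model below is a deliberate simplification and not the formal analysis from the cited work: it assumes each hop leaks a fixed fraction `k` of an interfering signal of the same input power into the path, that every hop has the same linear transmission `t`, and that noise powers simply add. All names are hypothetical.

```python
import math


def toy_worst_case_snr_db(hops: int, k: float, t: float = 1.0, p_in: float = 1.0) -> float:
    """SNR in dB after `hops` hops under the toy assumptions above.

    k: fraction of interfering power coupled into the path per hop
    t: linear power transmission of one hop (1.0 means lossless)
    """
    if hops < 1 or not 0 < k < 1 or not 0 < t <= 1:
        raise ValueError("need hops >= 1, 0 < k < 1, 0 < t <= 1")
    signal = p_in * t ** hops
    # Noise injected at hop i is attenuated by the remaining (hops - i) hops.
    noise = sum(k * p_in * t ** (hops - i) for i in range(1, hops + 1))
    return 10 * math.log10(signal / noise)


if __name__ == "__main__":
    for n in (1, 10, 30):
        print(n, round(toy_worst_case_snr_db(n, k=1e-3, t=0.95), 2))
```

Even in this simplified picture the SNR falls steadily with path length, which is why crosstalk that is negligible at device level cannot be ignored at router and network levels.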



\* Y. Xie, M. Nikdast, J. Xu, et al., "Crosstalk Noise and Bit Error Rate Analysis for Optical Network-on-Chip", DAC 2010

#### **Worst-Case Signal-to-Noise Ratio**

#### For arbitrary MxN mesh network using arbitrary optical routers

The closed-form worst-case SNR, $SNR_{a,\min,M,N}$, is the ratio of the received signal power to the accumulated crosstalk noise power. The signal term is a product of per-router losses $L_{a,\cdot,\cdot}$ along the worst-case path, with exponents that depend on the mesh dimensions $M$ and $N$. The noise term sums the per-router contributions $N_{a,X,j}$ and $N_{a,Y,j}$, which are defined piecewise in $j$ and depend on the input power $P_{in}$, the loss terms, and the crosstalk coefficients $k_1$ and $k_2$. The full expressions are given in Y. Xie, M. Nikdast, J. Xu, et al., "A Formal Worst-Case Analysis of Crosstalk Noise in Mesh-Based Optical Networks-on-Chip", IEEE Transactions on Very Large Scale Integration Systems, 2012.

#### **Worst-Case SNR**

- Crosstalk significantly lowers SNR of mesh network
- Optical router design strongly affects SNR





### **Communication Traffic Patterns**

- Communication traffic patterns are essential for interconnection networks
  - Architecture exploration
  - Performance evaluation
  - Comparisons
- Random/synthetic traffic patterns
  - Pro: simple to implement, pinpoint specific aspects of networks
  - Con: cannot show overall application performance and power, and application-specific issues

- Realistic traffic patterns are based on real applications
  - Pro: show overall application performance and power, and application-specific issues
  - Con: difficult to obtain

<sup>\*</sup> W. Liu, J. Xu, X. Wu, Y. Ye, X. Wang, W. Zhang, M. Nikdast, Z. Wang, "A NoC Traffic Suite Based on Real Applications", ISVLSI 2011

#### **MCSL Benchmark Suite**

#### Based on real applications

Consider computation, communication, and memory

| Application      | Description                                                                               | Tasks  | Communication Links |
|------------------|-------------------------------------------------------------------------------------------|--------|---------------------|
| FFT-1024_complex | Fast Fourier transform with 1024 inputs of complex numbers                                | 16,384 | 25,600              |
| H264-1080p_dec   | H.264 video decoder with a resolution of 1080p                                            | 5,191  | 7,781               |
| H264-720p_dec    | H.264 video decoder with a resolution of 720p                                             | 2,311  | 3,461               |
| FPPPP            | SPEC95 Fpppp is a chemical program performing multi-electron integral derivatives         | 334    | 1,145               |
| RS-32_28_8_dec   | Reed-Solomon code decoder with codeword format RS(32,28,8)                                | 278    | 390                 |
| RS-32_28_8_enc   | Reed-Solomon code encoder with codeword format RS(32,28,8)                                | 248    | 328                 |
| SPARSE           | Random sparse matrix solver for electronic circuit simulations                            | 96     | 67                  |
| ROBOT            | Newton-Euler dynamic control calculation for the 6-degrees-of-freedom Stanford manipulator | 88     | 131                 |

#### More applications will be released soon

Collaborating with application experts

#### **Generation Methodology**

- Based on HW/SW codesign flow
  - MCSL stands for Multi-Constraint System-Level
- Optimized to maximize overall application performance
  - Memory space allocation
  - Application mapping and scheduling
- Support different topologies and network sizes
  - Capture communication behaviors as well as their temporal and spatial dependencies
  - Traffic patterns can be applied to NoC with the same topology and size but different settings
  - Currently cover mesh, torus, and fat tree
- Two types of traffic patterns
  - Recorded traffic pattern (RTP) for detailed analysis
  - Statistical traffic pattern (STP) for average, worst, and best case studies
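The relationship between the two pattern types can be illustrated by reducing a recorded trace to per-link statistics. This is a sketch only: the `Packet` record layout and the chosen statistics are assumptions made for illustration, and the actual MCSL file formats are not described in the slides.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """One recorded transfer. The field layout is hypothetical."""
    time: int   # injection time in cycles
    src: int    # source node
    dst: int    # destination node
    size: int   # payload size in bytes


def summarize_trace(trace: Iterable[Packet]) -> Dict[Tuple[int, int], dict]:
    """Collapse a recorded trace into per (src, dst) statistics."""
    sizes: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for pkt in trace:
        sizes[(pkt.src, pkt.dst)].append(pkt.size)
    return {
        pair: {
            "packets": len(values),
            "mean_size": mean(values),   # average case
            "min_size": min(values),     # best case
            "max_size": max(values),     # worst case
        }
        for pair, values in sizes.items()
    }


if __name__ == "__main__":
    recorded = [Packet(0, 0, 1, 64), Packet(5, 0, 1, 128), Packet(7, 2, 3, 32)]
    for pair, stats in summarize_trace(recorded).items():
        print(pair, stats)
```

A summary like this discards the temporal dependencies that a recorded pattern preserves, which is the trade-off between detailed analysis and average, worst, and best case studies.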



#### **Communications Among Tasks**



#### Communications Among PBs on 8x8 Mesh-based NoC



7/23/2013

Jiang Xu (HKUST)

#### **Comparison with Random Traffic**



- Random traffic patterns report significantly different network performance results
  - E.g. uniform traffic patterns report 34X smaller packet delay on average



#### Summary

- Optical interconnect is a promising technology
- Many architectures and techniques have been explored, but much more could be done
- There are new challenges

#### Acknowledgement

- Current group members
  - Yaoyao Ye, Xiaowen Wu, Zhehui Wang, Mahdi Nikdast, Duong Huu Kinh Luan, Xuan Wang, Zhe Wang
- Past members
  - Weichen Liu, Kwai Hung Mo, Yu Wang, Yiyuan Xie, Huaxi Gu, Xing Wen, Qi Li



University Grants Committee (大學教育資助委員會)

Intel





- R. Ji, J. Xu, L. Yang, "Five-Port Optical Router Based on Microring Switches for Photonic Networks-on-Chip", IEEE Photonics Technology Letters, March, 2013.
- Y. Ye, J. Xu, *et al.*, "System-Level Modeling and Analysis of Thermal Effects in Optical Networks-on-Chip", IEEE Transactions on Very Large Scale Integration Systems, February 2013.
- Y. Ye, J. Xu, *et al.*, "3D Mesh-based Optical Network-on-Chip for Multiprocessor System-on-Chip", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, April 2013.
- X. Wu, Y. Ye, J. Xu, *et al.*, "UNION: A Unified Inter/Intra-Chip Optical Network for Chip Multiprocessors", Accepted by IEEE Transactions on Very Large Scale Integration Systems.
- Z. Wang, J. Xu, et al., "Floorplan Optimization of Fat-Tree Based Networks-on-Chip for Chip Multiprocessors", Accepted by IEEE Transactions on Computers.
- Y. Ye, J. Xu, *et al.*, "System-level Analysis of Mesh-based Hybrid Optical-Electronic Network-on-Chip," IEEE International Symposium on Circuits and Systems, May 2013.
- Y. Ye, J. Xu, *et al.*, "A Torus-based Hierarchical Optical-Electronic Network-on-Chip for Multiprocessor System-on-Chip", ACM Journal on Emerging Technologies in Computing Systems, February 2012.
- Y. Xie, J. Xu, et al., "Crosstalk Noise Analysis and Optimization in 5×5 Hitless Silicon Based Optical Router for Optical Networks-on-Chip (ONoC)," IEEE/OSA Journal of Lightwave Technology, January, 2012.
- Y. Xie, M. Nikdast, J. Xu, et al., "A Formal Worst-Case Analysis of Crosstalk Noise in Mesh-Based Optical Networks-on-Chip", IEEE Transactions on Very Large Scale Integration Systems, 2012.
- Z. Wang, J. Xu, et al., "A Novel Low-Waveguide-Crossing Floorplan for Fat Tree Based Optical Networks-on-Chip", IEEE Optical Interconnects, 2012.
- Y. Ye, J. Xu, et al., "Thermal Analysis for 3D Optical Network-on-Chip Based on a Novel Low-Cost 6x6 Optical Router", IEEE Optical Interconnects Conference, 2012.
- K. Feng, Y. Ye, J. Xu, "A Formal Study on Topology and Floorplan Characteristics of Mesh and Torus-based Optical Networks-on-Chip", Microprocessors and Microsystems, June 2012.
- Y. Ye, X. Wu, J. Xu, *et al.*, "Holistic Comparison of Optical Routers for Chip Multiprocessors", IEEE International Conference on Anti-Counterfeiting, Security and Identification, 2012.
- Y. Ye, J. Xu, *et al.*, "Modeling and Analysis of Thermal Effects in Optical Networks-on-Chip," IEEE Computer Society Annual Symposium on VLSI, 2011.
- Y. Xie, J. Xu, et al., "Elimination of cross-talk in silicon-on-insulator waveguide crossings with optimized angle", Optical Engineering, June 2011.
- W. Liu, J. Xu, et al., "A NoC Traffic Suite Based on Real Applications," ISVLSI, 2011.
- Y. Xie, M. Nikdast, J. Xu, et al., "Crosstalk Noise and Bit Error Rate Analysis for Optical Network-on-Chip", DAC, 2010.
- X. Wu, Y. Ye, W. Zhang, W. Liu, M. Nikdast, X. Wang, J. Xu, "UNION: A Unified Inter/Intra-Chip Optical Network for Chip Multiprocessors", NanoArch, 2010.
- K.H. Mo, Y. Ye, X. Wu, W. Zhang, Weichen Liu, J. Xu, "A Hierarchical Hybrid Optical-Electronic Network-on-Chip", ISVLSI, 2010.
- H. Gu, S. Wang, Y. Yang, J. Xu, "Design of Butterfly-Fat-Tree Optical Network-on-Chip", Optical Engineering, vol 49, issue 9, 2010.
- H. Gu, J. Xu, W. Zhang, "A Low-power Fat Tree-based Optical Network-on-Chip for Multiprocessor System-on-Chip", DATE, 2009.
- Y. Ye, L. Duan, J. Xu, et al., "3D Optical NoC for MPSoC", IEEE International 3D System Integration Conference, 2009.
- H. Gu, K. H. Mo, J. Xu, et al., "A Low-power Low-cost Optical Router for Optical Networks-on-Chip in Multiprocessor Systems-on-Chip", ISVLSI, 2009.
- H. Gu, J. Xu, et al., "ODOR: a Microresonator-based High-performance Low-cost Router for Optical Networks-on-Chip", CODES, 2008.
- H. Gu, J. Xu, et al., "A Novel Optical Mesh Network-on-Chip for Gigascale Systems-on-Chip", IEEE Asia Pacific Conference on Circuits and Systems, 2008.
- H. Gu, J. Xu, *et al.*, "Design of Sparse Mesh for Optical Network on Chip", IEEE Asia Pacific Optical Communications, 2008.



# Thanks!