Application-Specific Multi-Processor SoC
Summer school sponsored by IEEE Circuits and Systems Society, EDAA and IST
8 - 12 July 2002, Château de Pizay, France

LECTURES

Application-Specific Multi-processor SoC Design Cases

Video as an Application of Multiprocessor SoCs
Wayne Wolf, Princeton University & MediaWorks Technology, USA
In this talk, we will use embedded video as a driving application for multiprocessor SoCs. Video is a demanding application that allows us to explore many areas of MPSOC design that are also of use in other applications. We will start by reviewing a smart camera system under development at Princeton University. We will then analyze the application to make some basic architectural observations. We will consider memory system design, network-on-chip design, and processing elements.

Reconfigurable Computing
Kees A. Vissers, Chameleon Systems, Inc., USA
Modern silicon technology, for instance the TSMC 0.13 µm process, allows the implementation of several hundred computing elements in a commercially realistic number of mm². In this presentation I will cover the new opportunities that reconfigurable computing offers. These include bit-level FPGAs and word-oriented architectures. Reconfigurable computing has the potential to offer scalable, flexible programming platforms that are significantly different from a cluster of conventional RISC or VLIW-like processors.
The programmer's view and several implementation strategies will be addressed, and an overview of some commercial approaches will be given. In conventional terms these systems can support an Instruction Level Parallelism (ILP) of several hundred. Several approaches to address this challenge will be discussed.

Program-In, Chip-Out: The HP Labs Automatic Hardware Synthesis Project
Rob Schreiber, HP Labs, USA
Chip designers are called on to design an ever larger variety of low-cost, low-power, embedded ASICs able to process high-bandwidth multimedia data streams. Many of these ASICs use custom hardware accelerators for the computational "hot spots." Design time, cost, energy consumption, and performance are important in these designs. As Moore's Law and the growth of the industry outstrip our ability to create and debug such designs by hand, automatic ASIC design has become a hot topic.
In order to reduce design time and design cost, the HP Labs Program-In, Chip-Out (PICO) project focuses on automatic design from high-level specifications. Source code (in a subset of C) for a performance-critical loop nest is used as a behavioral specification. The PICO system compiles the source code into a custom hardware design in the form of a parallel, special-purpose processor array. The user, or a design exploration tool, specifies the number of processing elements and their performance. The system produces the array, its local RAM, its control logic, its interface to memory, and its interface to a host processor. PICO also modifies the user's application software to make use of the generated accelerator. In experimental comparisons, PICO designs are slightly more costly than hand-designed accelerators with the same performance.
In this talk, we give an overview of PICO, and describe practical solutions to some subproblems that play a major role -- tiling, scheduling, bitwidth analysis, data path synthesis, local memory management, and estimation of the number of array elements referenced in a loop nest.

Network on Chip

Network on a chip: a new paradigm for System on Chip design
Giovanni De Micheli, Stanford University, USA
We consider systems on chips (SoCs) that will be designed and produced five to ten years from today, with gate lengths in the range 50-100 nm. We address the distinguishing features of a design methodology that aims at achieving reliable designs under the limitations of the interconnect technology. Specifically, we consider energy consumption reduction, under guaranteed quality of service (QoS), as a main objective in system design.
We show that the unreliability of the physical layer is a potential show-stopper for SoC design. We argue that network technology can be used to provide a framework for designing on-chip interconnect. We visit different layers of a micro-network stack abstraction and show new directions toward designing on-chip communication.

Designing Networks On Chip: solutions and challenges
Luca Benini, DEIS, Universita' di Bologna, Italy
Interconnection network design is becoming one of the most critical issues in current SoC development flows. Commonly adopted architectures (such as shared buses) do not scale well as complexity and parallelism increase; circuit designers and chip architects are therefore actively investigating alternative and novel approaches for eliminating communication bottlenecks. The quest for higher communication performance, reliability and energy efficiency is driving a paradigm shift from a computation-centric to a communication-centric view of SoC architectures, centered on the concept of Network-on-Chip. In this talk, we survey various network-on-chip architectures in last-generation multiprocessors on a chip for DSP, multimedia, graphics and various embedded application domains. We critically analyze the state of the art to pinpoint open challenges and opportunities, and to provide general NoC design guidelines.

Component-based Design for Multiprocessor SoC
Ahmed Jerraya, TIMA Laboratory, France
An application-specific multiprocessor system-on-chip requires the integration of heterogeneous components including processors (DSP and microcontrollers), memories, peripherals (DMA, interrupt controllers, …) and a sophisticated communication network. This lecture explores a high-level component-based methodology and design environment for application-specific multiprocessor SoC architectures. The system specification is a virtual architecture annotated with configuration parameters, described in a SystemC-like model. The component-based design environment has automatic wrapper-generation tools able to synthesize hardware interfaces, device drivers, and operating systems that implement a high-level API. Experiments with this approach show a drastic reduction of design time without any significant loss of efficiency in the final design.

SoC Design Methodologies

Platform Based Design of Embedded Systems in SystemC
Wolfgang Rosenstiel, University of Tuebingen & FZI, Germany
Designing the most appropriate system architecture is key to an efficient implementation of complex algorithms such as those found in advanced applications. Over the years, it has always been recognized that raising the level of abstraction at which the creative work is done is the most effective method for improving design efficiency. Today's design methodology for embedded systems is often platform-based. Finding the best algorithm/architecture trade-off is important for this design technique.
SystemC is well suited to model and specify both functions and architectures. This talk will therefore explain in particular how SystemC can be used for platform-based design.

A methodology for Design Space Exploration of on-chip networks
Luciano Lavagno, Politecnico di Torino, Italy
This presentation will introduce a design methodology for architectural exploration of networks on chip based on decoupling functionality from architecture, and computation from communication. A simple example of various mapping and refinement options for a single function-to-function token-based communication will be used to practically illustrate the basic concepts.
A larger realistic on-chip network will then be used to describe what sort of information can be obtained by various mapping and performance analysis experiments. Although some specific tools will be used to make the explanation more concrete, the methodology is fully tool-independent, and can be implemented on top of several publicly available design frameworks.

System Architecture

Is VLIW the Right Architecture for Embedded Processing?
Trevor Mudge, The University of Michigan, USA
VLIW architectures excel at data parallel computations. For low-level signal processing this is an ideal match. As embedded applications become more sophisticated they start to contain decision-making code that cannot be efficiently executed on VLIW machines. The fraction of this "control-style" code is likely to grow, at least in the computationally more demanding applications. The current solution to this problem is to construct a heterogeneous multiprocessor composed of a general purpose scalar processor and a DSP. The scalar processor handles the control code, and the DSP handles the data parallel code. The questions that are suggested by this solution, given the trends mentioned above, include: 1) To what extent can these functions be combined into a convergent architecture? 2) If they can be combined, how will the resulting architecture look? Will it be a purely VLIW machine or a purely scalar machine? I will examine these questions by presenting the results of running complete applications that include mixes of data parallel and control-style programs on two representative architectures: a TI TMS320C6x DSP, and a StrongARM general purpose processor. I will analyze the results from the standpoints of performance, power consumption, and programmability.

A methodology and architecture for Network processor
Faraydon Karim, STMicroelectronics Inc., USA
This talk first goes through the stages of SoC development, specifically for the network processor. It then lays down a methodology to speed up the research and development stages, and describes an architecture that best suits SoC development from an R&D perspective.

Automated Processor Generation for System-on-Chip
Chris Rowen, Tensilica, USA
New application-focused system-on-chip platforms motivate new application-specific processors. Extensible processor architectures offer the efficiency of tuned logic solutions with the flexibility of standard high-level programming methodology. Automated extension of processor function units and the associated software environment – compilers, debuggers, simulators and real-time operating systems – satisfies these needs. At the same time, designing at the level of software and instruction set architecture significantly shortens the design cycle and reduces verification effort and risk. This lecture describes the key dimensions of extensibility within the processor architecture, the instruction set extension description language and the means of automatically extending the software environment from that description. It also outlines the essential methods to combine these tiny, fast processors into effective system-on-chip solutions, displacing both traditional processors and traditional RTL-based logic blocks.

Embedded Software

Compiling with Time and Space Constraints
Jens Palsberg, Purdue University, USA
Most compilers ignore the problems of limited code space and real-time deadlines in embedded systems. Designers of embedded software often have no better alternative than to manually optimize the source code or even the compiled code. Besides being tedious and error-prone, such manual optimization results in obfuscated code which is difficult to maintain and reuse. In this talk we will survey the state of the art in time- and space-aware compilation. As an example, we will present a code-size-directed compiler in which register allocation and code generation are phrased as an integer linear programming problem, and the desired code-size bound is an additional constraint. Our experimental results show that our compiler, when applied to two commercial microcontrollers, can generate code that is as compact as carefully crafted code.

Managing dynamic real-time tasks for modern multi-media systems on multi-processor platforms
Rudy Lauwereins, IMEC & K.U.Leuven, Belgium
Run-time task scheduling on a multiprocessor platform is a real challenge for real-time embedded systems, where costs such as energy consumption are also of major concern. This problem will be illustrated first in the context of state-of-the-art embedded multi-media systems that are becoming more and more dynamic due to e.g. QoS issues. These applications also require high-performance heterogeneous multi-processor platforms to achieve real-time operation. Typically the run-time schedules for such dynamic systems are determined by the RTOS. Experience shows, however, that this is not effective for keeping the energy or memory footprint low. This cost-sensitive problem formulation has also not been considered in traditional dynamic scheduling research. The approach proposed here intends to combine the advantages of a design-time scheduling phase, where all the necessary information is collected and prepared, with the flexibility of a dynamic scheduling phase that exploits this information. This approach makes it possible to optimize the system energy consumption at run time based on precomputed cost-performance Pareto curves. The application-specific run-time scheduler is then integrated on top of the RTOS. Application results will demonstrate the feasibility of collecting these Pareto curves for realistic applications (based on prototype tools that have been developed). The impact of the run-time phase will also be illustrated.
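The split between a design-time phase that prepares cost-performance Pareto curves and a run-time phase that selects from them can be illustrated with a minimal sketch. It is not taken from the lecture: the function names, the (execution time, energy) representation of an operating point, and the sample numbers are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical operating point: (execution_time, energy), both to be minimized.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Design-time step: keep only points not dominated in (time, energy)."""
    front: List[Point] = []
    best_energy = float("inf")
    # Sorted by time, then energy: a point survives only if it uses strictly
    # less energy than every faster (or equally fast) point seen so far.
    for time, energy in sorted(set(points)):
        if energy < best_energy:
            front.append((time, energy))
            best_energy = energy
    return front


def pick_operating_point(front: List[Point], deadline: float) -> Optional[Point]:
    """Run-time step: lowest-energy point on the curve that meets the deadline."""
    feasible = [p for p in front if p[0] <= deadline]
    if not feasible:
        return None  # no precomputed point is fast enough
    return min(feasible, key=lambda p: p[1])


if __name__ == "__main__":
    # Made-up measurements for one task on a hypothetical platform.
    measured = [(10, 5.0), (8, 7.0), (12, 4.0), (9, 8.0), (12, 6.0)]
    curve = pareto_front(measured)
    print(curve)
    print(pick_operating_point(curve, deadline=11))
```

In this sketch the expensive part (collecting and pruning the points) runs once at design time, while the run-time scheduler only performs a cheap lookup on the stored curve.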

MP-SoC modeling: A Software point of view
Marcello Coppola, STMicroelectronics, France
A Multi-Processor System-on-Chip (MP-SOC) is a set of components, embedded on the same chip, which collaborate to achieve a common purpose. The design flow for such a system is a key point for companies such as STM, because a good flow may reduce the gap between productivity and time-to-silicon. The need for several abstraction levels for the software and hardware is a key point for such design flows. Today, hardware modeling is well covered and understood, but modeling for software components is lacking. This talk first introduces the software components that can be found in an MP-SOC; a comprehensive description of these software components is then given. Finally, a modeling methodology based on SystemC for such components is described. Two small examples conclude the talk.

Embedded Real-Time Operating Systems

Real-Time Dynamic Voltage Scaling for Low-Power Embedded Operating Systems
Kang G. Shin, Real-Time Computing Laboratory, The University of Michigan, USA
In recent years, there has been a rapid and wide spread of non-traditional computing platforms, especially mobile and portable computing devices. As applications become increasingly sophisticated and processing power increases, the most serious limitation on these devices is the available battery life. Dynamic Voltage Scaling (DVS) has been a key technique in exploiting the hardware characteristics of processors to reduce energy dissipation by lowering the supply voltage and operating frequency. DVS algorithms have been shown to achieve dramatic energy savings while providing the necessary peak computation power in general-purpose systems. However, for a large class of applications in embedded real-time systems like cellular phones and camcorders, the variable operating frequency interferes with their deadline guarantee mechanisms, and DVS in this context, despite its growing importance, is largely overlooked/under-developed. To provide real-time guarantees, DVS must consider deadlines and periodicity of real-time tasks, requiring integration with the real-time scheduler. In this talk, we present a class of novel algorithms called real-time DVS (RT-DVS) that modify the OS's real-time scheduler and task management service to provide significant energy savings while maintaining real-time deadline guarantees. We show through simulations and a working prototype implementation that these RT-DVS algorithms closely approach the theoretical lower bound on energy consumption, and can easily reduce energy consumption by 20% to 40% in an embedded real-time system. This is joint work with Padmanabhan Pillai.
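The need to tie frequency selection to deadlines and periods can be illustrated with the simplest case: static scaling under earliest-deadline-first (EDF) scheduling. The sketch below is an illustration only, not the RT-DVS algorithms of the talk. It assumes periodic tasks whose deadlines equal their periods, worst-case execution times measured at the maximum frequency, and execution time that scales inversely with frequency; the function name, task set and frequency list are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical task model: (worst_case_execution_time_at_f_max, period).
Task = Tuple[float, float]


def lowest_feasible_frequency(tasks: List[Task],
                              frequencies: List[float]) -> Optional[float]:
    """Pick the lowest available frequency that keeps the EDF test satisfied.

    Under EDF with deadlines equal to periods, the task set is schedulable
    when total utilization is at most 1. Running at frequency f stretches
    every execution time by f_max / f, so the test becomes
    sum(C_i / P_i) <= f / f_max.
    """
    f_max = max(frequencies)
    utilization = sum(c / p for c, p in tasks)
    for f in sorted(frequencies):
        if utilization <= f / f_max:
            return f
    return None  # not schedulable even at the maximum frequency


if __name__ == "__main__":
    tasks = [(3, 8), (3, 10), (1, 14)]   # made-up task set, times in ms
    available = [0.5, 0.75, 1.0]         # normalized frequencies (assumed)
    print(lowest_feasible_frequency(tasks, available))
```

Because the test uses worst-case execution times, a static choice like this is conservative; it only shows why the frequency decision has to live inside, or next to, the real-time scheduler.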

Communication as the backbone for a well balanced system design
Eric Verhulst, Eonic Solutions GmbH, Germany
In a multi-processing environment a communication primitive is the logical equivalent of a local assignment. This means that in such an environment the communication functions are as important as the processing functions found on the processors themselves for achieving a well balanced system design. The underlying programming model is offered by CSP (Communicating Sequential Processes) of C.A.R. Hoare. This model also allows an early modeling of the final application that can be mapped onto adequately designed hardware. A step further is achieved by using reprogrammable logic to build an active communication backbone. Such a model is found in the CSPA (Communicating Signal Processing Architecture) developed by Eonic for board level DSP systems. It makes it possible to further maximize the CPU cycles available for processing while putting regular dataflow type processing in the data-streams. The result is a system architecture that is well balanced, more scalable and more efficient, as it eliminates many of the bottlenecks that bus-based architectures exhibit.

Scheduling of Multi-process System Specifications
Jan Madsen, Informatics and Mathematical Modelling, Technical University of Denmark, Denmark
A multi-process system specification exhibits significant non-determinism, as only a partial ordering is specified. The behavior in terms of produced output will be the same regardless of internal behavior or implementation details. However, the actual ordering and choice of implementation may influence a number of other parameters such as performance, power consumption, flexibility, reliability, production cost, etc. These parameters have to be captured and explored in order to produce a successful SoC design.
In this lecture we will focus on the scheduling problem. We will introduce scheduling terminology and basic concepts, based on classical real-time scheduling for uni-processor systems. These are then extended to handle shared resources, data dependencies, context switching, cache effects and power reduction, which are all relevant for SOC. Finally, we will extend it to the problem of multi-processor scheduling.
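As a small example of the classical uni-processor results that such an introduction typically builds on, the sketch below applies the Liu and Layland utilization bound for rate-monotonic scheduling. It is not taken from the lecture; the task sets and function names are assumptions, and the test is sufficient but not necessary (a task set that fails it may still be schedulable).

```python
from typing import List, Tuple

# Hypothetical task model: (worst_case_execution_time, period),
# with each deadline equal to the task's period.
Task = Tuple[float, float]


def rm_utilization_bound(n: int) -> float:
    """Liu and Layland bound for n periodic tasks: n * (2^(1/n) - 1)."""
    return n * (2 ** (1.0 / n) - 1)


def passes_rm_bound(tasks: List[Task]) -> bool:
    """Sufficient schedulability test for rate-monotonic priorities.

    True guarantees that all deadlines are met on a single processor;
    False is inconclusive and calls for a more exact analysis.
    """
    utilization = sum(c / p for c, p in tasks)
    return utilization <= rm_utilization_bound(len(tasks))


if __name__ == "__main__":
    print(round(rm_utilization_bound(2), 4))
    print(passes_rm_bound([(1, 4), (1, 5), (2, 10)]))
    print(passes_rm_bound([(2, 4), (2, 5)]))
```

Shared resources, data dependencies, context switching and cache effects all break the independence assumptions behind this bound, which is why the extensions listed above are needed for SOC.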

Embedded Software

Compiler Optimizations for Design Space Exploration
Krishna Palem, Proceler Inc., USA
The design process for embedded systems is different from that of a general-purpose architecture as there is little need to adhere to standards. Embedded computers are used in specific settings and are often highly tuned for an application. Although the benefits of customization are large, the design task is highly meticulous and time consuming. In recent years, several well-founded methodologies have emerged for the design space exploration of computing cores. However, to a large extent, the design of a supporting memory hierarchy has been ad hoc, often relying on intuition and extensive simulations. In this talk, we propose a rigorous foundation geared to facilitate the exploration of the memory hierarchy design space and aid in the selection of a Pareto-optimal design with respect to performance and cost. In addition, we shall demonstrate how a novel and lightweight compiler-based data remapping technique can significantly lower memory needs, and hence serve as a key step in optimizing the memory hierarchy design during design-space exploration.

Trends and Requirements for Network Processor SoC tools
Pierre Paulin, SoC Platform Automation, STMicroelectronics, Central R&D, Canada
This talk first presents the spec of an 'ideal' NPU embedded S/W and SoC tool environment, based on our interactions with NPU users in telecom system houses. We then present a survey of current SoC tool environments for leading Network Processor Unit (NPU) vendors. We highlight the technical R&D challenges related to multi-processor S/W development and analysis.
We present 'StepNP', an experimental NPU R&D platform, which serves as the basis for NPU SoC tool and methodology exploration. We conclude with a description of our R&D in NPU SoC tools.

Datapath width optimization for customizable Processors
Hiroto Yasuura, Graduate School of Information Science and Electrical Engineering, Kyushu University, Japan
The datapath width strongly affects system performance, chip area, and energy consumption of an SoC. For a given set of applications, we can optimize the datapath width of each component on the SoC. In this talk, the datapath width optimization problem is defined and several optimization techniques and tools are introduced. Bitwidth analysis gives information on the required datapath width. The Soft-core processor is a customizable processor whose datapath width is parameterized. Valen-C and its compiler provide a programming environment for the Soft-core processor. A dynamic control approach for the datapath width is also presented. Several optimization methods for area minimization and energy minimization are presented with experimental results. These optimization methods significantly reduce area and power consumption of processors and memories in an SoC while preserving system performance. We also propose a new design paradigm, called quality-driven design, which trades computational quality (e.g., quality of video output) against performance, cost and energy/power.
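The core step of a bitwidth analysis, turning a known value range into a required width, can be sketched as follows. This is only an illustration and not the tools described in the talk: the function name and the variable ranges are hypothetical, and a real analysis would derive the ranges from the program rather than take them as given.

```python
from typing import Dict, Tuple


def required_bits(lo: int, hi: int) -> int:
    """Minimum width that can hold every integer in [lo, hi].

    Non-negative ranges are treated as unsigned; ranges that include
    negative values are treated as two's complement.
    """
    if lo > hi:
        raise ValueError("empty range")
    if lo >= 0:
        return max(1, hi.bit_length())
    # Two's complement with n bits covers -2^(n-1) .. 2^(n-1) - 1.
    magnitude_bits = max((-lo - 1).bit_length(), max(hi, 0).bit_length())
    return magnitude_bits + 1  # one extra bit for the sign


if __name__ == "__main__":
    # Made-up value ranges for variables of a small application.
    ranges: Dict[str, Tuple[int, int]] = {
        "pixel": (0, 255),
        "diff": (-255, 255),
        "counter": (0, 1000),
    }
    widths = {name: required_bits(lo, hi) for name, (lo, hi) in ranges.items()}
    print(widths)
    # The widest variable sets a lower limit on the datapath width needed
    # to process every value in a single operation.
    print(max(widths.values()))
```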

System Architecture

SOC Multiprocessor Architecture and Modeling
Rolf Ernst, TU Braunschweig, Germany
A major difference between general purpose and application specific multi-processors is the specialization and combination of architecture components into heterogeneous architectures. When going from single-processor SOCs to multi-processor SOCs, global control and data flow become central issues. On the software side, this trend is complemented by a combination of possibly different local scheduling algorithms which must be coordinated to control, e.g., timing and buffer sizing.
In this presentation, we will look into some of the hard problems of target architecture modeling. Starting with an overview of architecture component models, we will then use a bottom-up approach composing HW/SW subsystem models to complete MPSoC models. This approach naturally supports system analysis in component-based designs. We will show different examples and discuss the impact on timing and performance analysis.

Embedded System Architecture: a multi-faceted view
Philip Koopman, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Distributed embedded systems are becoming prevalent in applications ranging from vehicles to home automation. Multi-processor systems on a chip can be considered both as components of such systems and as distributed embedded systems themselves. Designing such systems to be competitive, cost-effective, and supportable over the lifecycle of both products and companies requires a multi-disciplinary approach to architecture.
An architecture is an organized collection of components that encompasses both behaviors and interfaces with respect to a specific abstraction approach. The art in creating a good architecture is in knowing where to put interfaces and identifying the right abstraction approach. Inevitably, more than one concurrent architectural representation is needed to represent all the important aspects of a system. At the highest level, an embedded system architecture must provide decoupled but coordinated views of hardware, software, communications, and control. In many applications, distinct architectures must also be provided for human interface, maintenance/upgrade, safety/security, validation/verification, component coordination frameworks, and graceful degradation. And, of course, all these facets of the system must be compatible with overarching business and industry constraints. Unfortunately, the need for designers to decouple architectural views and subdivide components to manage complexity can at times complicate the creation of design tools that must use global tradeoffs in their quest to optimize cost and performance. Thus, there is often an inherent tension between optimization and architectural cleanliness when creating highly complex systems.
This talk will discuss the different types of architectural system views, common design patterns used for each architectural type, and the benefits of using a multi-architecture approach to representing and understanding systems. A brief discussion of recent research results will include a report of experiences representing a distributed embedded real-time control system in the Unified Modeling Language as well as progress in creating a unified approach to achieving graceful degradation for distributed embedded systems.

Multiprocessor Architectures for Signal Processing Applications
Peter Pirsch, University of Hannover, Germany
Signal processing applications with their real-time constraints have caused a still growing demand for processing power and data throughput in today's DSP architectures. Implementations ranging from digital video/audio equipment to portable devices such as cellular phones and personal digital assistants have to both support a variety of increasingly complex algorithms and meet low-power constraints within compact realizations. Multiprocessor architectures address these requirements by mapping processing schemes onto a set of application-specific processor cores on a single chip.
This lecture will cover the following topics, including several examples:

Analysis of signal processing applications

Adaptation of processor architectures to algorithm-specific characteristics

System-On-Chip implementation

Architecture trends for future applications.

SoC Design & Test Methodologies

System on Chip: Embedded Test Strategies
Yervant Zorian, Virage Logic, USA
Spurred by technology leading to the availability of millions of gates per chip, system-on-chip integration is evolving as a new IC manufacturing paradigm, allowing entire systems to be built on a single chip. Test strategy for System-on-Chip remains the most critical challenge. This presentation will cover the state of the art in system-on-chip test strategies, mainly concentrating on the current industrial practices in testing such chips. It will specifically discuss the requirements for designing embedded test for individual cores, hierarchical test reuse, vectorless sign-off, test interface standardization (IEEE P1500), embedded test management and design integration for System-on-Chip, and test resource partitioning.