Application-Specific Multi-Processor SoC
Summer school sponsored by IEEE Circuits and Systems Society and EDAA
9 - 13 July 2001, Aix-les-Bains, France


LECTURES

SESSION 1: Embedded System Specification

 
Specification and Validation for Heterogeneous Multiprocessor SoC
Ahmed Jerraya, TIMA Laboratory, France
The design of multiprocessor Systems-on-Chip requires the integration of heterogeneous components (DSP, MCU, memory, I/O devices, coprocessors, non-digital components, …). During the design of such SoCs, we need a clear separation between component models and communication infrastructure at different abstraction levels. This lecture will discuss the specification and validation of heterogeneous multiprocessor Systems-on-Chip using virtual component models and generic wrapper architectures.
Platform based design of embedded systems-on-chip
Wolfgang Rosenstiel, University of Tuebingen  & FZI, Germany
Designing the most appropriate system architecture is key to an efficient implementation of complex algorithms such as those found in advanced applications. Over the years, it has always been recognized that raising the level of abstraction at which the creative work is done is the most effective method for improving design efficiency. Today's design methodology for embedded systems is often platform based. Finding the best algorithm/architecture trade-off is important for this design technique.
In this talk some system level design tools to support platform based design will be described. Some platform architecture examples especially for prototyping of embedded systems as well as analytical performance evaluation tools will be presented.
The goal of this work is to take superscalar and speculative processor architectures, including caches, into account in the design of embedded systems as well.

 

SESSION 2: Multi-processor SOC

Building Systems on a Chip with Trimedia technology
Kees A. Vissers, Principal Architect, TriMedia Technologies Inc., USA

Next generation multimedia systems will combine many processors in a few ICs. The communication and computation in these systems are an essential part of the design. In this presentation we will illustrate how technology provided by Trimedia, consisting of processor cores, dedicated coprocessors, buses, a programming environment, libraries of application software and system software, including a real-time operating system, can be combined to build complete systems. We will illustrate the flow that is used to get from a system specification to an IC and software that satisfy the real-time constraints.
 

From Applications to Multi-Processor DSP Architectures
Peter Pirsch, University of Hannover, Germany

Computational requirements for many video signal processing applications are challenging even for the most powerful available digital signal processors. Furthermore, ongoing algorithmic research leads to increasingly complex algorithms, requiring an increasing number of arithmetic  operations to be performed on a large set of data at high sample rates.
To meet the high computational requirements, current implementations of video processing systems strongly rely on the adaptation to algorithm-specific processing schemes. This lecture will show how the adaptation to applications leads to different types of multi-processor DSP architectures on a chip.
Topics covered include, among others:


 

Energy-efficient design and management of SoCs
Giovanni De Micheli, Stanford University, USA

We address design and run-time issues for systems on chips with multiple interacting processor cores. We consider energy-efficient design using a layered approach, and we highlight the design trends in deep submicron technologies. We first address the design of the communication schemes among SoC components, while considering the requirements of high bandwidth and low power dissipation. We next describe run-time management of system components, to adapt performance and power levels to the workload. Then we describe stochastic models for the power/performance behavior of systems, and methods for determining optimum control policies. Last, we review industrial standards for operating system-based power management, and implementation strategies for power management policies.

 

SESSION 3: RTOS for Embedded Systems

 
Real-Time Operating Systems: Principles and a Case Study
Kang G. Shin, Real-Time Computing Laboratory, The University of Michigan, USA

This talk will first cover the basic principles and taxonomy of real-time operating systems. It will then cover several commercial and research RTOSs, especially detailing EMERALDS (Extensible Microkernel for Embedded, ReAL-time, Distributed Systems) which is designed for small-memory embedded applications. These applications must run on slow (15-25MHz) processors with just 32-128 Kbytes of memory, either to keep production costs down in mass-produced systems or to keep weight and power consumption low. To be feasible for such applications, the OS must not only be small in size (less than 20 kbytes) but also have low-overhead kernel services. Unlike commercial embedded OSs which rely on carefully-optimized code to achieve efficiency, EMERALDS takes the approach of re-designing the basic OS services of task scheduling, synchronization, communication, and system call mechanism by using characteristics found in small-memory embedded systems such as small code size and a priori knowledge of task execution & communication patterns. With these new schemes, the overheads of various OS services are reduced 20-40% without compromising any OS functionality.
 

RTOS for Embedded Systems and SoC
Miodrag Potkonjak, University of California, Los Angeles, USA

 
We survey the state of the art in real-time operating systems (RTOSs) from system synthesis, system-on-chip, and embedded systems points of view. RTOSs have a very long research and development history which provides important theoretical results and useful industrial implementations. Convergence of new applications (e.g. multimedia, wireless, sensor networks), technology (e.g. SoC, deep submicron, MEMS), and market trends (e.g. customization, short market window, power and cost sensitivity) implies a strong need for a new generation of RTOSs. Therefore, new system synthesis problems, including hardware/software co-design and distributed system design and in particular SoC, are opening up new avenues for RTOS research and development. We survey classical academic and industrial RTOS work, continue with case studies of Pocket PC and EPOC RTOS and conclude by outlining requirements and research directions for the SoC RTOS.
 
Real-Time Inter-Processor Synchronization Algorithms
Hiroaki Takada, Toyohashi University of Technology, Japan
 
The spin lock is the basic inter-processor synchronization primitive for shared-memory multiprocessors. For use in real-time operating systems, the normal test-and-set spin lock algorithm is not appropriate because the algorithm cannot guarantee the real-time constraints. This lecture describes the requirements on inter-processor synchronization primitives for use in real-time operating systems from the viewpoint of real-time property, responsiveness, and scalability, and presents various synchronization algorithms. Topics covered in this lecture include queueing spin lock algorithms, priority-ordered spin locks, spin locks with preemption, real-time scalability of nested spin locks, priority inheritance spin locks, and SPEPP synchronizations.
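
As a rough illustration of why a queueing spin lock bounds waiting time where plain test-and-set does not, the sketch below implements a ticket lock, one simple member of the queueing family. It is not taken from the lecture: the class name TicketLock, the helper run_demo, and the use of a threading.Lock as a stand-in for a hardware atomic fetch-and-increment are all assumptions made for the example.

```python
import threading
import time


class TicketLock:
    """FIFO (queueing) spin lock: waiters enter in the order they took a ticket."""

    def __init__(self):
        self._next_ticket = 0   # next ticket number to hand out
        self._now_serving = 0   # ticket currently allowed into the critical section
        # Assumption: this small lock stands in for an atomic fetch-and-increment.
        self._dispenser = threading.Lock()

    def acquire(self):
        with self._dispenser:
            my_ticket = self._next_ticket
            self._next_ticket += 1
        # Spin until our ticket comes up; sleep(0) only yields to other threads.
        while self._now_serving != my_ticket:
            time.sleep(0)
        return my_ticket

    def release(self):
        # Only the current holder writes this field, so a plain increment suffices.
        self._now_serving += 1


def run_demo(num_threads=4, rounds=200):
    """Hypothetical driver: several threads increment a shared counter under the lock."""
    lock = TicketLock()
    state = {"counter": 0}
    served = []  # tickets in the order they entered the critical section

    def worker():
        for _ in range(rounds):
            ticket = lock.acquire()
            served.append(ticket)
            state["counter"] += 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"], served
```

Because entry follows ticket order, a waiter is overtaken by no later arrival, so its wait is bounded by the number of earlier waiters times the longest critical section. A test-and-set lock gives no such bound, since the same processor may win the race repeatedly.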
 

SESSION 4: RTOS for Embedded Systems

From a distributed embedded RTOS to a pragmatic framework for multi-core SoC
Eric Verhulst, Eonic Solutions GmbH, Germany
 
In an introduction, the essence of real-time processing will be explained, along with how an RTOS also provides a modular programming methodology. Just like a distributed system, a multicore SoC is basically a system composed of I/O, computing and communication elements. Hence it is logical to design such a system starting from a model that is based on Communicating Sequential Processes (CSP). A pragmatic superset can be found in the Virtuoso Multicore RTOS, which uses distributed semantics to allow a multi-processor system to be programmed as a virtual single processor. The use of a common high-level programming and design language such as C or C++ makes it possible to keep the same source code from simulation to back-end compilation without losing the real-time properties. The methodology also imposes requirements on the underlying hardware if one wants to fully exploit the benefits. For example, a more natural SoC target will be a set of asynchronously connected IP blocks that communicate using a standardized communication backbone.
 


Task-level run-time scheduling approach for dynamic multi-media systems.
Francky Catthoor, IMEC, Belgium

Run-time task scheduling on a multiprocessor platform is a real challenge for real-time embedded systems, where costs such as energy consumption are also of major concern. This problem will first be illustrated in the context of state-of-the-art embedded multi-media systems that are becoming more and more dynamic due to, e.g., QoS issues. These applications also require high-performance heterogeneous multi-processor platforms to achieve real-time operation. Typically the run-time schedules for such dynamic systems are determined by the RTOS. Experience shows, however, that this is not effective for keeping the energy or memory footprint low. This cost-sensitive problem formulation has also not been considered in traditional dynamic scheduling research. The approach proposed here intends to combine the advantages of the low run-time complexity of a static scheduling phase and the flexibility of a dynamic scheduling phase. It allows the system energy consumption to be optimized at run time based on precomputed cost-performance Pareto curves. The application-specific run-time scheduler is then integrated on top of the RTOS.
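
The split between a static phase and a cheap dynamic phase can be pictured with a small sketch. It is only an illustration under simplifying assumptions (a single task, operating points given as (execution time, energy) pairs, hypothetical function names) and not the scheduler presented in the lecture: the Pareto curve is computed once at design time, and the run-time step merely looks up the lowest-energy point that still fits the available time.

```python
def pareto_front(points):
    """Design-time step: keep only (exec_time, energy) points that no other
    point beats or matches in both dimensions."""
    front = []
    best_energy = float("inf")
    for exec_time, energy in sorted(points):  # increasing time, then energy
        if energy < best_energy:              # strictly cheaper than all faster points
            front.append((exec_time, energy))
            best_energy = energy
    return front


def pick_operating_point(front, time_budget):
    """Run-time step: lowest-energy point whose execution time fits the budget.
    Returns None when no point meets the deadline."""
    feasible = [p for p in front if p[0] <= time_budget]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p[1])


# Hypothetical operating points for one task: (execution time, energy).
candidates = [(10, 50), (20, 30), (15, 60), (40, 12), (25, 30)]
curve = pareto_front(candidates)
choice = pick_operating_point(curve, time_budget=30)
```

The run-time cost is a scan over a handful of precomputed points, which is what keeps the dynamic phase lightweight; all the expensive exploration sits in the static phase.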
 

Modeling real-time systems
Joseph Sifakis, Verimag, France

 
We present a methodological framework for modeling real-time systems. We consider modeling as an activity integrated in the system development process and strongly related to design and synthesis. Based on this view, we identify some challenging problems and propose a research agenda.
A central thesis is that dynamic (timed) models of a real-time system can be obtained by adequately restricting its application software with timing information. We present a composition/decomposition methodology and experimental results illustrating the thesis. The methodology raises some very basic and challenging problems about relating functional to non functional aspects of the behavior by application of composition techniques. Its application requires building models in the large by composition of software components.
To illustrate the methodology, we present the Taxys tool for modeling synchronous embedded real-time applications.
 

SESSION 5: System-Level Architecture

The Architecture of Multiprocessor Systems on a Chip
Trevor Mudge, The University of Michigan, USA
 
This talk will start by summarizing the state of mainstream computer architecture research, highlighting the problems that computer architects think are currently important.  It will then examine more closely those areas in computer architecture that are specifically relevant for the design of multiprocessor systems on a chip. As part of this examination, two questions will be answered:
1) are there techniques or mechanisms that have already been developed in the computer architecture field that are an improvement on current approaches used in the design of multiprocessor systems on a chip; and
2) are there questions that computer architects have not addressed that they might help solve.
Finally, we will discuss how the percolation of ideas between the fields can be accelerated.
 
SOC Multiprocessor Architecture and Modeling
Rolf Ernst, TU Braunschweig, Germany

A major difference between general purpose and application specific multi-processors is the specialization and combination of architecture components to heterogeneous architectures. When going from single processor SOC to multi-processor SOCs, global control and data flow become central issues. On the software side, this trend is complemented by a combination of possibly different local scheduling algorithms which must be coordinated to control, e.g., timing and buffer sizing.
This presentation will start with an introduction to typical hardware specialization and global flow techniques and give a few examples of commercial hardware platforms. Then, an approach to global flow and timing analysis of heterogeneous multi-processors is presented. The presentation concludes with a summary of research topics in this field.

 
Architectural challenges and opportunities for systems on a chip
Bob Rau, Hewlett-Packard Laboratories, USA

The functionality of innovative smart products relies on the availability of extremely high-performance, low-cost embedded computer systems made possible by system-on-chip (SOC) levels of integration. Successful embedded computer architecture will result from carefully balancing the opportunities that SOC technology offers with market, product, and application requirements. The distinctive requirements of embedded computing will lead to significantly different computer architectures at both the system and processor levels as well as a rich diversity of off-the-shelf (OTS) and custom designs. These architectures will be considerably more special-purpose, heterogeneous, and irregular. Furthermore, the need for large numbers of custom and customizable architectures will necessitate the automation of computer architecture. SOCs are, however, a double-edged sword; the escalating cost of designing them, and of generating mask sets for them, poses a number of interesting challenges to the embedded computer architect.

 

SESSION 6: Embedded RT-SW

Challenges in Network Processor Architectures and Embedded S/W Tools
Pierre G. Paulin, STMicroelectronics, Central R&D, SoC Platform Automation, Canada
Faraydon Karim, STMicroelectronics, Central R&D, Advanced Design Group, USA
The projected explosive growth in network bandwidth and services has created the need for a new breed of processors. An emerging approach to this challenge is the development of a programmable multi-processor solution which will preserve customer investment by keeping up with ongoing changes. We review representative commercially available Network Processing Units (NPU) and describe ST's internal NPU architecture R&D efforts. We then present the software development tools developed for the ST NPU. These are based on ST's 'FlexWare' retargetable embedded software environment, which supports fine-tuning of the architecture to fast-changing market requirements.
 

Adaptive EPIC Processors and Compilation Techniques
Krishna V. Palem,  Director, Center for Research on Embedded Systems and Technology, Georgia Tech, USA.

We propose classes of microprocessors that allow application programs to add and subtract functional units, yielding a dynamically varying instruction set interface to the running application. During the first half of the lecture we describe this novel class of architectures, focusing on a specific subclass called "Adaptive Explicitly Parallel Instruction Computing" (AEPIC) architectures, whose definition represents a collection of ideas intended to enable efficient reconfiguration of processor data-paths. While AEPIC processor reconfiguration is effected by the executing program at runtime, the decisions of when and how to reconfigure are determined by the compiler and embedded in the application's executable. In the latter half, a compilation framework targeting AEPIC processors is proposed. Several key compilation problems that need to be addressed in order to target AEPIC processors, such as partitioning, instruction synthesis, configuration selection, resource allocation and scheduling, are discussed. Preliminary experimental results indicate the significant role architectural features of AEPIC processors play in masking the overheads of micro-architectural reconfiguration and improving application performance.
 
 

SESSION 7: Embedded RT-SW

Multiprocessor SoCs for Video Processing
Wayne Wolf, Princeton University, USA

Systems-on-chips will enable new video processing applications, both because they provide the computational density required to solve these problems and because we can integrate processors, memory, and even sensors.  Single-chip image processing systems will be able to perform sophisticated algorithms. This talk will consider the hardware architectures of single-chip multiprocessors for video as well as the interactions between hardware and software design for such systems.
 

Configuring the Jazz VLIW-DSP Core for Application Specific Requirements
Oz Levia, CTO, Improv Systems Inc., USA

In this presentation we describe an architectural approach to a configurable and scalable Very Long Instruction Word (VLIW) DSP core for embedded systems. We focus our presentation on actual experience with a specific configurable VLIW DSP core – the Jazz DSP processor. Following an introduction to the processor and the VLIW structure, we discuss the methodology and tools required for application-specific configuration of the processor.
 

Static scheduling for embedded systems
Luciano Lavagno, University of Udine, Italy
This talk will briefly cover static and quasi-static techniques for data-dominated embedded systems. The starting points for such techniques are specification mechanisms such as Kahn process networks and dataflow networks. An internal representation (as a Petri net) is then used to determine the existence of a finite schedule that can forever execute that specification with bounded network. From this, an efficient implementation of a single task or multiple tasks is derived, taking into account issues such as buffer minimization.
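
To make the idea of a finite schedule that can repeat forever more concrete, here is a minimal sketch for the special case of synchronous dataflow with fixed token rates. That restriction, the function name repetition_vector, and the edge format are assumptions for illustration; the talk itself works from a Petri net representation, which this sketch does not model. The code solves the balance equations (firings of producer times tokens produced equals firings of consumer times tokens consumed on every edge) and returns the smallest integer firing counts, or None when no such periodic schedule exists.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Solve the balance equations of a connected dataflow graph.

    edges: list of (src, dst, produced, consumed) with fixed integer rates.
    Returns {actor: firings per schedule period}, or None if the rates are
    inconsistent (token counts would then grow or starve over time).
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along the edges
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    # Every edge must balance, including edges that close a cycle.
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            return None
    # Scale the rational rates to the smallest integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}


# Hypothetical three-actor pipeline: A -(2:3)-> B -(1:2)-> C.
pipeline = repetition_vector(["A", "B", "C"],
                             [("A", "B", 2, 3), ("B", "C", 1, 2)])
# Two parallel edges with conflicting rates admit no periodic schedule.
broken = repetition_vector(["A", "B"],
                           [("A", "B", 1, 1), ("A", "B", 2, 1)])
```

A consistent repetition vector is necessary for a bounded periodic schedule; deadlock freedom and the actual buffer sizes still have to be checked by constructing a firing order, which this sketch leaves out.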

SESSION 8: System-level Architecture

Embedded System Architecture: a multi-faceted view
Philip Koopman, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA

Distributed embedded systems are becoming prevalent in applications ranging from vehicles to home automation.  Multi-processor systems on a chip can be considered both as components of such systems and as distributed embedded systems themselves.  Designing such systems to be competitive, cost-effective, and supportable over the lifecycle of both products and companies requires a multi-disciplinary approach to architecture.
An architecture is an organized collection of components that encompasses both behaviors and interfaces with respect to a specific abstraction approach.  The art in creating a good architecture is in knowing where to put interfaces and identifying the right abstraction approach.  Inevitably, more than one concurrent architectural representation is needed to represent all the important aspects of a system.  At the highest level, an embedded system architecture must provide decoupled but coordinated views of hardware, software, communications, and control.  In many applications, distinct architectures must also be provided for human interface, maintenance/upgrade, safety/security, validation/verification, component coordination frameworks, and graceful degradation.  And, of course, all these facets of the system must be compatible with overarching business and industry constraints.  Unfortunately, the need for designers to decouple architectural views and subdivide components to manage complexity can at times complicate the creation of design tools that must use global tradeoffs in their quest to optimize cost and performance.  Thus, there is often an inherent tension between optimization and architectural cleanliness when creating highly complex systems.
This 90-minute tutorial will discuss the different types of architectural system views, common design patterns used for each architectural type, and the benefits of using a multi-architecture approach to representing and understanding systems.  A brief discussion of recent research results will include a report of experiences representing a distributed embedded real-time control system in the Unified Modeling Language as well as an architectural strategy to combine product family design and self-reconfiguration to achieve graceful degradation.


 
Communication architectures for deep-submicron VLSI Systems
Jef van Meerbergen, Philips, The Netherlands

Architectures for embedded Systems on Silicon will drastically change in the near future. Deep submicron effects will make the use of buses increasingly difficult. The speed of the circuits will no longer be dominated by the gate delay but by the  interconnect delay. The amount of circuits we can integrate will be limited by power dissipation and not by area. The use of embedded software will increase rapidly.
In order to deal with these problems a design approach is proposed where we make a distinction between two levels of design: component level and system level. Components are embedded cores with different levels of programmability. They are available as reusable IP blocks which leads to reduced design times. These cores can be function specific which leads to cost effective implementations.
At the system level these cores fit into a communication infrastructure or backbone. Coarse-level reconfiguration is used to provide the required flexibility at this level. A roadmap for the backbone will be presented. Circuit switching and packet switching will both be supported. A protocol stack including dynamic task graph creation is briefly discussed. This is illustrated with a silicon design example which implements multiple video windows on a TV set.


 
System Architectures: Hardware or Software dominant ?  A Case Study: xDSL modems
Mark Genoe, Alcatel, Belgium

Today's telecommunication systems require specific intelligence embedded in complex optimized system architectures. Performance and flexibility are the two major requirements for success, but quite often result in many architectural trade-offs with a huge impact on final profits. Excellent system knowledge combined with in-depth analysis methods is mandatory, and needs to be supported with efficient methods for system, hardware and embedded real-time software design. Moreover, these different domains need to be considered together. A case study will illustrate the concepts.
 

SESSION 9: System-On-Chip Validation and Test

System on Chip: Embedded Test Strategies
Yervant Zorian, LogicVision, USA

Spurred by technology leading to the availability of millions of gates per chip, system-on-chip integration is evolving as a new IC manufacturing paradigm, allowing entire systems to be built on a single chip. Test strategy for System-on-Chip remains the most critical challenge. This presentation will cover the state of the art in system-on-chip test strategies, mainly concentrating on the current industrial practices in testing such chips. It will specifically discuss the requirements for designing embedded test for individual cores, hierarchical test reuse, vectorless sign-off, test interface standardization (IEEE P1500), embedded test management and design integration for System-on-Chip, and test resource partitioning.
 

Architecture and Implementation of Application-Specific Multi-processor SOCs for Digital TV (DTV)  and Media-Processing Applications
Santanu Dutta, Philips Semiconductors, USA

The talk will have 4 parts:
The first part of the talk will be a general introduction to SOC design and identification of the design parameters and constraints.
The second part of the talk will describe the architecture, functionality, and design of NX-2700 --- a single-CPU digital television and media processing system on a chip that we designed at Philips Semiconductors. NX-2700 is the second generation of an architectural family of programmable multimedia processors targeted at the digital television (DTV) markets. The chip not only supports all of the eighteen ATSC formats from standard-definition to wide-angle, high-definition video, but also has the power to handle High-Definition Television (HDTV) video and audio source decoding (high-level MPEG-2, AC-3 and ProLogic audio, closed captioning, etc.) as well as the flexibility to process advanced interactive services. NX-2700 is a programmable processor with a very powerful, general-purpose Very Long Instruction Word (VLIW) Central Processing Unit (CPU) core that implements many non-trivial multimedia algorithms, coordinates all on-chip activities, and runs a small real-time operating system. The CPU core, aided by an array of multimedia co-processors, input-output units, and high-performance buses --- an MPEG co-processor that can decode the highest-resolution interlaced compressed video bitstream, a High-Definition Video-Out (HDVO) unit that can handle the simultaneous processing of all eighteen video formats including video generation for picture-in-picture display and video generation for VCR recording, a main-memory interface that can meet the bandwidth requirements of the HDVO & MPEG blocks and the latency requirements of the video-in/out & audio-in/out units, a transport block that can accept transport streams from external sources and perform PID filtering, an audio processor that can decode AC-3 and ProLogic audio and can also connect to an external audio amplifier, and a host of other autonomous peripheral blocks with Direct Memory Access (DMA) capability --- facilitates concurrent processing of audio, video, graphics, and communication-data.
Once we have established a baseline by going through the architecture, design, CAD tools, verification, and silicon debug, the third part of the talk will focus on the lessons learnt and what we have done differently in the next, much bigger multi-CPU (multi-processor) SOC --- the Viper chip --- that we have designed. Viper is a highly integrated media processor intended for deployment in Advanced Set-Top Box (ASTB) and DTV systems. We will talk about the Viper design and try to address some of the present-day problems with very large SOC designs --- tools, timing closure, physical design, etc.
The fourth and final part of the talk will conclude with our vision of the future of SOC designs and the various areas of research and development that we feel are important for being able to tackle even larger SOC designs.
Testing Future System-on-Chips: Challenges and Emerging Techniques
Sujit Dey, University of California, San Diego, USA

This talk will look at the architectural and technology trends of future system-on-chips (SoCs), and analyze their implications on testing such nano-chips. It will investigate the need and directions for paradigm shifts in test methodologies to enable us to realize the full potential of GHz performance, low-voltage/low-power deep sub-micron (DSM) SoCs. It will also explore new test paradigms to support the productivity improvements of new SoC design paradigms including design reuse and programmable/platform-based system-on-chips. The new test paradigms discussed will include software-based self-testing of the components and interconnects of programmable SoCs; fault modeling and self-test techniques for DSM noise effects; techniques for self-diagnosis and self-repair; and test reuse and composition for efficient testing of platform-based SoCs.