LECTURES
Specification and Validation for Heterogeneous Multiprocessor SoC
The design of multiprocessor Systems-on-Chip requires the integration of heterogeneous components (DSP, MCU, memory, I/O devices, coprocessors, non-digital components, …). During the design of such SoCs, we need a clear separation between component models and communication infrastructure at different abstraction levels. This lecture will discuss the specification and validation of heterogeneous multiprocessor Systems-on-Chip using virtual component models and generic wrapper architectures.
Wolfgang Rosenstiel, University of Tuebingen & FZI, Germany
This talk describes system-level design tools that support platform-based design. It presents platform architecture examples, especially for prototyping embedded systems, as well as analytical performance evaluation tools. The goal of this work is to take superscalar and speculative processor architectures, including caches, into account in the design of embedded systems as well.
Next generation multimedia systems will combine many processors in a
few ICs. Communication and computation in these systems are an
essential part of the design. In this presentation we will illustrate
how technology provided by Trimedia consisting of processor cores,
dedicated coprocessors, buses, a programming environment, libraries of
application software and system software, including a real-time operating
system, can be combined to build complete systems. We will illustrate the
flow used to get from a system specification to an IC and
software that satisfies the real-time constraints.
From Applications to Multi-Processor DSP Architectures
Peter Pirsch, University of Hannover, Germany
Computational requirements for many video signal processing applications
are challenging even for the most powerful available digital signal processors.
Furthermore, ongoing algorithmic research leads to increasingly complex
algorithms, requiring an increasing number of arithmetic operations
to be performed on a large set of data at high sample rates.
To meet the high computational requirements, current implementations
of video processing systems strongly rely on the adaptation to algorithm-specific
processing schemes. This lecture will show how the adaptation to applications
leads to different types of multi-processor DSP architectures on a chip.
Topics covered include, among others:
Energy-efficient design and management of SoCs
Giovanni De Micheli,
Stanford University, USA
We address design and run-time issues for systems on chips with multiple interacting processor cores. We consider energy-efficient design using a layered approach, and we highlight the design trends in deep submicron technologies. We first address the design of the communication schemes among SoC components, while considering the requirements of high bandwidth and low power dissipation. Next, we describe run-time management of system components, to adapt performance and power levels to the workload. Then we describe stochastic models for the power/performance behavior of systems, and methods for determining optimum control policies. Last, we review industrial standards for operating system-based power management, and implementation strategies for power management policies.
This talk will first cover the basic principles and taxonomy of real-time
operating systems. It will then cover several commercial and research RTOSs,
especially detailing EMERALDS (Extensible Microkernel for Embedded, ReAL-time,
Distributed Systems) which is designed for small-memory embedded applications.
These applications must run on slow (15-25 MHz) processors with just 32-128
Kbytes of memory, either to keep production costs down in mass-produced
systems or to keep weight and power consumption low. To be feasible for
such applications, the OS must not only be small in size (less than 20
Kbytes) but also have low-overhead kernel services. Unlike commercial embedded
OSs which rely on carefully-optimized code to achieve efficiency, EMERALDS
takes the approach of re-designing the basic OS services of task scheduling,
synchronization, communication, and system call mechanism by using characteristics
found in small-memory embedded systems such as small code size and a priori
knowledge of task execution & communication patterns. With these new
schemes, the overheads of various OS services are reduced by 20-40% without
compromising any OS functionality.
RTOS for Embedded Systems and SoC
Miodrag Potkonjak, University of
California, Los Angeles, USA
Task-level run-time scheduling approach for dynamic multi-media
systems.
Francky Catthoor, IMEC, Belgium
Run-time task scheduling on a multiprocessor platform forms a real challenge
for real-time embedded systems, where also costs like energy consumption
are of major concern. This problem will be illustrated first in the
context of state-of-the-art embedded multi-media systems that are becoming
more and more dynamic due to, e.g., QoS issues. These applications also require
high-performance heterogeneous multi-processor platforms to achieve real-time behavior.
Typically the run-time schedules for such dynamic systems are determined
by the RTOS. Experience shows however that this is not effective
for keeping the energy or memory footprint low. This cost-sensitive
problem formulation has also not been considered in the traditional dynamic
scheduling research. The approach proposed here intends to combine the
advantages of the low run-time complexity of a static scheduling phase
and the flexibility of a dynamic scheduling phase. It allows the system
energy consumption to be optimized at run time based on precomputed cost-performance
Pareto curves. The application-specific run-time scheduler is then integrated
on top of the RTOS.
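The two-phase idea described in this abstract can be sketched roughly as follows. This is only an illustrative sketch, not the speaker's method: the names `ParetoPoint`, `pareto_filter`, and `select_point`, the units, and the sample numbers are all assumptions made for the example. A design-time phase keeps only the non-dominated (execution time, energy) operating points of a task, and a cheap run-time phase picks the lowest-energy point that still meets the current deadline.

```python
from typing import List, NamedTuple, Optional


class ParetoPoint(NamedTuple):
    """One hypothetical operating point of a task (e.g. a processor/voltage choice)."""
    exec_time_ms: float
    energy_mj: float


def pareto_filter(points: List[ParetoPoint]) -> List[ParetoPoint]:
    """Design-time phase: keep only points not dominated in both time and energy."""
    frontier: List[ParetoPoint] = []
    # Sorting by (time, energy) means a point is worth keeping only if it
    # uses strictly less energy than every faster point already kept.
    for p in sorted(points):
        if not frontier or p.energy_mj < frontier[-1].energy_mj:
            frontier.append(p)
    return frontier


def select_point(frontier: List[ParetoPoint],
                 deadline_ms: float) -> Optional[ParetoPoint]:
    """Run-time phase: lowest-energy point that meets the deadline, if any."""
    feasible = [p for p in frontier if p.exec_time_ms <= deadline_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p.energy_mj)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    candidates = [
        ParetoPoint(10, 50),
        ParetoPoint(20, 30),
        ParetoPoint(25, 35),  # dominated by (20, 30)
        ParetoPoint(40, 12),
    ]
    curve = pareto_filter(candidates)
    print(select_point(curve, deadline_ms=30))
```

Because the curve is computed ahead of time, the run-time step reduces to a short scan over a handful of points, which is the kind of low run-time complexity the abstract attributes to the static phase.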
Modeling real-time systems
Joseph Sifakis,
Verimag, France
A major difference between general purpose and application specific
multi-processors is the specialization and combination of architecture
components into heterogeneous architectures. When going from single-processor
SOCs to multi-processor SOCs, global control and data flow become central
issues. On the software side, this trend is complemented by a combination
of possibly different local scheduling algorithms which must be coordinated
to control, e.g., timing and buffer sizing.
This presentation will start with an introduction to typical hardware
specialization and global flow techniques and give a few examples of commercial
hardware platforms. Then, an approach to global flow and timing analysis
of heterogeneous multi-processors is presented. The presentation concludes
with a summary of research topics in this field.
The functionality of innovative smart products relies on the availability of extremely high-performance, low-cost embedded computer systems made possible by system-on-chip (SOC) levels of integration. Successful embedded computer architecture will result from carefully balancing the opportunities that SOC technology offers with market, product, and application requirements. The distinctive requirements of embedded computing will lead to significantly different computer architectures at both the system and processor levels as well as a rich diversity of off-the-shelf (OTS) and custom designs. These architectures will be considerably more special-purpose, heterogeneous, and irregular. Furthermore, the need for large numbers of custom and customizable architectures will necessitate the automation of computer architecture. SOCs are, however, a double-edged sword; the escalating cost of designing them, and of generating mask sets for them, poses a number of interesting challenges to the embedded computer architect.
Adaptive EPIC Processors and Compilation Techniques
Krishna V. Palem, Director, Center for Research on Embedded
Systems and Technology, Georgia Tech, USA.
We propose classes of microprocessors that allow application programs
to add and subtract functional units yielding a dynamically varying instruction
set interface to the running application. During the first half of the
lecture we describe this novel class of architectures, focusing on
a specific subclass called "Adaptive Explicitly Parallel Instruction Computing"
(AEPIC) architectures whose definition represents a collection of ideas
intended to enable efficient reconfiguration of processor data-paths. While
AEPIC processor reconfiguration is effected by the executing program at
runtime, the decisions of when and how to reconfigure are determined by
the compiler and embedded in the application's executable. In the
latter half, a compilation framework targeting AEPIC processors is
proposed. Several key compilation problems that need to be addressed in
order to target AEPIC processors such as partitioning, instruction synthesis,
configuration selection, resource allocation and scheduling are discussed.
Preliminary experimental results indicate the significant role architectural
features of AEPIC processors play in masking the overheads of micro-architectural
reconfiguration and improving application performance.
Systems-on-chips will enable new video processing applications, both
because they provide the computational density required to solve these
problems and because we can integrate processors, memory, and even sensors.
Single-chip image processing systems will be able to run sophisticated
algorithms. This talk will consider the hardware architectures of single-chip
multiprocessors for video as well as the interactions between hardware
and software design for such systems.
In this presentation we describe an architectural approach to a configurable
and scalable Very Long Instruction Word (VLIW) DSP core for embedded systems.
We focus our presentation on practical experience with a specific configurable
VLIW DSP core – the Jazz DSP processor. Following an introduction to the
processor and the VLIW structure, we discuss the methodology and tools
required for application specific configuration of the processor.
Distributed embedded systems are becoming prevalent in applications
ranging from vehicles to home automation. Multi-processor systems
on a chip can be considered both as components of such systems and as distributed
embedded systems themselves. Designing such systems to be competitive,
cost-effective, and supportable over the lifecycle of both products and
companies requires a multi-disciplinary approach to architecture.
An architecture is an organized collection of components that encompasses
both behaviors and interfaces with respect to a specific abstraction approach.
The art in creating a good architecture is in knowing where to put interfaces
and identifying the right abstraction approach. Inevitably, more
than one concurrent architectural representation is needed to represent
all the important aspects of a system. At the highest level, an embedded
system architecture must provide decoupled but coordinated views of hardware,
software, communications, and control. In many applications, distinct
architectures must also be provided for human interface, maintenance/upgrade,
safety/security, validation/verification, component coordination frameworks,
and graceful degradation. And, of course, all these facets of the
system must be compatible with overarching business and industry constraints.
Unfortunately, the need for designers to decouple architectural views and
subdivide components to manage complexity can at times complicate the creation
of design tools that must use global tradeoffs in their quest to optimize
cost and performance. Thus, there is often an inherent tension between
optimization and architectural cleanliness when creating highly complex
systems.
This 90-minute tutorial will discuss the different types of architectural
system views, common design patterns used for each architectural type,
and the benefits of using a multi-architecture approach to representing
and understanding systems. A brief discussion of recent research
results will include a report of experiences representing a distributed
embedded real-time control system in the Unified Modeling Language as well
as an architectural strategy to combine product family design and self-reconfiguration
to achieve graceful degradation.
Architectures for embedded Systems on Silicon will drastically change
in the near future. Deep submicron effects will make the use of buses increasingly
difficult. The speed of the circuits will no longer be dominated by the
gate delay but by the interconnect delay. The amount of circuitry
we can integrate will be limited by power dissipation and not by area.
The use of embedded software will increase rapidly.
In order to deal with these problems a design approach is proposed
where we make a distinction between two levels of design: component level
and system level. Components are embedded cores with different levels of
programmability. They are available as reusable IP blocks, which leads to
reduced design times. These cores can be function-specific, which leads
to cost-effective implementations.
At the system level these cores fit into a communication infrastructure
or backbone. Coarse level reconfiguration is used to provide the required
flexibility at this level. A roadmap for the backbone will be presented.
Circuit switching and packet switching will both be supported. A protocol
stack including dynamic task graph creation is briefly discussed. This
is illustrated with a silicon design example which implements multiple
video windows on a TV set.
Today's telecommunication systems require specific intelligence embedded
in complex, optimized system architectures. Performance and
flexibility are the two major requirements for success, but they quite
often result in architectural trade-offs with a huge impact on final
profits. Excellent system knowledge combined with in-depth analysis
methods is mandatory, and needs to be supported by efficient methods
for system, hardware, and embedded real-time software design. Moreover,
these different domains need to be considered together. A case study
will illustrate the concepts.
Spurred by technology leading to the availability of millions of gates
per chip, system-on-chip integration is evolving as a new IC manufacturing
paradigm, allowing entire systems to be built on a single chip. Test strategy
for System-on-Chip remains the most critical challenge. This presentation
will cover the state-of-the-art in system-on-chip test strategies, mainly
concentrating on the current industrial practices in testing such chips.
It will specifically discuss the requirements for designing embedded test
for individual cores, hierarchical test reuse, vectorless sign-off,
test interface standardization (IEEE P1500), embedded test management
and design integration for System-on-Chip, and test resource partitioning.
Architecture and Implementation of Application-Specific Multi-processor
SOCs for Digital TV (DTV) and Media-Processing Applications
Santanu Dutta, Philips Semiconductors, USA
This talk will look at the architectural and technology trends of future system-on-chips (SoCs), and analyze their implications on testing such nano-chips. It will investigate the need and directions for paradigm shifts in test methodologies to enable us to realize the full potential of GHz performance, low-voltage/low-power deep sub-micron (DSM) SoCs. It will also explore new test paradigms to support the productivity improvements of new SoC design paradigms including design reuse and programmable/platform-based system-on-chips. The new test paradigms discussed will include software-based self-testing of the components and interconnects of programmable SoCs; fault modeling and self-test techniques for DSM noise effects; techniques for self-diagnosis and self-repair; and test reuse and composition for efficient testing of platform-based SoCs.