LECTURES
Program-In, Chip-Out: The HP Labs Automatic
Hardware Synthesis Project
Rob Schreiber, HP Labs, USA
Chip designers are called on to design an ever larger variety of low-cost,
low-power, embedded ASICs able to process high-bandwidth multimedia data
streams. Many of these ASICs use custom hardware accelerators for
the computational "hot spots." Design time, cost, energy consumption,
and performance are important in these designs. As Moore's Law and
the growth of the industry outstrip our ability to create and debug such
designs by hand, automatic ASIC design has become a hot topic.
In order to reduce design time and design cost, the HP Labs Program-In,
Chip-Out (PICO) project focuses on automatic design from high-level specifications.
Source code (in a subset of C) for a performance-critical loop nest is
used as a behavioral specification. The PICO system compiles the source
code into a custom hardware design in the form of a parallel, special-purpose
processor array. The user, or a design exploration tool, specifies
the number of processing elements and their performance. The system
produces the array, its local RAM, its control logic, its interface to
memory, and its interface to a host processor. PICO also modifies
the user's application software to make use of the generated accelerator.
In experimental comparisons, PICO designs are slightly more costly than
hand-designed accelerators with the same performance.
In this talk, we give an overview of PICO, and describe practical solutions
to some subproblems that play a major role -- tiling, scheduling, bitwidth
analysis, data path synthesis, local memory management, and estimation
of the number of array elements referenced in a loop nest.
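As a rough illustration of the last subproblem, the number of distinct array elements referenced in a loop nest can be counted by brute-force enumeration. This is only a sketch of the quantity being estimated, not PICO's method, which the abstract does not describe. The function name `count_referenced_elements` and the access pattern `a[i + j]` are assumptions chosen for illustration.

```python
from itertools import product


def count_referenced_elements(bounds, index_fn):
    """Count distinct array indices touched by a rectangular loop nest.

    bounds   -- trip counts, one per loop (each loop runs 0 .. n-1)
    index_fn -- maps the loop indices to the array index being accessed
    """
    touched = set()
    for point in product(*(range(n) for n in bounds)):
        touched.add(index_fn(*point))
    return len(touched)


# Hypothetical nest: for i in 0..7, for j in 0..3, read a[i + j].
# The 32 iterations touch only the indices 0..10.
footprint = count_referenced_elements([8, 4], lambda i, j: i + j)
print(footprint)  # 11
```

Enumeration costs time proportional to the iteration count, which is why a practical tool would want a closed-form or approximate estimate instead.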
Component-based Design for Multiprocessor SoC
Ahmed Jerraya, TIMA Laboratory, France
An application-specific multiprocessor system-on-chip requires the integration
of heterogeneous components including processors (DSP and microcontrollers),
memories, peripherals (DMA, interrupt controllers, …) and a sophisticated
communication network. This lecture explores a high-level component-based
methodology and design environment for application-specific multiprocessor
SoC architectures. The system specification is a virtual architecture annotated
with configuration parameters described in a SystemC-like model. The component-based
design environment has automatic wrapper-generation tools able to synthesize
hardware interfaces, device drivers, and operating systems that implement
a high-level API. Experiments with this approach show a drastic reduction
in design time without any significant loss of efficiency in the final
design.
A methodology for Design Space Exploration
of on-chip networks
Luciano Lavagno,
Politecnico di Torino, Italy
This presentation will introduce a design methodology for architectural
exploration of networks on chip based on decoupling functionality from
architecture, and computation from communication. A simple example of various
mapping and refinement options for a single function-to-function token-based
communication will be used to practically illustrate the basic concepts.
A larger realistic on-chip network will then be used to describe what
sort of information can be obtained by various mapping and performance
analysis experiments. Although some specific tools will be used to make
the explanation more concrete, the methodology is fully tool-independent,
and can be implemented on top of several publicly available design frameworks.
A methodology and architecture for Network
processor
Faraydon Karim, STMicroelectronics Inc., USA
This talk first goes through the stages of SoC development, specifically
for the network processor. It then lays down a methodology to speed up the
research and development stages, and finally describes an architecture that
best suits SoC development from an R&D perspective.
Automated Processor Generation for System-on-Chip
Chris Rowen, Tensilica, USA
New application-focused system-on-chip platforms motivate new application-specific
processors. Extensible processor architectures offer the efficiency of
tuned logic solutions with the flexibility of standard high-level programming
methodology. Automated extension of processor function units and
the associated software environment – compilers, debuggers, simulators
and real-time operating systems – satisfies these needs. At the same
time, designing at the level of software and instruction set architecture
significantly shortens the design cycle and reduces verification effort
and risk. This lecture describes the key dimensions of extensibility within
the processor architecture, the instruction set extension description language
and the means of automatically extending the software environment from
that description. It also outlines the essential methods to combine
these tiny, fast processors into effective system-on-chip solutions, displacing
both traditional processors and traditional RTL-based logic blocks.
Managing dynamic real-time tasks for modern
multi-media systems on multi-processor platforms
Rudy Lauwereins, IMEC & K.U.Leuven,
Belgium
Run-time task scheduling on a multiprocessor platform is a real
challenge for real-time embedded systems, where costs such as energy
consumption are also of major concern. This problem will be illustrated
first in the context of state-of-the-art embedded multi-media systems that
are becoming more and more dynamic due to, e.g., QoS issues. These applications
also require high-performance heterogeneous multi-processor platforms to
achieve real-time operation. Typically the run-time schedules for such dynamic
systems are determined by the RTOS. Experience shows, however, that this is
not effective for keeping the energy or memory footprint low. This
cost-sensitive problem formulation has also not been considered in the
traditional dynamic scheduling research. The approach proposed here intends
to combine the advantages of a design-time scheduling phase, where all
the necessary information is collected and prepared, with the flexibility
of a dynamic scheduling phase that exploits this information. This approach
allows the system energy consumption to be optimized at run time based on precomputed
cost-performance Pareto curves. The application-specific run-time scheduler
is then integrated on top of the RTOS. Application results will demonstrate
the feasibility of collecting these Pareto curves for realistic applications
(based on prototype tools that have been developed). Also the impact of
the run-time phase will be illustrated.
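The two-phase idea can be sketched as follows: a design-time step reduces a set of candidate schedules to a cost-performance Pareto curve, and a cheap run-time step picks the lowest-energy point that still meets the current deadline. This is a minimal sketch under stated assumptions, not the speaker's tool flow. The function names, the (time, energy) representation, and all numbers are hypothetical.

```python
def pareto_front(points):
    """Design time: keep only (time, energy) points that no other point dominates."""
    front = []
    best_energy = float("inf")
    # Sorted by time, a point survives only if it lowers the best energy so far.
    for time, energy in sorted(points):
        if energy < best_energy:
            front.append((time, energy))
            best_energy = energy
    return front


def pick_operating_point(front, deadline):
    """Run time: lowest-energy point whose execution time meets the deadline."""
    feasible = [p for p in front if p[0] <= deadline]
    return min(feasible, key=lambda p: p[1]) if feasible else None


# Hypothetical candidate schedules as (execution time, energy) pairs.
candidates = [(10, 50), (12, 45), (15, 30), (15, 35), (20, 32), (25, 20)]
front = pareto_front(candidates)
print(front)                            # [(10, 50), (12, 45), (15, 30), (25, 20)]
print(pick_operating_point(front, 18))  # (15, 30)
```

Because the curve is precomputed, the run-time decision is a small lookup rather than a full scheduling problem.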
MP-SoC modeling: A Software point of view
Marcello Coppola, STMicroelectronics,
France
A Multi-Processor System-on-Chip (MP-SOC) is a set of components, embedded
on the same chip, which collaborate to achieve a common purpose. The design
flow for such a system is a key point for companies such as STM, because a good
flow may reduce the gap between productivity and time-to-silicon. The need
for several abstraction levels for the software and hardware is a key
point for such design flows. Today, hardware modeling is well covered
and understood, but modeling for software components is lacking.
This talk first introduces the software components that can be found
in an MP-SOC, then gives a comprehensive description of these components.
Finally, a modeling methodology based on SystemC for such
components is described. Two small examples conclude the talk.
Communication as the backbone for a well balanced
system design
Eric Verhulst, Eonic Solutions
GmbH, Germany
In a multi-processing environment a communication primitive is the
logical equivalent of a local assignment. This means that, to achieve a
well-balanced system design in such an environment, the communication
functions are as important as the processing functions found on the
processors themselves. The underlying programming model is offered by CSP
(Communicating Sequential Processes) of C.A.R. Hoare. This model also allows
early modeling of the final application, which can be mapped onto adequately
designed hardware.
A step further is achieved by using reprogrammable logic to build an active
communication backbone. Such a model is found in the CSPA (Communicating
Signal Processing Architecture) developed by Eonic for board level DSP
systems. It further maximizes the CPU cycles available for processing
while putting regular dataflow-type processing in the data streams. The
result is a system architecture that is well balanced, more scalable and
more efficient, as it eliminates many of the bottlenecks that bus-based
architectures exhibit.
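The claim that a communication primitive is the logical equivalent of a local assignment can be sketched with two concurrent processes and a channel: sending a value on one side and receiving it into `y` on the other has the same effect as `y = value` within a single process. This is an illustration only. It uses Python threads and `queue.Queue` as a stand-in for a CSP channel, which is an assumption: a true CSP channel is an unbuffered rendezvous, whereas this queue buffers one item.

```python
import queue
import threading


def producer(channel, value):
    # CSP notation: channel ! value
    channel.put(value)


def consumer(channel, result):
    # CSP notation: channel ? y  (acts like the assignment y = value)
    result["y"] = channel.get()


channel = queue.Queue(maxsize=1)  # stand-in for a CSP channel (buffered, see note)
result = {}
threads = [
    threading.Thread(target=producer, args=(channel, 42)),
    threading.Thread(target=consumer, args=(channel, result)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(result["y"])  # 42
```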
Scheduling of Multi-process System Specifications
Jan Madsen, Informatics
and Mathematical Modelling, Technical University of Denmark, Denmark
A multi-process system specification exhibits significant non-determinism,
as only a partial ordering is specified. The behavior, in terms of produced
output, will be the same regardless of internal behavior or implementation
details. However, the actual ordering and choice of implementation may
influence a number of other parameters such as performance, power consumption,
flexibility, reliability, production cost, etc. These parameters have to
be captured and explored in order to produce a successful SoC design.
In this lecture we will focus on the scheduling problem. We will introduce
scheduling terminology and basic concepts, based on classical real-time
scheduling for uni-processor systems. These are then extended to handle
shared resources, data dependencies, context switching, cache effects and
power reduction, which are all relevant for SOC. Finally, we will extend
it to the problem of multi-processor scheduling.
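One well-known result from classical uni-processor real-time scheduling is the Liu and Layland utilization bound for rate-monotonic scheduling: n independent periodic tasks are schedulable if their total utilization does not exceed n(2^(1/n) - 1). The abstract does not say which tests the lecture covers, so the sketch below is only an example of the kind of analysis meant. The task sets are hypothetical.

```python
def rm_utilization_bound(n):
    """Liu and Layland bound for n periodic tasks under rate-monotonic priorities."""
    return n * (2 ** (1.0 / n) - 1)


def rm_schedulable(tasks):
    """Sufficient (not necessary) test; tasks is a list of (wcet, period) pairs."""
    if not tasks:
        return True
    utilization = sum(wcet / period for wcet, period in tasks)
    return utilization <= rm_utilization_bound(len(tasks))


# Hypothetical task sets as (worst-case execution time, period).
light = [(1, 4), (1, 5), (2, 10)]   # utilization 0.65
heavy = [(2, 4), (2, 5), (2, 10)]   # utilization 1.10
print(rm_schedulable(light))  # True
print(rm_schedulable(heavy))  # False
```

A task set that fails this test may still be schedulable; the bound is conservative, and it ignores the shared resources, context switches and cache effects that the lecture goes on to address.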
Trends and Requirements for Network Processor
SoC tools
Pierre Paulin, SoC Platform Automation,
STMicroelectronics, Central R&D, Canada
This talk first presents the spec of an 'ideal' NPU embedded S/W and
SoC tool environment, based on our interactions with NPU users in telecom
system houses. We then present a survey of current SoC tool environments
for leading Network Processor Unit (NPU) vendors. We highlight the technical
R&D challenges related to multi-processor S/W development and analysis.
We present 'StepNP', an experimental NPU R&D platform, which serves
as the basis for NPU SoC tool and methodology exploration. We conclude
with a description of our R&D in NPU SoC tools.
Datapath width optimization for customizable
Processors
Hiroto Yasuura, Graduate School of
Information science and Electrical Engineering, Kyushu University, Japan
The datapath width strongly affects system performance, chip area,
and energy consumption of an SoC. For a given set of applications,
we can optimize the datapath width of each component on the SoC. In this talk,
the datapath width optimization problem is defined and several optimization
techniques and tools are introduced. Bitwidth
analysis gives information on the required datapath width. A Soft-core
processor is a customizable processor whose datapath width is parameterized.
Valen-C and its compiler provide a programming environment for the Soft-core
processor. A dynamic control approach for the datapath width is also presented.
Several optimization methods for area minimization and energy minimization
are presented with experimental results. These optimization methods significantly
reduce the area and power consumption of processors and memories in an SoC while
preserving system performance. We also propose a new design paradigm, called
quality-driven design, which trades computational quality (e.g., quality of video
output) against performance, cost and energy/power.
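A simple form of bitwidth analysis propagates value ranges through the arithmetic of an application and derives the smallest width that can hold each result. The sketch below assumes integer data and two's complement for signed ranges. It is not the analysis used by the speaker's tools or by the Valen-C compiler, and the function names and example ranges are hypothetical.

```python
def required_bits(lo, hi):
    """Minimum width that holds every integer in [lo, hi]."""
    if lo > hi:
        raise ValueError("empty range")
    if lo >= 0:
        return max(1, hi.bit_length())  # unsigned representation
    # Two's complement: need -2**(w-1) <= lo and hi <= 2**(w-1) - 1.
    return max((-lo - 1).bit_length(), hi.bit_length()) + 1


def add_range(a, b):
    """Range of x + y for x in a, y in b."""
    return (a[0] + b[0], a[1] + b[1])


def mul_range(a, b):
    """Range of x * y for x in a, y in b (extremes occur at the corners)."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))


pixel = (0, 255)   # hypothetical 8-bit unsigned video sample
coeff = (-8, 7)    # hypothetical 4-bit signed filter coefficient

print(required_bits(*add_range(pixel, pixel)))  # 9
print(required_bits(*mul_range(pixel, coeff)))  # 12
```

In this example a 32-bit datapath would be far wider than either operation needs, which is the kind of slack that width optimization aims to remove.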
Embedded System Architecture: a multi-faceted
view
Philip Koopman, Carnegie
Mellon University, Pittsburgh, Pennsylvania, USA
Distributed embedded systems are becoming prevalent in applications
ranging from vehicles to home automation. Multi-processor systems
on a chip can be considered both as components of such systems and as distributed
embedded systems themselves. Designing such systems to be competitive,
cost-effective, and supportable over the lifecycle of both products and
companies requires a multi-disciplinary approach to architecture.
An architecture is an organized collection of components that encompasses
both behaviors and interfaces with respect to a specific abstraction approach.
The art in creating a good architecture is in knowing where to put interfaces
and identifying the right abstraction approach. Inevitably, more
than one concurrent architectural representation is needed to represent
all the important aspects of a system. At the highest level, an embedded
system architecture must provide decoupled but coordinated views of hardware,
software, communications, and control. In many applications, distinct
architectures must also be provided for human interface, maintenance/upgrade,
safety/security, validation/verification, component coordination frameworks,
and graceful degradation. And, of course, all these facets of the
system must be compatible with overarching business and industry constraints.
Unfortunately, the need for designers to decouple architectural views and
subdivide components to manage complexity can at times complicate the creation
of design tools that must use global tradeoffs in their quest to optimize
cost and performance. Thus, there is often an inherent tension between
optimization and architectural cleanliness when creating highly complex
systems.
This talk will discuss the different types of architectural system
views, common design patterns used for each architectural type, and the
benefits of using a multi-architecture approach to representing and understanding
systems. A brief discussion of recent research results will include
a report of experiences representing a distributed embedded real-time control
system in the Unified Modeling Language as well as progress in creating
a unified approach to achieving graceful degradation for distributed embedded
systems.
Multiprocessor Architectures for Signal Processing
Applications
Peter Pirsch,
University of Hannover, Germany
Signal processing applications with their real-time constraints have
caused a still-growing demand for processing power and data throughput
in today's DSP architectures. Implementations ranging from digital video/audio
equipment to portable devices such as cellular phones and personal digital
assistants have to both support a variety of increasingly complex
algorithms and meet low-power constraints within compact realizations.
Multiprocessor architectures address these requirements by mapping processing
schemes onto a set of application-specific processor cores on a single
chip.
This lecture will cover the following topics, including several examples: