Monday August 3: MPSOC application day
SESSION 1: Keynote
SESSION 2: Mini-Keynotes
-
Norbert Wehn, University of Kaiserslautern, Germany
Energy Modeling and Optimization: A Critical Assessment with Two Case Studies
Energy efficiency is a key challenge in embedded
system design. Thus, many sophisticated optimization techniques have been
developed over the last decades. These techniques typically rely on
models representing the system's energy consumption. In this talk we
will present two case studies which show that commonly used energy
models can lead to wrong optimization strategies. In detail, we will
discuss short hop versus long hop in wireless sensor networks and DRAM
power management.
-
Torsten Kempf, RWTH Aachen University, Germany
A design methodology for SDR (Software Defined Radios) reconciling portability with efficiency
-
Soo-Ik Chae, Seoul National University, Korea
Flexible multi-core platform using multiple RISC clusters for multi-standard video applications
First, I will introduce a RISC cluster that has up to four RISC cores with shared I/D caches and a hardware thread scheduler for multi-threading, as well as coprocessors for computation and communication. Then I will describe a high-performance video platform using multiple RISC clusters and its design flow, which maps a video application to the platform with architecture-level DSE. I will also show some preliminary data on a mapping result for an H.264 720p decoder.
-
Pieter Van der Wolf, NXP, Netherlands
The Memory Bottleneck in MPSoCs for Multimedia
MPSoCs for multimedia have high demands on their infrastructures, due to requirements for high bandwidth, low latency, and quality-of-service guarantees. We specifically focus on the bottleneck to off-chip DRAM and explain how it impacts overall system cost. We detail the requirements of different kinds of processing engines on a DRAM controller and discuss the integration challenge for heterogeneous MPSoCs for multimedia. 3D integration of logic and memory may create a paradigm shift, as it could make abundant bandwidth available. We will discuss some of the challenges for adoption of such 3D integration techniques.
-
Gerhard Fettweis, Technical University Dresden, Germany
Designing an LTE Baseband MPSoC
SESSION 3: In-depth technical presentations
-
Kazutoshi Wakabayashi, NEC, Japan
-
Suzanne Lesecq, CEA-LETI, France
Sensor fusion with intermittent data, application to attitude estimation
Observers are classically used to fuse information from various sensor modalities in order to estimate other variables that are not directly measurable by sensors. The implementation of these techniques leads to the so-called "soft sensor". However, due to the introduction of networks between the sensors and the computational unit, the design of observers that can tolerate data loss must be considered. Moreover, distributed versions of these observers must be implemented. This talk reviews various techniques from the literature that take data loss into account. An Extended Kalman Filter that takes into account the loss of a subset of the measurements is proposed. The technique is exemplified in the context of attitude estimation with measurements from an Inertial Measurement Unit.
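As a rough illustration of updating an estimate from whichever measurements actually arrive, the sketch below uses a scalar linear Kalman filter rather than the Extended Kalman Filter of the talk. The random-walk state model, the independent sensors, the noise values, and every name are assumptions made for this example only.

```python
from typing import Optional, Sequence, Tuple


def kalman_step(
    x: float,
    p: float,
    measurements: Sequence[Optional[float]],
    meas_vars: Sequence[float],
    process_var: float,
) -> Tuple[float, float]:
    """One predict/update cycle for a scalar random-walk state.

    measurements[i] is None when the sample of sensor i was lost. A lost
    sample is skipped, so only the prediction applies if all are lost.
    """
    # Predict: the state is assumed constant, so only the variance grows.
    p = p + process_var
    # Update: fold in each available sensor in turn. Sequential updates are
    # valid here because the sensor noises are assumed independent.
    for z, r in zip(measurements, meas_vars):
        if z is None:
            continue  # lost sample: no correction from this sensor
        k = p / (p + r)      # Kalman gain
        x = x + k * (z - x)  # correct the estimate
        p = (1.0 - k) * p    # shrink the variance
    return x, p


if __name__ == "__main__":
    x, p = 0.0, 1.0
    # Hypothetical sample stream: None marks a measurement lost in the network.
    samples = [(1.0, 1.2), (None, 0.9), (None, None), (1.1, None)]
    for z in samples:
        x, p = kalman_step(x, p, z, meas_vars=(0.1, 0.2), process_var=0.01)
        print(f"estimate={x:.3f} variance={p:.4f}")
```

When every sample in a step is lost, the estimate is held and only its variance grows, which is the behaviour an observer needs in order to keep running across gaps.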
-
Chris Rowen, Tensilica, USA
Energy-Efficient LTE Baseband with Extensible Dataplane Processor Units
This paper outlines a new generation of processor generation technology tuned to the demands of multiple dataplane processing units in baseband communications, video post-processing, audio and other cost- and energy-obsessed SOC applications. We highlight improved interface and computing structures and new tools for integration with MATLAB models and RTL blocks. Then we apply these methods to the integration of a complete programmable 140 Mbps LTE receiver architecture, built using a new DSP architecture with optimizations for OFDM, MIMO decoding and forward error correction. We conclude with some insights into future directions for application-specific processor cores, system architectures and methodology.
-
Nicolas Darbel, ST-Ericsson, France
U8500 will fuel the growth of Web-enabled multimedia handheld devices and enable smartphones to be widely adopted by consumers
This Single Chip Digital BaseBand and Application processor combines leading edge features for multimedia, computing and communication such as: real time capture & playback of 1080p30 video, HSPA modem release 7, high performance computing based on SMP dual ARM Cortex A9 multicore processor, supported by state-of-the-art architectural concepts such as Globally Asynchronous Locally Synchronous interconnect and SW controlled advanced power management, in advanced 45/40nm process technology and POP packaging.
-
Guy Bois, Ecole Polytechnique Montréal, Canada
Exploration, Design and Development of Hardware & Software Multi-Processor Embedded Systems Made Easier
Market forces are driving electronic system developers to design ever more
complex products whose architectures are based on a large and growing mix of
multi-processors, software, complex peripherals and intellectual property
cores. This challenge has given rise to the need for new design methods
that provide the ability to address both software and hardware development
simultaneously, allowing full optimization in each domain. In this context,
we present SpaceStudio, a system-level development environment and tool
suite that allows hardware/software architectural exploration and
partitioning, design, validation, simulation, performance analysis and
integration of mixed HW/SW embedded systems.
-
Wen-mei Hwu, Illinois University, USA
Many-core Parallel Computing - Can compilers and tools do the heavy lifting?
Modern GPUs such as the NVIDIA GeForce GTX280, ATI Radeon 4860, and the
upcoming Intel Larrabee are massively parallel, many-core processors. Today,
application developers for these many-core chips are reporting 10X-100X
speedup over sequential code on traditional microprocessors. According
to the semiconductor industry roadmap, these processors could scale up
to over 1,000X speedup over single cores by the end of the year 2016.
Such a dramatic performance difference between parallel and sequential
execution will motivate an increasing number of developers to parallelize
their applications. Today, an application programmer has to understand
the desirable parallel programming idioms, manually work around potential
hardware performance pitfalls, and restructure their application design
in order to achieve their performance objectives on many-core processors.
Although many researchers have given up on parallelizing compilers, I will
show evidence that by systematically incorporating high-level application
design knowledge into the source code, a new generation of compilers and
tools can take over the heavy lifting in developing and tuning parallel
applications. I will also discuss roadblocks whose removal will require
innovations from the entire research community.
Tuesday August 4: Software day
SESSION 4: Keynote
-
Osamu Nishii, Renesas Technology, Japan
Concurrent trace hardware of multi-core
Trace hardware is attached to processors and provides
observability for software debugging. A real-time trace function
traces without degrading processing speed, while a full trace
function traces without information loss.
The SuperH multi-processor (SH-X3) supports both trace modes and
improves the trace utilization ratio using an adaptive trace buffer
allocation technique.
SESSION 5: Mini-Keynotes
-
Jan Madsen, Technical University of Denmark, Denmark
Mapping bio-chemical applications onto microfluidic-based biochips
Microfluidic biochips are replacing conventional biochemical analyzers, and are able to integrate on-chip all the necessary functions for biochemical analysis.
"Digital" microfluidic biochips manipulate liquids not as a continuous flow, but as discrete droplets, and hence are highly reconfigurable and scalable.
A digital biochip is composed of a two-dimensional array of cells, together with reservoirs for storing the samples and reagents. Several adjacent cells are dynamically grouped to form a virtual device, on which operations are executed. This talk will present the possibilities and challenges of mapping bio-chemical applications onto microfluidic biochips. It will discuss the similarities and new challenges as compared to on-line dynamic reconfigurability of digital reconfigurable multicore architectures.
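The grouping of adjacent cells into virtual devices can be pictured with a small data-structure sketch. This is only a toy model under stated assumptions: rectangular devices, first-fit placement, and the names `Biochip`, `place` and `release` are all hypothetical and not taken from the talk.

```python
from typing import Dict, List, Optional, Tuple


class Biochip:
    """Toy model of a digital biochip as a rows x cols array of cells."""

    def __init__(self, rows: int, cols: int) -> None:
        self.rows, self.cols = rows, cols
        # owner[r][c] names the virtual device using the cell, or is None.
        self.owner: List[List[Optional[str]]] = [
            [None] * cols for _ in range(rows)
        ]
        # Device name -> (top row, left column, height, width).
        self.devices: Dict[str, Tuple[int, int, int, int]] = {}

    def _fits(self, r: int, c: int, h: int, w: int) -> bool:
        """True if the h x w block at (r, c) is on the array and free."""
        if r + h > self.rows or c + w > self.cols:
            return False
        return all(
            self.owner[i][j] is None
            for i in range(r, r + h)
            for j in range(c, c + w)
        )

    def place(self, name: str, h: int, w: int) -> Optional[Tuple[int, int]]:
        """Group an h x w block of free adjacent cells into a virtual device.

        Uses first-fit scanning (an assumption). Returns the top-left cell,
        or None when no free block of that size exists.
        """
        if name in self.devices:
            raise ValueError(f"device {name!r} is already placed")
        for r in range(self.rows):
            for c in range(self.cols):
                if self._fits(r, c, h, w):
                    for i in range(r, r + h):
                        for j in range(c, c + w):
                            self.owner[i][j] = name
                    self.devices[name] = (r, c, h, w)
                    return (r, c)
        return None

    def release(self, name: str) -> None:
        """Dissolve a virtual device so its cells can be regrouped."""
        r, c, h, w = self.devices.pop(name)
        for i in range(r, r + h):
            for j in range(c, c + w):
                self.owner[i][j] = None
```

Because devices exist only as groupings of cells, releasing one immediately frees its area for a differently shaped device, which is where the parallel with dynamically reconfigurable architectures comes from.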
-
Marcello Coppola, STMicroelectronics, France
Future Trends in communication infrastructures for MPSoC
Current SoC architectures have to support several applications with high requirements in terms of bandwidth and latency, which implies the use of heterogeneous multi-core architectures on a single chip (MPSoCs) endowed with complex communication infrastructures, such as networks on chip (NoCs). So far, the NoC has been considered a non-programmable hardware component able to support the overall MPSoC bandwidth and latency requirements. In this talk we will show how important the introduction of programmability in NoCs is.
-
Mostapha Aboulhamid, Université de Montréal, Canada
System Modeling and Multicore Simulation using the Transaction Paradigm
The transaction concept is a powerful model used to describe in a simple way the coherent execution of simultaneous process activities in concurrent systems. Indeed, a transaction allows encapsulating a set of operations which therefore behaves as a single atomic operation.
In this way, hardware/software (H/S) system functionality can be modeled as a set of transactions, where each transaction can be dealt with in isolation from other transactions. Hence, the designer no longer needs to worry about the consistency of shared resources or the coordination of the concurrent modules accessing them, since these matters are handled automatically by the transaction manager implemented by the underlying simulator.
In order to simulate a transaction-based model, we need to implement a Software Transactional Memory (STM). An STM model consists of a set of processes which communicate through a shared memory by executing transactions. We will give some hints on the correctness criteria used to validate the parallel execution of transactions. Then, we give an overview of the different techniques and policies used in the implementation of an STM. Finally, we show different implementation avenues using the .NET framework and a comparison with the SystemC approach.
In our comparison with SystemC, we show the potential of accelerating simulation using multicore hosts and the advantage of using the transaction paradigm in modeling as well as in design space exploration at a system abstraction level.
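To make the transaction idea concrete, here is a minimal sketch of an optimistic software transactional memory. It is written in Python rather than .NET or SystemC, it is not the implementation discussed in the talk, and the names `TVar`, `Transaction` and `atomically` are assumptions chosen for this example. Reads record the version they saw, writes are buffered, and commit validates the recorded versions under a single global lock.

```python
import threading
from typing import Any, Callable, Dict


class TVar:
    """A shared memory cell with a version counter."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self.version = 0


# One global lock serializes commits (a simplifying assumption).
_commit_lock = threading.Lock()


class Transaction:
    def __init__(self) -> None:
        self.reads: Dict[TVar, int] = {}   # TVar -> version first seen
        self.writes: Dict[TVar, Any] = {}  # TVar -> buffered new value

    def read(self, tvar: TVar) -> Any:
        if tvar in self.writes:
            return self.writes[tvar]  # read our own pending write
        # Record the version before reading the value, so that a commit
        # slipping in between is caught at validation time.
        self.reads.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar: TVar, value: Any) -> None:
        self.writes[tvar] = value  # invisible to others until commit

    def commit(self) -> bool:
        """Validate the read set, then publish all writes atomically."""
        with _commit_lock:
            for tvar, seen in self.reads.items():
                if tvar.version != seen:
                    return False  # conflict: someone committed in between
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True


def atomically(body: Callable[[Transaction], Any]) -> Any:
    """Run body as a transaction, retrying until it commits."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.commit():
            return result
```

A caller writes the body as if it ran alone, for example `atomically(lambda tx: tx.write(counter, tx.read(counter) + 1))`, and conflicts are resolved by retry. This toy validates only at commit time, so a transaction that is going to abort may observe an inconsistent snapshot before it does; stricter correctness criteria need additional validation on each read.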
SESSION 6: In-depth technical presentations
Wednesday August 5: Architecture day
SESSION 8: Keynote
-
Pierre Paulin, STMicroelectronics, Canada
Real-Life Challenges on Mapping High-end Video to MP-SoC
The increasing need for flexibility in multimedia SoCs for consumer
applications is leading to a new class of programmable, multi-processor
platforms. The high computation and data storage requirements of these
applications pose new challenges in the expression of the applications, the
multi-core fabric to support them and the application-to-platform mapping
tools.
We elaborate on these challenges, and illustrate them with our experience on
mapping high-definition video image quality improvement and codec
applications to MP-SoC platforms.
SESSION 9: Mini-Keynotes
-
Charlie Janac, Arteris, USA
The Network-On-Chip as a Key to Flexible SoC Architectures
Ever increasing SoC power, performance and area requirements are forcing
on-chip interconnect technology to keep pace. In a first generation, a
hierarchy of simple buses and crossbars was sufficient to meet the needs of
SoCs. Subsequently, the need for IP re-use resulted in a second generation
consisting of de-coupled buses and crossbars, separating the IP interface
protocol from the interconnect protocol. The Network-On-Chip (NoC) represents
the current generation of on-chip interconnect technology. By decoupling
transaction, transport and physical layers, the NoC delivers a flexible
topology architecture, resulting in lower power, faster and smaller
interconnect IPs, thereby providing SoC designers with the needed
flexibility to achieve performance and cost requirements.
-
Ahmed Jerraya, CEA-LETI, MINATEC, France
SoC integration beyond Hardware and Software
SoC integration has already made fantastic achievements to produce smaller,
cheaper, more reliable, better performing devices with more and more functionalities.
This presentation shows how this trend is expected to continue for the next 10 years
because many applications still need 100X improvement factor that can be achieved
only through integration.
-
Drew Wingard, Sonics, USA
-
Srinivasan Murali, iNoCs, Switzerland
NoC Tool Flow for Achieving Fast Design Closure
The growing complexity of Systems on Chips (SoCs) requires communication resources that can only be provided by a highly scalable Network-on-Chip (NoC) based communication infrastructure. Developing NoC-based systems tailored to a particular application domain is important for achieving high-performance, energy-efficient customized solutions. To achieve early time-to-market, it is important to have a CAD tool flow that automates most of the time-intensive design steps. In this talk, I will show how a CAD flow is crucial for solving the NoC design problem efficiently and for achieving design closure.
-
John Goodacre, ARM, UK
Targeted execution enabling increased power efficiency
In many low-power designs there is diversity in performance requirements, both between concurrent tasks and in their temporal requirements. It has been observed over numerous generations of processor design that power efficiency varies for a given microarchitectural design. It is therefore proposed that, to minimize the energy consumption of a device, utilizing a processor of a specific microarchitectural structure can offer the most power-efficient execution of specific applicable workloads. As such, a unified-ISA, heterogeneous multicore design will provide an associated energy reduction. This talk will summarize some of the approaches for this and the potential software techniques required to support such systems.
SESSION 10: In-depth technical presentations
-
Scott Mahlke, Michigan University, USA
High Performance Mobile Computing Using Flexible Wide SIMD Processors
In the past decade, the proliferation of mobile devices has increased
at a spectacular rate. There are now more than 3.3 billion active cell
phones in the world, a device that we now all depend on in our daily
lives. The current generation of devices employs a combination of
general-purpose processors, digital signal processors, and hardwired
accelerators to provide giga-operations-per-second performance on
milliWatt power budgets. Such heterogeneous organizations are
inefficient to build and maintain, and they waste silicon area and
power. Looking forward to the next generation of mobile computing,
computation requirements will increase by one to three orders of
magnitude due to higher data rates, increased complexity algorithms,
and greater computation diversity, but the power requirements will be
just as stringent. Scaling of existing approaches will not suffice; instead,
the inherent computational efficiency, programmability, and
adaptability of the hardware must change. To overcome these
challenges, this talk argues that wide SIMD processors provide the
necessary computational bandwidth for the next generation wireless
signal processing and high-definition video algorithms. However,
simply scaling old SIMD designs is not enough. Wide SIMD processors
need more inherent flexibility to adapt to different application
characteristics and better power efficiency to avoid unnecessary use
of large structures. This work introduces AnySP, a highly
configurable SIMD datapath that is adaptable to a wide range of signal
and media processing applications. AnySP maintains high utilization
of a wide SIMD datapath by enabling the processing of wide vectors or
multiple narrow vectors simultaneously as well as pipelining deeper
computation subgraphs across neighboring lanes. Results show that
AnySP is capable of sustaining 4G wireless processing and
high-definition video throughput rates, and will approach the 1000
Mops/mW efficiency barrier when scaled to 45nm.
-
Tom Conte, Georgia Institute of Technology, USA
Manycores: will we learn from the past?
-
Sanjay Patel, Illinois University, USA
Scaling to 1000 cores on a chip: Architecture and Application
Chip architectures such as Nvidia G80 initiated the era of massively
parallel general purpose computing on the client. Fueling the economic fire
for such high-performance chips are interactive, client application domains
such as gaming that are hungry for performance. Emerging applications in
vision, imaging, video processing, virtual immersion, and robotics also have
an insatiable need for speed, and provide a future performance roadmap for
such many-core chips.
In the Rigel Project, we are developing a scalable architecture with 1000s
of cores, and many TFLOPS of peak performance. Rigel has a well-defined and
general purpose programmer interface that enables a broad class of task and
data parallel applications to be mapped efficiently to the chip. In this
talk I will describe the major results of the project thus far, touching on
subjects such as scalable cache coherence through hardware and software, the
Rigel task-based parallel programming model, area-power-performance
tradeoffs for throughput-oriented architectures, and parallel programming
tools.
-
Tohru Ishihara, Kyushu University, Japan
Real-Time Dynamic Voltage Hopping on MPSoCs
-
Hiroyuki Tomiyama, Nagoya University, Japan
Real-Time Operating Systems for MPSoC
In multiprocessor systems, it is very difficult to guarantee and optimize the real-time responsiveness of application tasks and interrupt handlers, and the real-time responsiveness is significantly affected by the scheduling policy and internal structure of the real-time operating systems. This talk will discuss real-time issues in MPSoC, and present real-time operating systems which we have developed and released as open-source software.
-
Fredrik Dahlgren, ST-Ericsson
Technological Trends, Design Constraints and Architectural Challenges in Mobile Phone Platforms
A large number of capabilities are being integrated into mobile phones.
The multimedia requirements are similar to those of dedicated camcorders
and cameras one generation earlier. The application framework and
software architectures are rapidly becoming similar to the desktop and
laptop environments, and the network access performance is driven by the
trends of mobile broadband. The high volumes push the employment of the
very latest silicon and packaging technologies. However, there are
extremely challenging and conflicting requirements with respect to
performance, cost, power consumption, flexibility, and software
compatibility, which greatly impact the system architecture.
This presentation aims at surveying current market and technological
trends. Emphasis is on the processing needs of the different functions
of mobile phone platforms, and some architectural tradeoffs from a
multi-core perspective.
Thursday August 6: Advanced Application day
SESSION 11: Keynote
SESSION 12: Mini-Keynotes
SESSION 13: In-depth technical presentations
-
Kees Goossens, NXP, Netherlands
Hardwired Networks on FPGAs and their applications
In this talk we discuss the role networks on chip can play in FPGAs. In particular, hardwiring them in the silicon gives improved communication performance, at (limited) loss of flexibility. More importantly, unifying the interconnect for data, control, and code (bitstreams) enables new applications.
-
Yuan Xie, Pennsylvania State University, USA
Enabling Many-Core Design via 3D Stacking
As fabrication of 3D integrated circuits has become viable, developing CAD tools and circuit/architectural techniques for future MPSOC design is imperative. In this talk, a brief introduction to 3D IC integration technology will be given, the challenges that must be addressed to enable the adoption of 3D MPSOC will be discussed, and novel 3D architectures will be introduced.
-
Takeuchi Yoshinori, Osaka University, Japan
Simulator Generation Method of Configurable Processors for MPSoC
In the multi-processor SoC (MPSoC) era, design exploration requires a vast
amount of time. An MPSoC includes multiple or many processors, a lot of
dedicated hardware, and communication buses.
We propose a simulator generation method combining a configurable processor
development environment with a high-abstraction-level bus model.
-
Kiyoung Choi, Seoul National University, Korea
A Reconfigurable MP-SoC Architecture and Application Mapping
With the ever increasing requirements for more flexibility and higher
performance in embedded systems design, coarse-grained reconfigurable arrays
have drawn much attention. But most of the existing architectures cannot be
used for applications such as 3D graphics that require floating-point
operations. Moreover, mapping applications on the reconfigurable array is
difficult since it needs to solve multiple problems simultaneously including
compilation of the application and configuration of the architecture with
limited routing resources. This talk presents how to perform floating-point
operations on a coarse-grained reconfigurable array of integer processing
elements. It also presents how to map applications on the array considering
limited routing resources.
-
Alain Greiner, LIP6-ASIM, France
TSAR: a scalable, shared memory, many-cores architecture with global cache coherence
We present the TSAR many-cores architecture (TSAR stands for Tera Scale ARchitecture).
It is a European Medea+ project, driven by the BULL company.
This architecture supports cache-coherent shared memory and scales up to 4096 32-bit cores.
We detail the distributed, hybrid, cache coherence protocol, and some preliminary
experimental results based on the virtual prototyping of the first version of the architecture.
Friday August 7: Advanced Research day
SESSION 14: Keynote
-
Janos Sztipanovits, Vanderbilt University, USA
Compositionality and High-Confidence Design
Cyber-Physical Systems (CPS) are inherently heterogeneous in structure, components, and types of interactions among components. Existing compositional frameworks address separate design concerns and neglect their interactions. The result is weakened or lost composability with effects cropping up during system integration. For example, when designing embedded controller dynamics, both the physical platform and safety/security related design decisions need to be considered. Physical platform properties that are relevant for controller dynamics include timing uncertainties caused by the communication network, CPU dynamic power management that affects performance, jitter caused by the schedulers, value uncertainties caused by quantization, and finite precision arithmetic inaccuracies. Safety and security related architecture properties that are relevant for controller dynamics include timing uncertainties caused by fault management, architecture adaptation, intrusion detection algorithms, encryption/decryption of data in communication channels, the use of separation kernels and virtualization. Composability of fundamental properties of controller dynamics, such as performance and stability requires that we precisely manage cross-layer interactions. In this talk we will examine three different techniques for improving composability, each based on the utilization of cross-layer abstractions: (1) Passivity-based design of controller dynamics that can be leveraged to orthogonalize controller stability from implementation uncertainties, (2) Cross-layer abstractions that can be used for making controller dynamics robust against selected implementation properties inside an operation regime and (3) Decoupling controller dynamics from implementation uncertainties by introducing a specification layer for timing properties that can be guaranteed during execution.
SESSION 15: Mini-Keynotes
-
Omar Hammami, ENSTA, France
Automatic MPSOC Generation and Design Space Exploration from Automatic Parallelizers
Multiprocessor systems on chip require an integrated design approach combining software-architecture-implementation mutual
effects analysis. In this work we present a design flow integrating automatic parallelization based generation of multiprocessor
system on chip coupled with automatic large scale emulator based multiobjective design space exploration. The powerful combination
of parallelization and design space exploration opens new design avenues not revealed by decoupled analysis.
-
Lars Bauer, University of Karlsruhe, Germany
Classifying and Evaluating Performance-relevant Parameters for Reconfigurable Processors
Processors that deploy fine-grained reconfigurable fabrics to implement application-specific accelerators on demand have received significant attention within the last decade. They trade off the flexibility of general-purpose processors against the performance of application-specific circuits without tailoring the processor towards a specific application domain, as Application Specific Instruction Set Processors (ASIPs) do.
A vast number of reconfigurable processors have been proposed, differing in many architectural decisions. However, it has been an open question which of the proposed concepts is more efficient in certain application and/or parameter scenarios. We have developed a comprehensive design space exploration tool that allows systematic exploration of diverse reconfigurable processors and architectural parameters.
In this talk I will present a classification of fine-grained reconfigurable processors with their relevant parameters and focus on a systematic design space exploration across diverse reconfigurable processor concepts with the aim to aid a designer of a reconfigurable processor.
-
Ulrich Ramacher, Infineon
-
Rolf Ernst, Technical University of Braunschweig, Germany
MpSoC in safety related applications
MpSoCs are on their way into safety-related
applications. Examples are automotive and aerospace control. There are
two usage scenarios. Previously distributed control units (ECUs) and
applications are merged onto one MpSoC to save cost but keep a certain
degree of isolation. Of particular interest are mixed criticality
systems that combine functions with different safety requirements on
one MpSoC leading to diverging design constraints and objectives. A
second MpSoC usage is load distribution of a task system to increase
performance and reduce power consumption of a single application, such
as drive control. Here, the classical shared memory communication of
multirate periodic systems leads to issues of synchronization and
indirect blocking that make efficient design difficult. The talk will
give an overview of challenges and potential solutions.
-
David Atienza, EPFL, Switzerland
Thermal Modeling and Active Cooling for 3D MPSoCs
Continuous technical advances in manufacturing
technologies are fueling the trend towards more powerful 3D
Multi-Processor System-on-Chip (MPSoC) designs. However, 3D stacking
creates additional manufacturing steps beyond the standard technology
ones due to the high power density resulting from the placement of
computational units on top of each other. Therefore, the power
densities in 3D stacks will increase heat density, leading to degraded
performance if thermal-aware design and thermal management are not
handled properly. For instance, one of the novel cooling proposals is
to use water flowing through liquid microchannels in addition to
traditional heat sinks. During this mini-keynote, I will present new
thermal modeling methods for 3D MPSoC architectures, developed in
cooperation with IBM. In particular, I will describe the work
conducted on the modeling of liquid microchannels for cooling in 3D
stacks. Finally, I will briefly summarize the effort to assess the
effectiveness on 3D MPSoC architectures of dynamic thermal management
techniques developed for 2D MPSoCs, and discuss possible ideas to devise
new runtime approaches that cool down the 3D chips by tuning the flow
rate of the coolant.
-
Gabriela Nicolescu, Ecole Polytechnique de Montréal, Canada
System-Level Analysis for MPSoC Integrating Optical NoC