MPSoC 2008

Lectures

Monday June 23

Keynote:

Liang-Gee Chen, National Taiwan University, Taiwan
Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems
Consumer electronics applications are driving the growth of semiconductor technology. The variety of multimedia applications, with their wide range of real-time demands, calls for different computing architectures and levels of computing power. Traditional Application Specific Integrated Circuit (ASIC) designers are under pressure to redesign a suitable architecture for each new specification or application. Reconfigurable and scalable architectures are becoming a leading trend, letting architects unify the architecture and gain flexibility and efficiency in the design phase. This presentation first introduces the scalable features of multimedia applications, then discusses architecture-level reconfigurability and scalability. The last part demonstrates several examples, including the scalable multimedia features of the SVC standard, a scalable visual processor, and a reconfigurable and scalable streaming architecture for future multimedia processing.

Mini-Keynotes:

Paolo Ienne, EPFL, Switzerland
Analytical Models of Communication for MPSoCs
System simulation is the prime technique used in Computer Architecture and Embedded System Design to explore potential design solutions and validate design choices. Unfortunately, simulation seldom gives real insight into, or strong guarantees on, the dynamic behaviour of a system. Simulation is also increasingly the design bottleneck, owing to the large number of parameters designers need to explore and the complexity of the datasets that need to be simulated. In this presentation we will briefly address one alternative, namely Real-Time Calculus, which has been shown capable of modelling systems realistically and which makes it possible to zoom in quickly on the most promising section of the design space.
Youn-Long Lin, National Tsing Hua University, Taiwan
Memory Access Analysis and Optimization for Ultra High Definition Video Coding
In this talk we will analyze memory bandwidth requirements and optimization techniques for Ultra High Resolution (4 times FullHD) video decoding applications. Due to H.264/AVC’s unique variable-block-size feature, there is a great amount of redundant memory access. Furthermore, frame-to-memory mapping and access scheduling are interdependent. We show that, with novel memory allocation and holistic access scheduling, it is feasible to realize ultra high resolution in a mainstream SoC technology.
Vijaykrishnan Narayanan, Pennsylvania State University, USA
Network-on-Chip for 3D Architectures
This talk will present various options for designing networks-on-chip for 3D stacked chips. It will include a discussion of topology choice and true three-dimensional stacking of router architectures.
Marcello Coppola, STMicroelectronics, France
Spidergon STNoC: The Communication Infrastructure for Multiprocessor Architectures
Nowadays, SoCs for the embedded consumer market are customized to the consumer application. They are based on multiple hardware accelerators on the same die. Due to application complexity and usage in final products, SoCs are evolving toward a set of heterogeneous cores, with special instruction sets to speed up the application, coupled with hardware accelerators running at a lower speed, offering the right level of performance, flexibility and size. These processors will be connected through an on-chip interconnect. In this presentation we propose how Spidergon STNoC could be used in such architectural evolutions and illustrate some of its innovative features.
Pier Stanislao Paolucci, ATMEL & INFN, Italy
Four Levels of Parallelism to be Managed in the DIOPSIS based SHAPES Multi-Tiled Architecture
Modern MPSoCs must integrate hundreds of millions of gates to satisfy the high performance demands of applications. Tiled architectures provide a scalable design style addressing two key issues: global wiring and design complexity in deep submicron technologies. The use of a set of “small” mesochronous processing tiles (a few million gates each), with “short” intra-tile wires, solves the clocking issue. The replication of stable and validated processors and tiles keeps complexity manageable at system level.
In our case, we address the SHAPES multi-tiled system, which is based on a multi-processor heterogeneous tile, an evolution of the ATMEL DIOPSIS MPSoC. Each tile typically includes a VLIW floating-point DSP (mAgicV Digital Signal Processor, from ATMEL), a RISC controller, and a DNP (Distributed Network Processor, from INFN). Each tile can be individually connected to a DXM (Distributed External Memory) and a set of peripherals (e.g. ADC/DAC), and always includes several banks of intra-tile memory. A routing fabric connects on-chip and off-chip tiles, weaving a distributed packet-switching network. 3D next-neighbour toroidal connections are adopted for off-chip networking and maximum system density.
The SW challenge is to provide a simple and efficient programming environment able to manage four levels of parallelism:
Level 1. Inside each processor, e.g. exploiting the 15 operations per clock cycle of the multiple-issue mAgicV floating-point VLIW DSP from ATMEL and its software-managed multi-bank memory system;
Level 2. Inside each tile, e.g. activating the parallel operation of the RISC, DSP and DNP processors inside the tile, supported by a SW-managed multi-layer bus matrix, which permits multiple simultaneous intra-tile data transfers between the intra-tile processors and the Distributed External Memory attached to each tile;
Level 3. Inside each multi-tile chip, managing the inter-tile DNP-NoC-DNP packet transfers, for a number of simultaneous data transfers which should grow proportionally to the number of tiles;
Level 4. At multi-chip system level, using the 3D toroidal off-chip network.

In-Depth Presentation:

Jörg Henkel, University of Karlsruhe, Germany
Adaptive Embedded Processing with RISPP
Embedded systems are becoming more and more complex and typically comprise a large set of diverse functionalities. As this complexity increases, metrics such as performance and power consumption become hard to predict at design time. Run-time adaptation can help to alleviate this problem. The talk gives an overview of RISPP, a run-time adaptive embedded processor that is capable of adjusting at run-time to non-predictable application behavior as well as to constraints that may change at run-time. Various research-relevant and challenging parts of the project are presented and discussed.
Takashi Miyamori, Toshiba, Japan
Venezia: a Scalable Multicore Subsystem for Multimedia Applications
We introduce a new scalable multi-core processor, Venezia, for the mobile multimedia platform. Venezia has two types of processors, the processor core and MPEs (Media Processing Engines). Both are configured from the configurable processor MeP (Media embedded Processor). Using the VLIW coprocessor extension of MeP, the MPE is designed as a 3-way VLIW processor with dual 64-bit SIMD datapaths to accelerate multimedia applications such as audio and video CODECs. Unlike other multi-core multimedia processors, Venezia is composed of relatively small and low-power processors. This provides considerable design flexibility. By changing the size of the caches and the number of processors, it can cover a broad range of applications from low-end (e.g. QVGA resolution) to high-end (e.g. 720p resolution).
Furthermore, Venezia’s software platform enables binary-level software compatibility and performance scalability by exploiting coarse-grain thread-level parallelism.
Jason Parker, Symbian Ltd., UK
Symbian OS SMP
How Symbian OS is evolving into an SMP operating system. Benefits and costs of previous design choices. The challenge of delivering real world performance within the same battery.
René van den Berg, NXP, Netherlands
Predictable and Composable Multiprocessor Systems for Car-Entertainment. Part 1: business view
Marco Bekooij, NXP, Netherlands
Predictable and Composable Multiprocessor Systems for Car-Entertainment: Breaking Resource Dependencies
Current car radio applications include simultaneous reception of a number of analogue and digital radio streams for the passengers in the car. Furthermore, new applications are emerging, like wireless, storage, telematics, navigation and in-car video. For these applications, security aspects are becoming increasingly important. All these applications and related technologies and concepts are changing rapidly. As a result, the trend is to implement more functionality in software. This has resulted in an explosion of the software development, testing and validation effort. To contain this effort, we foresee that predictable and composable multiprocessor systems will be necessary to shorten the software development cycle of the radio set makers. Composability provides temporal isolation, so that all involved parties can develop and test their own software without the software of other parties. This will reduce the system test and integration effort and will improve the robustness of the car radio system.
Predictability and composability can be realized by breaking resource dependencies. Resource dependencies can be broken by using budget schedulers. Budget schedulers guarantee a minimum interval in a period during which a resource is available. This guarantee is independent of the execution times of the tasks and of the data traffic in the system.
Dataflow analysis is applicable to a system with budget schedulers. Because dataflow analysis can handle cyclic functional dependencies, we can prevent loss of data by testing whether there is space before data is written into a buffer.
Temporal isolation requires the use of non-work-conserving budget schedulers. The cost of temporal isolation is therefore that the slack created by other tasks is wasted. The use of non-work-conserving budget schedulers enables fine-grained resource reallocation when applications are started in the system. This would not be possible if we excluded sharing of resources.
A fully functional 5-core VLIW multiprocessor system, built around a network-on-chip and mapped onto an FPGA, demonstrates that predictable and composable systems are feasible.
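The budget-scheduling idea in this abstract can be illustrated with a minimal sketch. It assumes a time-division (TDM) wheel as the budget scheduler; the slot table, the task names and the `simulate_tdm` helper are hypothetical and are not taken from the talk. Each task owns fixed slots in a repeating period, and a slot left unused by its owner stays idle (non-work-conserving), so a task's finish time depends only on its own budget and workload.

```python
from typing import Dict, List, Optional


def simulate_tdm(
    slot_table: List[str], work: Dict[str, int], horizon: int
) -> Dict[str, Optional[int]]:
    """Simulate a non-work-conserving TDM budget scheduler.

    slot_table: owner of each slot in one period (hypothetical layout).
    work: number of slots of work each task needs.
    Returns, per task, the time in slots at which it finishes,
    or None if it does not finish within the horizon.
    """
    remaining = dict(work)
    finish: Dict[str, Optional[int]] = {
        name: (0 if need == 0 else None) for name, need in work.items()
    }
    for t in range(horizon):
        owner = slot_table[t % len(slot_table)]
        # Only the slot owner may run. An unused slot is wasted rather
        # than handed to another task: this is the cost of isolation.
        if remaining.get(owner, 0) > 0:
            remaining[owner] -= 1
            if remaining[owner] == 0:
                finish[owner] = t + 1
    return finish


# Task A owns two of every four slots; B and C own one slot each.
table = ["A", "A", "B", "C"]
quiet = simulate_tdm(table, {"A": 4, "B": 0, "C": 0}, horizon=40)
busy = simulate_tdm(table, {"A": 4, "B": 30, "C": 30}, horizon=40)
print(quiet["A"], busy["A"])
```

Task A finishes after the same number of slots in both runs, whether the other tasks are idle or overloaded. A work-conserving scheduler would let A finish earlier in the quiet run, but its timing would then depend on the other tasks.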

Tuesday June 24

Keynote:

Tim Kogel, CoWare, Germany
Using the new TLM-2.0 Standard for the Creation of Virtual Platforms for ESL Design
The design and programming of complex MP-SoC platforms demands advanced ESL design methodologies for design tasks like hardware/software integration, software performance analysis, and architecture design. Virtual Platform models enable these tasks to be performed earlier (pre-RTL) in the design process, allowing the MP-SoC hardware and software to be developed in parallel. The advantages are greater visibility into and control over the platform during software development, a shortened overall design cycle, and a controlled and more formalized overall design process.
In the past, the lack of standards for the creation of Virtual Platforms has hampered the deployment of ESL design due to lack of suitable models for common MP-SoC platform components. The new SystemC TLM-2.0 standard from the Open SystemC Initiative (OSCI) is aligning the industry, enabling the creation of truly interoperable SystemC-based virtual platform models.
This presentation gives a technical overview of the key elements in TLM-2.0 and illustrates the effective creation of TLM-2.0 based virtual platforms that satisfy the simulation speed and timing accuracy requirements of the different ESL design tasks.

Mini-Keynotes:

Soonhoi Ha, Seoul National University, Korea
Embedded Software Development for MPSoC: Preliminary HOPES Experience
Last year we proposed a novel model-based programming environment of embedded software for MPSoC, named HOPES.
By defining a common intermediate code (CIC) that is independent of the architecture, we separate the modeling of the software from its implementation optimized for the target architecture.
In this presentation, we will present our experience of using the proposed methodology to design software for various targets, including ARM MPCore, the IBM Cell processor, and a virtual prototyping system. Issues of partitioning and simulation will also be discussed.
Masaharu Imai, Osaka University, Japan
Architecture Level Power Reduction Method for Configurable Processor Generation
Power reduction methods at various levels, such as the device, logic, architecture, and software levels, are applied in processor design.
In this presentation, a power reduction method for configurable processor generation is introduced. The proposed method is based on a clock gating technique that takes advantage of micro-architecture-level information on the non-redundant activation conditions of data path components, and it can be applied to both scalar and VLIW processors.
Experimental results show that the proposed method reduces power consumption by about 30-40% more than conventional methods.
Hironori Kasahara, Waseda University, Japan
Compiler and API for Low Power High Performance Multicores
This talk introduces an automatic parallelizing compiler, an API and an 8-core low power high performance multicore chip newly developed in a Japanese METI/NEDO national project. These technologies deliver a 5.8 times speedup on 8 cores over a single core for AAC encoding with automatic parallelization, and a 73% power reduction for real-time MPEG2 decoding through automatic power control by the compiler.
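To put the reported speedup in perspective, the short sketch below derives the parallel efficiency from the two figures quoted in the abstract (5.8 times on 8 cores). It also inverts Amdahl's law to estimate the parallel fraction. Applying Amdahl's model here is an assumption made purely for illustration, not something the talk claims, and the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def amdahl_parallel_fraction(speedup: float, cores: int) -> float:
    """Invert Amdahl's law S = 1 / ((1 - p) + p / n) for p.

    Assumes the workload follows Amdahl's model (illustrative only).
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


efficiency = parallel_efficiency(5.8, 8)
fraction = amdahl_parallel_fraction(5.8, 8)
print(f"efficiency = {efficiency:.3f}, parallel fraction = {fraction:.3f}")
```

Under these assumptions the eight cores run at roughly 72.5% efficiency, which under Amdahl's model would correspond to about 95% of the single-core work being parallelized.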
Joachim Kunkel, Synopsys, USA
MPSoC IP Integration and Interoperability Challenges
Much attention has been given to the “P” in MPSoCs – the processors and the software running on them. However, while these challenges are significant, the issue of integration and interoperability of all the IP in MPSoCs, including interface and communication components, should not be underestimated.
This talk will review the different views needed for MPSoC-related IP, including views for the actual implementation and for integration at the system level for architectural analysis and early software development. In addition, it will address issues around IP integration and interoperability and review potential solutions. Specifically, it will address SPIRIT’s IP-XACT and SystemC’s upcoming TLM-2.0 release, which ensure interoperability between different IP integration and simulation environments.
Jan Madsen, Technical University of Denmark, Denmark
Adaptive Embedded Systems – Challenges of Run-Time Resource Management
Understanding and efficiently controlling the dynamic behavior of adaptive embedded systems is a challenging endeavor. The challenges come from the often very complicated interplay between the application, the application mapping, and the underlying hardware architecture. With MPSoC, we have the technology to design and fabricate dynamically reconfigurable hardware platforms. However, such platforms will pose new challenges for the tools and methods needed to explore them efficiently at run-time. This talk will address some of the challenges of run-time resource management in adaptive embedded systems.

In-Depth Presentation:

Gerd Ascheid, RWTH Aachen University, Germany
MPSoC Design Space Exploration Framework
At an early stage of an MPSoC design, neither the MPSoC architecture nor the application software may be available. During the design, the hardware, the software and the task mapping are refined starting from an initial “guess”. Due to the interdependencies, this must be an interactive process to achieve optimum design efficiency and high-quality results. In this talk, a concept for a design space exploration framework is presented which supports a seamless interactive design process. Initially, “Virtual Processing Units (VPU)” may be used, which can be replaced by simulation models of the actual processor cores as the design progresses. Similarly, a generic operating system used in an early phase can be replaced by the OS of choice later. Thanks to a TLM-based (Transaction Level Modelling) approach, the communication architecture may initially be modelled at a high abstraction level for high simulation speed and can seamlessly be replaced by a more accurate model for higher timing precision later. Last but not least, the task execution may initially be modelled by timing estimates. During execution, timing can be refined by a “micro-profiler”-based approach, applied to the application software, down to a processor-specific code implementation which provides accurate timing when combined with cycle-accurate processor models.
Katalin Popovici, Mathworks, France
Software Design and Integration for Embedded Multimedia Applications by Successive Refinement
To meet the current demanding requirements for multimedia applications, heterogeneous multiprocessor system on chip (MPSoC) architectures with dedicated communication infrastructure have become indispensable. Efficient programming of these architectures becomes a key challenge because of the high complexity of the embedded software. The software is generally organized in several stacks made of different layers. The necessary integration of the software with the hardware platform results in problems found late in the design.
In this presentation, an approach to software design is presented that mitigates the integration problems by early analysis starting from a functional model of the application. Successive refinement of software using architecture abstraction allows the study of increasingly detailed integration effects while efficiently executing the comprehensive multimedia application.
Rudy Lauwereins, IMEC, Belgium
Will 3D Stacking of ICs Enable to Continue Moore's Momentum in the 21st Century?

Wednesday June 25

Keynote:

Luca Benini, University of Bologna, Italy
3D-MPSoCs: Architectural and Design Technology Outlook
Three-dimensional (3D) manufacturing technologies are viewed as promising approaches to reduce the communication bottlenecks that are limiting scalability in multi-core architectures. However, there is no consensus on how to exploit the additional degrees of freedom in architectural design offered by three-dimensional integration. A critical point is the granularity at which vertical connectivity can be optimally exploited: solutions ranging from elementary cell level (i.e. a gate split over multiple layers) to sub-system level (i.e. layer assignment is done at the processor-core or memory-hierarchy level) are being explored. The sweet spot critically depends on technology parameters and on design automation support. I will look in detail into this critical issue, focusing on the extension of network-on-chip architectures and design flows toward 3D technologies.

Mini-Keynotes:

Omar Hammami, ENSTA, France
Heterogeneous Embedded Systems Very Large Scale Design Space Exploration
Multiobjective design space exploration is essential for MPSoC design, and heterogeneity adds a significant number of design degrees of freedom, which translate into very large scale design space exploration requirements. The simulation wall, represented by the prohibitive time required to simulate large scale heterogeneous MPSoCs, coupled with the implementation data requirements (area, frequency) of multiobjective analysis, strongly encourages the use of large scale emulation platforms in order to cope with billion-cycle applications and large numbers of NoC-connected processors. In this talk we will describe preliminary results of very large scale design space exploration based on large scale MPSoC emulation. Issues related to interconnection, scaling and heterogeneity, and their impact on the automatic flow, will be emphasized.
Jari Nurmi, Tampere University of Technology, Finland
Silicon Café: a Heterogeneous Multi-Processor Platform based on Coffee (RISC Core)
In a software-defined terminal, efficient joint processing of receiver baseband algorithms, higher communication layer protocols, and multimedia applications is required. In order to achieve optimized performance and power consumption characteristics, a heterogeneous solution consisting of different types of processing cores, reconfigurable acceleration hardware, optimized standard blocks and memory is needed in the terminal hardware platform. The blocks have to communicate over a high-bandwidth interconnect to support streaming applications distributed on the heterogeneous system. Our solution to meet these needs is Silicon Café, which is largely based on technology developed in an open-source spirit at Tampere University of Technology: Coffee RISC Core, Milk floating-point unit, Cappuccino integer/floating-point core, Decaf low-complexity version of Coffee, upcoming superscalar Coffee, Butter coarse-grain reconfigurable accelerator array, and a novel hierarchical low-cost high-speed Network-on-Chip. The physical implementation relies on a tiled architecture supporting two different sizes of tiles to accommodate processing and storage blocks of varying complexity.
Atsuhiro Suga, Fujitsu Laboratories, Japan
ARPC based Heterogeneous Multi-Core Platform Suitable for an Embedded System
We propose a multi-core technology that allows the number of cores to be scaled while minimizing software changes when porting an existing OS and a single program to multiple cores.
We will discuss a multicore programming platform based on Asynchronous Remote Procedure Call (A-RPC) technology, using the Fujitsu FR-V multicore processor as a case study.
Thierry Collette, CEA LIST, France
Key Technologies for Many Core Architectures
Even with the emergence of multicore components, the new challenge for the next decade in embedded computing will be the design of manycore components, composed of more than 100 processors. This short presentation will focus on the key technologies which must be developed in order to address this challenge. Some advanced results from CEA LIST in this area will also be presented.
Olivier Franza, Intel, USA
Power Mitigation Techniques in Complex MPSoCs
Various power mitigation techniques used in large-scale complex multi-core microprocessors will be presented in this mini-keynote.
John Goodacre, ARM, UK
The Effect and Technique of System Coherence in ARM Multicore Technology
Having successfully deployed the first generation of ARM MPCore multicore technology, the focus has moved to enhancing and improving the integration and operation of the processor within the broader SoC, in addition to the traditional processor microarchitectural enhancements. A key new feature is the ability to coherently integrate system DMA and accelerators into the cache hierarchy of the processor. This feature greatly simplifies software while also improving the performance and power efficiency of the next generation of ARM-based multicore SoCs.
Steven P. Levitan, University of Pittsburgh, USA
Zooming out and Scaling up: from 1 to 1000 Cores
The quantitative changes in the scale of MPSoCs over the next decade must be reflected in commensurate qualitative changes in the programming metaphors that we use to develop our applications. We must go beyond the use of threads for small scale parallelism, and process migration for medium scale systems, to truly concurrent implementations of computationally intensive applications.
Therefore we need to develop new high-level computational metaphors rather than lower-level programming paradigms. Parallel programming paradigms include the more traditional techniques of decomposing tasks by function, space, time, and instance. Computational metaphors, on the other hand, reflect the need to work in the problem domain with constructs that directly support computationally intensive applications, including interpretation, transformation, simulation, and optimization.
Kazutoshi Wakabayashi, NEC, Japan
Practical Usage of "C-based Synthesis" for Dynamic Reconfigurable Chips, and Area Reduction by Performance Balancing
This talk illustrates how we use C-based "Synthesis" and verification daily for commercial SoC design. We explain what kinds of difficulties designers often encounter when they begin to use it, and how we advise them to solve them (e.g. pipelining, controller, I/O timing, QoR, C modification for better hardware). We also argue that C-based verification tools connected to behavior synthesis are essential for practical use of C-based synthesis.
Pieter van der Wolf, NXP, Netherlands
Performance Contracts for Modular MPSoC Integration
The current practice of IP-based SoC integration requires significant efforts from a SoC integrator to understand the communication requirements of the IP modules and to satisfy these with a SoC infrastructure. We propose a SoC integration style where the communication requirements of IP modules are explicitly characterized and expressed as performance contracts. The performance contracts specify traffic bounds and associated latency constraints that need to be satisfied. The proposed SoC integration style decouples the IP development from the integration of a set of IPs in an MPSoC. IPs can be developed to meet an agreed performance contract, while the SoC infrastructure can be developed to meet the set of performance contracts for the set of IPs to be integrated. Explicit (re-)negotiation of performance contracts supports structured communication among IP developers and SoC integrators.
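As a rough illustration of what checking a set of performance contracts might involve, the sketch below models a contract as a traffic bound plus a latency constraint and tests a set of contracts against an infrastructure. All names, fields and units here (`PerformanceContract`, `Infrastructure`, MB/s, ns) are hypothetical. The single worst-case latency figure is a deliberate simplification and not the format proposed in the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PerformanceContract:
    ip_name: str
    max_bandwidth_mb_s: float  # traffic bound the IP agrees not to exceed
    max_latency_ns: float      # latency the infrastructure must guarantee


@dataclass(frozen=True)
class Infrastructure:
    capacity_mb_s: float          # total bandwidth it can serve
    worst_case_latency_ns: float  # simplification: one bound for all IPs


def violated_contracts(
    contracts: List[PerformanceContract], infra: Infrastructure
) -> List[str]:
    """Return the reasons the infrastructure cannot honour the contracts."""
    problems = []
    total = sum(c.max_bandwidth_mb_s for c in contracts)
    if total > infra.capacity_mb_s:
        problems.append(
            f"aggregate traffic {total} MB/s exceeds capacity "
            f"{infra.capacity_mb_s} MB/s"
        )
    for c in contracts:
        if infra.worst_case_latency_ns > c.max_latency_ns:
            problems.append(
                f"{c.ip_name}: latency bound {c.max_latency_ns} ns not met"
            )
    return problems


infra = Infrastructure(capacity_mb_s=500.0, worst_case_latency_ns=200.0)
video = PerformanceContract("video", 400.0, 250.0)
audio = PerformanceContract("audio", 10.0, 150.0)
first_round = violated_contracts([video, audio], infra)

# Renegotiation: the audio IP relaxes its latency constraint.
audio_relaxed = PerformanceContract("audio", 10.0, 200.0)
second_round = violated_contracts([video, audio_relaxed], infra)
print(first_round, second_round)
```

The first check flags only the audio contract. After the (re-)negotiation step the set is satisfiable, so the IP and the infrastructure can each be developed against the agreed numbers independently.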

In-Depth Presentation:

David Atienza, LSI, EPFL, Switzerland
New Exploration Frameworks for Temperature-Aware Design of MPSoCs
Multi-Processor Systems-on-Chip (MPSoCs) are increasingly penetrating the consumer electronics market as a powerful, yet commercially viable, solution to answer the strong and steadily growing demand for scalable and high performance systems, at limited design complexity. Nevertheless, MPSoCs are prone to alarming temperature variations on the die, which seriously decrease their expected reliability and lifetime. Thus, it is critical to develop new dedicated exploration frameworks for multi-core architectures that seamlessly address their thermal modeling and thermal management validation.
In this seminar, I present modelling and analysis tools for MPSoC architectures. In particular, I describe a novel thermal exploration framework based on a combined hardware-software emulation approach exploiting Field-Programmable Gate Arrays (FPGA), which enables the accurate characterization of the thermal behavior of MPSoCs, while being three orders of magnitude faster than state-of-the-art architectural and system simulators. The proposed modeling methodology combines hardware emulation with a new software-based thermal estimation tool for embedded systems running in a host computer. This estimation tool is able to predict the run-time temperature of SoC components based on statistics of the power dissipated in each component during execution, which are extracted transparently from the emulated MPSoC. Moreover, the use of this thermal estimation tool in combination with FPGA emulation enables a novel closed-loop modeling methodology that is able to validate run-time thermal management strategies for actual MPSoCs with a very limited effort, and without post-simulation analyses. As a result, in the last part of the seminar, I will illustrate how this novel MPSoC modeling framework can be applied to validate and tune thermal management techniques for MPSoCs, which exploit the existing dynamic frequency and voltage scaling support, and can be implemented at the Multi-Processor Operating System level to achieve optimal system performance under thermal constraints.
Stefan Heinen, RWTH Aachen University, Germany
Challenges for SoC Integration of Multistandard-RF-Circuits
Over the last decade, RF integration into CMOS technologies has become mainstream. The production volumes required for cellular systems like GSM or short range systems like Bluetooth have justified the high R&D effort for SoC integration of RF subsystems. RF SoCs are state of the art today for Bluetooth, DECT, WLAN, UWB and GSM/EDGE. The next step has to cover cellular multi-standard SoCs combining GSM/EDGE and UMTS/WCDMA on a single die, with LTE almost on the horizon. The co-integration of high performance RF and digital is surely a challenge with respect to crosstalk and coupling. As a result, the elimination of spurious emissions is a major objective in SoC design, impacting R&D cost and time to market.
Cost reduction for multi-standard systems can only be achieved by reducing the complexity of the RF frontend with respect to external passive and active components. It will be shown that only smart high dynamic range transceivers are able to achieve the demanding RF performance goals required for real multi-standard operation. Moreover it will be shown that future RF SoC integration will require flexible reconfigurable frequency agile RF front end blocks to achieve the productivity steps required for further integration.
Christian Piguet, CSEM, Switzerland
Design Methodologies for Heterogeneous Systems and Case Studies of MPSoC Chips
The design of heterogeneous systems in very deep submicron technologies is becoming a very complex task that has to bridge very high-level system descriptions and low-level considerations arising from technology defects and variations. This talk will describe some of the main low-level issues, such as dynamic and static power consumption, temperature, technology variations, interconnect, DFM, reliability and yield, and their impact on high-level design, such as the design of multi-Vdd, fault-tolerant, redundant or adaptive chip architectures. Some MPSoC chips will be presented in three domains in which heterogeneity is large: wireless sensor networks, vision sensors and mobile TV.

Mini-Keynotes:

Dominique Ragot, THALES, France
MPSoC Architecture Low-Power Optimisation under Linux
Power efficiency is a very important criterion for autonomous embedded systems. With the latest generation of SoCs it is now possible to have dynamic control over key parameters such as frequency and voltage through power management hardware units. This talk presents the main power-awareness capabilities accessible for MPSoC, introduces the reasons for using Linux in this area, and describes how applications can benefit from these capabilities.
Lothar Thiele, ETH Zürich, Switzerland
Closing the Loop: Exploration and Estimation
The talk describes an environment to map applications to MPSoC platforms. The corresponding platform enables the specification, simulation, performance evaluation and mapping of distributed algorithms. Major characteristics are scalability and multi-resolution methods for validation and estimation that combine simulation-based and analytic approaches.
Steve Leibson, Tensilica, USA
Green Computing: What Does it Mean for Embedded Silicon Systems?
Energy efficiency has long been a goal for embedded system design, but the need has been expressed in terms of long battery life or improved system cooling. As the electronics content of the world economy grows, the global impact of electronics inefficiency comes more starkly into focus. Both rising energy costs and concerns about CO2 emissions are changing the role of energy efficiency in all categories of products, with particular emphasis on large-scale energy uses, such as telecommunications infrastructure, server farms and electrical infrastructure in residences and businesses. For many systems, the cost of energy to operate the system now exceeds the capital cost to build the system. This short talk assesses the big picture of electronics energy consumption and outlines the range of methods needed to attack the problem, with proposed changes in semiconductor processes, design tools, system architectures, software and operating modes.
Yankin Tanurhan, ACTEL, USA
Low Power Solutions in FPGA based Programmable Heterogeneous Systems

Thursday June 26

Keynote:

Wanda Gass, Texas Instruments, USA
Gaining Focus: Developing a “Tops-Down” Approach to Multicore Processor Architectures
Typical compute architectures are developed in a “bottoms up” fashion to maximize the processing performance. However, multicore processors should be optimized for specific applications. These processors should be developed with a “tops down” approach, delivering a targeted processor as well as software and tools to make the cores “transparent” to the customer.
This presentation will discuss the manageable set of problems facing the development of multicore processors, the need to develop a more functional architecture, and a guide to the tools needed to gain focus in a fragmented multicore market.

Mini-Keynotes:

Soo-Ik Chae, Seoul National University, Korea
A Multi-threading RISC Cluster for Video Systems
In implementing a software decoder, we need to capture its parallel model, which contains many concurrent threads that are switched every few thousand cycles. Moreover, we need to employ processors that run this parallel model efficiently. Instead of a high-performance processor, we employed a cluster of simple processors. In this presentation, we first describe a simplified RISC processor and a cluster of these RISC processors that share instruction and data caches. We substantially reduced the thread switching overhead by using a hardware kernel that supports context switching and synchronization.
Ahmed Jerraya, CEA-LETI, MINATEC, France
System Compilation for MPSoC based on NoC
ITRS predicts that all SoCs will include many cores. Several NoC-based MPSoCs already exist. However, programming these MPSoCs is still done manually. Automating this process is crucial for making these MPSoCs practical to use. This presentation shows the importance of NoC programming and introduces a new NoC programming flow.
Ulrich Ramacher, Infineon, Germany
R&D for X-Gold SDRxx Platform
What started in 2001 as a research project has led in 2007 to the BU COM SDR at Infineon. I shall present COM's roadmap for SDR and list the items of the SDR platform that need further research.
Norbert Wehn, University of Kaiserslautern, Germany
An Outer Modem ASIP for Software Defined Radio
Channel coding is key for reliable transmission in wireless communication systems, and many advanced channel coding techniques (e.g. convolutional codes, turbo codes, LDPC codes) exist. There is a great demand for flexibility due to varying QoS requirements and multi-standard support. On the other hand, wireless communication is a cost-sensitive market segment in which area, energy and performance are key issues. We will present a family of dynamically reconfigurable application-specific instruction-set processors for advanced channel decoding algorithms in a software defined radio environment. As weakly programmable IP cores, they feature binary convolutional decoding, turbo decoding for binary as well as duo-binary turbo codes, and LDPC decoding.
Rolf Ernst, Technical University, Braunschweig, Germany
Load Level Modeling
Current ESL methods and tools for verification focus on run-time efficient simulation. Simulation requires executable code and is therefore not applicable to early design phases where the software code is not yet available or is still subject to change. In such cases, load models that are known from schedulability and network analysis can be used. Load models are compatible with the activation rules of application models, such as event-driven data flow graphs or time-driven Simulink models. The main limitation of classical schedulability analysis is its focus on worst-case design. The presentation will outline an extension to the classical load model that captures variations in task execution time and communication. That model can raise platform design to a higher level of abstraction, thereby supporting a design process where application development is separated from software implementation. Using this model, analysis can also highlight sensitivity to software and application modifications.

In-Depth Presentation:

Kevin Smart, Synopsys, UK
Virtually Solving Debug Challenges of the MPSoC Era
The last decade has seen quite an evolution around MPSoCs, and we are now at a stage in which users are growing beyond largely independent pieces of software running on largely independent “processor based silos”, which traditionally had little interaction between them. The next step in the evolution of software on MPSoCs will be applications truly distributed between several processors, resulting in a large amount of interaction between the different processor domains. As a result, the software debug challenges become more difficult. The most common multi-processor debug nightmares (locks, data races, memory corruption and stalls) are potentially breaking traditional debug methods on FPGA-based prototypes because of the inherent limitations of visibility and control.
This presentation will describe the most common multi-processor debug challenges in more detail and show how to solve them efficiently by using virtual prototypes of MPSoCs, effectively debugging software on software. Virtual prototypes offer much greater visibility into the software running on an MPSoC and into the MPSoC platform itself. Once issues like locks and data races are identified, the far superior control offered by virtual platforms allows the MPSoC platform execution to be steered efficiently to validate complex debug scenarios.
Tsuyoshi Isshiki, Tokyo Institute of Technology, Japan
MAPS-TCT: MPSoC Application Parallelization and Architecture Exploration Framework
Developing applications that operate concurrently on a heterogeneous MPSoC platform is one of the key challenges in enormously complex MPSoC designs. MAPS-TCT is a new MPSoC application development framework that uses standard C programming on top of a powerful set of tools to help programmers intelligently explore the vast MPSoC design space. MAPS (MPSoC Application Programming Studio) is an integrated tool set for parallelizing C applications, developed at RWTH Aachen, that is equipped with powerful application analysis and code partitioning engines, allowing programmers to extract concurrency efficiently from sequential C applications. The TCT (Tightly-Coupled Thread) framework, developed at Tokyo Institute of Technology, generates concurrent execution models of "tightly-coupled threads" for MPSoC functional blocks from "threaded" C programs, realizing a wide variety of parallelism at any granularity. The TCT compiler automatically inserts thread communication instructions and generates parallelized code, which is executed on its MPSoC platform.
Achim Nohl, CoWare, Germany
Debug and Test Techniques for Parallel Software using Virtual Platforms
Research on new programming models and methodologies for MPSoC software development is progressing, but still no mature and practical solution is in sight. But for now, engineers have to manually transform their large amount of sequential legacy software into parallel code. In order to take advantage of the parallelism, mature and proven software code written for single-core SoCs is now exposed to all the traps and pitfalls introduced by the parallel processing paradigm. With that, the famous and truthful statement “If you write the code as cleverly as possible, you are, by definition, not smart enough to debug it” (Brian W. Kernighan) takes on a whole new meaning. Therefore, powerful debugging solutions for multi-core software are a cornerstone for parallel software engineering in the industry.
However, the required system-level visibility and synchronous debugging capabilities are expensive in terms of the necessary on-chip hardware resources. In contrast to real hardware, Virtual Platform models provide the necessary visibility and control long before hardware is available. In this presentation we will elaborate on how Virtual Platforms enable engineers to trigger and analyze typical parallel code defects such as deadlocks, race conditions or processor starvation. An outlook is given on how Virtual Platforms will ease the testing of MPSoC software.

Mini-Keynotes:

Toshihiro Hattori, Renesas, Japan
Multi-Domain System Support Environment for Multi-Core System
One of the most practical forms of MPSoC is the integration of subsystems. The domains, each comprising the software and hardware of a subsystem, should be designed separately but must integrate and communicate within the total system. I will discuss a support environment for such multi-domain systems.
Gabriela Nicolescu, Ecole Polytechnique de Montréal, Canada
Combining Memory Optimization with Mapping of Multimedia Applications for MPSoC
MPSoC platforms are now a reality and their efficiency in multimedia, signal processing, wireless or graphics applications has already been proven. However, mapping applications onto these platforms is still a challenge. The mapping tools must respect the application and platform characteristics while optimizing a set of constraints (e.g. response time, memory management, power, cost, load balancing, etc.) by exploring a large space of solutions. This presentation will focus on this aspect of MPSoC design and will emphasise the integration of memory optimization considerations into the mapping algorithms.
Frédéric Pétrot, TIMA - SLS, France
Automatic Timing Annotation of Native Software for MPSoC simulation
This talk presents a new approach to code annotation for performance estimation in native simulation. The approach is based on basic block analysis of the target code, is fully automatic, and provides an exact instruction count. It preserves the program structure and allows source code debugging to be performed on the original code. This annotation strategy is a first step towards accurate timing estimation.
Eric Verhulst, OpenLicenseSociety, Belgium
No more Communication in your Program, the Key to Multi-core and Distributed Programming
While multi-core systems are a hot topic today, they are not really a new issue. Many embedded systems have used various types of multi-processors, embedded as on-chip CPU cores or at the board or system level, for decades.
However, the programming world, heavily biased by the von Neumann architecture, has never come up with a clean programming paradigm, although process algebras like CSP were specifically developed to tackle the issue. In this presentation we will argue that the key to programming such target systems is to make the concurrency explicit but to hide the communication in the system layer. The communication is still present, but at a more abstract level in the application, by way of synchronization and communication services. This view led to the development of the network-centric OpenComRTOS. It allows targets ranging from multi-core single chips to widely distributed networks to be programmed in an almost transparent way with just 5 KBytes of code.

Friday June 27

Keynote:

Gerhard Fettweis, TU Dresden, Germany
Challenges & Solutions for Building Future MPSoCs for Cellular Communications
In a mainstream embedded application area such as SoCs for cellular communications baseband implementation, we are currently finding an increasing number of multi-processor system-on-chip solutions which are either in the design phase or already being presented to customers. As we must prepare to go from multi-core to many-core, we must start finding solutions for many pressing issues, such as:
- Low power concept
- Hardware architecture concept
- Software architecture concept
- Programming paradigm
- Real-time scheduling
- Programmable processor and accelerator concept
- Hierarchy
- Network-on-chip architecture
- I/O architecture concept
This talk reviews important best-practice design principles of the past, many of which still hold within the new many-core paradigm. Numerous design examples show how these principles have proven fruitful and remain the essence of achieving low-power, low-cost semiconductor solutions for the MPSoC challenges of the future. New challenges of multi- and many-core circuits are addressed, and novel design solutions and design paradigms will be presented.

In-Depth Presentation:

Michael Speth, Infineon, Germany
Design & Verification Challenges for 3G/3.5G/4G Wireless Baseband MPSoCs
This talk discusses design and verification challenges of competitive wireless baseband MPSoCs. Both design and verification have to be highly efficient with respect to various and partly conflicting requirements such as silicon area, current consumption, flexibility, time to market and verification coverage.
Traditionally, the design of digital communications receivers is characterized by the need to meet the conflicting objectives of extremely high signal processing requirements and low power consumption.
For 3G and 4G systems this task is even further aggravated. However, using canonical algorithm design in combination with innovative data-path architectures it is possible to meet these requirements with the standard design methodologies. The price for meeting these contradicting objectives is usually a limited flexibility of the system.
The true challenge in receiver design for 3G and 4G systems lies in the following characteristics:
- Extremely latency critical interactions between network, terminal, protocol stack and physical layer.
- An almost infinite range of operating modes, and their dynamic reconfiguration. As a result, system verification does not end in the lab but continues all the way through IOT testing and system bring-up in live networks. This calls for high flexibility of the HW, which contradicts the requirements of the data-path and complicates HW verification.
The key enablers to succeed in the design and bring-up of this type of system are:
- An adequate system back-bone architecture orthogonalizing critical tasks and thus enabling a deterministic scheduling of system transients.
- A seamless methodology allowing system verification from pre-silicon until operator testing.
- A regression environment allowing profiling of the system using deliberately complex use-cases.
- Interdisciplinary teams that are involved from concept phase until final operator testing.
Rakefet Kol, Zoran Microelectronics, Israel
Must Provide a Solution on a Chip – from concept to delivery
In the growing digital entertainment and digital imaging markets, OEM customers are seeking greater capabilities within each product generation, reduced system costs, and shorter time to market. Zoran, as a leading provider of digital solutions, develops and delivers high-performance MPSoCs which contain pioneering digital audio, video, and signal processing technologies, imaging applications and Connect Share Entertain™ technologies for the digital world. Zoran-based DVD, digital camera, DTV, multimedia mobile phone, and multifunction printer products benefit from proficient integration and address high-volume, price-sensitive markets.
The Vaddis® and VaddisHD™ product families of DVD multimedia processors are a fully digital end-to-end MPSoC solution for DVD players, addressing a wide range of applications from low-end $20 standalone DVD players through portable players, DVD receivers, and all the way to the state of the art high definition Blu-ray DVD players.
The Vaddis-9 product line reduces system cost by lowering system component count while also reducing system power consumption. The Vaddis-966L was optimized for portable players. This SoC was the first product in the world to include integrated video processing and an integrated LCD timing controller able to support various sizes and types of LCD panels for portable consumer electronics gadgets. It can support analog or digital video interfaces to portable LCD panels with different pixel arrangements and a wide range of resolutions. The integrated LCD controller is flexible and programmable in order to support various LCD panels as well as scaling, aspect ratio conversion, gamma correction, color manipulation, and brightness and contrast control.
The VaddisHD™ High Definition Multimedia Processor platform MPSoC integrates a state-of-the-art High Definition (HD) video pipeline and enables multi-standard compressed video decoding, including support for HD H.264, VC1, MPEG2, MPEG4, AVS, and other formats. With an integrated dual processor and an Audio DSP, it enables the most advanced High Definition applications, including Java support for Blu-ray blue-laser DVD players. This IC family also incorporates Zoran's latest HDXtreme® video processing and scaling technology.
Jürgen Teich, University of Erlangen-Nuremberg, Germany
Reconfigurability Issues of Future Massively Parallel SoCs
Technology roadmaps foresee 1000 or more processors being integrated as an SoC in the year 2020. Obviously, the control of applications running in parallel cannot be fully centralized any more. Also, feature variations will become a problem if algorithms are not able to cope with them.
One way to live with the expected process variations is to properly exploit the reconfigurability of processors, communication links and memories. The only question is at what price and to what degree this can and should be done.
Another remedy will also be necessary on the application side: Is there a class of algorithms that is able to run on either 1, 2, or N available processors without having to be recompiled? We will introduce the notion of invasive algorithms and invasive MPSoCs that allow an application to claim processors and other resources dynamically in a phase of invasion, then to execute in parallel, and later to retreat to single-processor execution or termination. We will show that invasive algorithms could provide the required reconfigurability on the application side.
An-Yeu Wu, Industrial Technology Research Institute, Taiwan
Reconfigurable Communication Silicon IP Design for Multi-spec/Multi-mode Communication Systems
The Industrial Technology Research Institute (ITRI) PAC (Parallel Architecture Core) project was initiated in 2003. The target is to develop a low-power and high-performance programmable platform for multimedia applications. In the first PAC project phase (2004-2006), a 5-way VLIW DSP (PACDSP) processor was developed with our patented distributed and ping-pong register file and variable-length VLIW encoding techniques. A dual-core PAC SoC, which is composed of a PACDSP core and an ARM9 core, was also designed and fabricated in TSMC 0.13µm technology to demonstrate its performance and energy efficiency for multimedia processing such as a real-time H.264 codec. This talk summarizes the technical contents of the PACDSP, the DVFS (dynamic voltage and frequency scaling)-enabled PAC SoC, and the energy-aware multimedia codec. The research directions of our second-phase PAC project (PAC II), including multicore architectures, ESL (electronic system-level) technology, and a low-power multimedia framework, will also be addressed in this talk.