Modeling a Heterogeneous Multiprocessor for Software Defined Radio

Trevor Meyerowitz, Rong Chen*, Professor Alberto Sangiovanni-Vincentelli, UC Berkeley
Jens Harnisch, Infineon Corporation (*currently at Intel Corporation)

Motivation: Software Defined Radio

[Figure: Platform Exploration, timeline from 2004 to 2009]

• A processor-based approach is superior to dedicated hardware for cell phones that support multiple standards.
• However, parallelism and some dedicated hardware are still needed because of power constraints.
• What is the optimal architecture?
• What is the optimal mapping onto that architecture?
• The space of mapping options is enormous; exploring it in reasonable time requires high-level models that give rapid feedback.

Multiple SIMD Cores (MuSIC)

Key Point: Simulating this architecture at a low level is too slow.

[Figure: MuSIC architecture]
• Multi-tasked SIMD core: PE array (PE 0, PE 1, PE 2, PE 3), 16 K + 16 K I & D cache, 32 K local memory
• SIMD core cluster: SIMD Core 1 to SIMD Core 4, each with 64 K MEM, plus an ARM/SC control processor (L1 Ctrl) with I&D cache
• RF and bus bridge
• Multi-layer system bus connecting the accelerators (FIR filter, Turbo/Viterbi), the shared memory (four banks of 96 K), and the peripherals

Our Modeling Platform: Metropolis

Key Features
• Orthogonalization of concerns
  - Function vs. architecture
  - Computation vs. communication vs. coordination
• Flexible and formal semantics: can represent a wide range of models of computation
• Constraints: temporal (LTL, linear temporal logic) and quantitative (LOC, Logic of Constraints)
• Quantities: arbitration and numerical annotation
• Mapping: done via synchronizing events
• Support for backend tool development

Homepage:

Metropolis Tool Framework
[Figure: Metropolis tool framework. Designs written in the metamodel language pass through the metamodel front end, which produces abstract syntax trees consumed by back ends 1 to N (simulator, synthesis, verification, and other tools).]
The framework lets users:
• Load designs
• Browse designs
• Relate designs
• Refine, map, etc.
• Invoke tools
• Analyze results

Platform-Based Design (ASV Triangles)
[Figure: Functional Space → Function Instance; Architectural Space → Platform Instance; the two meet at the Platform via Mapping and Export]
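Metropolis treats quantities such as time as values that processes request and that quantity managers arbitrate and annotate. The Python sketch below only illustrates that idea and is not the Metropolis API: the class `GlobalTimeManager`, its `request`/`resolve` methods, and the event names are hypothetical. Each process asks for its next event at some delay from the current global time; the manager grants the earliest request (ties broken by arrival order) and advances time to it, so global time never moves backward.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class TimeRequest:
    time: float                            # absolute global time requested
    seq: int                               # tie-breaker: earlier requests win
    process: str = field(compare=False)
    event: str = field(compare=False)


class GlobalTimeManager:
    """Hypothetical global-time quantity manager (not the Metropolis API)."""

    def __init__(self) -> None:
        self.now = 0.0
        self._pending = []                 # min-heap of TimeRequest
        self._seq = 0

    def request(self, process: str, event: str, delay: float) -> None:
        """A process asks for `event` to occur `delay` time units from now."""
        if delay < 0:
            raise ValueError("global time cannot move backward")
        heapq.heappush(self._pending,
                       TimeRequest(self.now + delay, self._seq, process, event))
        self._seq += 1

    def has_pending(self) -> bool:
        return bool(self._pending)

    def resolve(self) -> TimeRequest:
        """Arbitrate: grant the earliest request and advance global time to it."""
        granted = heapq.heappop(self._pending)
        self.now = granted.time
        return granted


if __name__ == "__main__":
    qm = GlobalTimeManager()
    qm.request("sProc0", "Coder0 done", 40)
    qm.request("aProc1", "MAC poll", 25)
    while qm.has_pending():
        g = qm.resolve()
        print(f"t={g.time:g}: {g.process} -> {g.event}")
```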

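The motivation asks for the optimal mapping, and keeping function and architecture orthogonal is what makes trying many mappings cheap. As a rough illustration only, the sketch below treats a mapping as a dictionary from function processes (named after the 802.11 function netlist) to architecture processors, and scores it by the cycle load on the busiest processor, a crude bound that ignores communication and scheduling. The round-robin assignment policy, the helper names, and the per-stage cycle costs are hypothetical placeholders, not measured values.

```python
from collections import defaultdict

STAGES = ("Coder", "Spreader", "Modulator")   # stages of one processing chain


def round_robin_mapping(num_chains: int, processors: list[str]) -> dict[str, str]:
    """Hypothetical policy: every stage of chain i runs on processors[i % len]."""
    return {f"{stage}{i}": processors[i % len(processors)]
            for i in range(num_chains) for stage in STAGES}


def busiest_load(mapping: dict[str, str], cost: dict[str, int]) -> int:
    """Total cycles on the most loaded processor; ignores communication."""
    load: dict[str, int] = defaultdict(int)
    for process, processor in mapping.items():
        stage = process.rstrip("0123456789")  # "Coder4" -> "Coder"
        load[processor] += cost[stage]
    return max(load.values())


if __name__ == "__main__":
    cost = {"Coder": 300, "Spreader": 120, "Modulator": 200}  # placeholder cycles
    sprocs = ["sProc0", "sProc1", "sProc2", "sProc3"]
    for n in (4, 2):
        mapping = round_robin_mapping(6, sprocs[:n])
        print(f"{n} SIMD cores: busiest core needs {busiest_load(mapping, cost)} cycles")
```

Swapping `round_robin_mapping` for another policy, or changing the processor list, re-scores a mapping without touching the function description, which is the kind of fast feedback loop the poster argues for.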
MuSIC Architecture Netlist
[Figure: architecture netlist with Global Time, round-robin schedulers (ARM RR Scheduler, SIMD0 RR Scheduler ... SIMD3 RR Scheduler), buses, and bus slaves]

Skeleton Mapping Netlist
[Figure: mapping netlist joining the Architecture Netlist and the Function Netlist]
  Architecture side: ARM (aProc 1); SIMD 0 to SIMD 3 (sProc 0 to sProc 3)
  Function side: RxSplitter; Coder0 to Coder5, Spreader0 to Spreader5, Modulator0 to Modulator5

Functionality: 802.11 Receive Payload Processing
[Figure: payload processing between MAC and PHY: Rx_Splitter feeds parallel processing chains (chain1 ... chain5 labeled, each a Coder → Spreader → Modulator sequence) and a Merger recombines them; edges are marked as data or state]
• Each processing chain depends on the state of the last processed data.
• The chains are nevertheless parallelizable, because the state is computed before the data processing is finished.
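The architecture netlist gives each processor a round-robin (RR) scheduler, and the poster notes that SIMD instruction latencies are highly predictable because the cores are time-multiplexed. A minimal sketch of why, assuming (hypothetically) that a multi-tasked core issues exactly one instruction per slot and rotates strictly among its ready tasks: the slot in which each task finishes depends only on instruction counts, not on data values. The function name `rr_finish_slots` is illustrative.

```python
from collections import deque


def rr_finish_slots(tasks: dict[str, int]) -> dict[str, int]:
    """Slot in which each task issues its last instruction under round-robin issue.

    Hypothetical model: `tasks` maps a task name to its instruction count (>= 1);
    the core issues one instruction per slot and rotates through the ready
    tasks in insertion order.
    """
    if any(count < 1 for count in tasks.values()):
        raise ValueError("every task needs at least one instruction")
    queue = deque(tasks.items())
    finish: dict[str, int] = {}
    slot = 0
    while queue:
        name, remaining = queue.popleft()
        slot += 1                           # this slot goes to `name`
        if remaining > 1:
            queue.append((name, remaining - 1))
        else:
            finish[name] = slot             # last instruction issued
    return finish
```

With k tasks of n instructions each, task j (counting from 0) finishes at slot (n - 1)*k + j + 1, which is the kind of closed-form cost a mapping could carry statically.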

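The functionality description notes that each chain needs its predecessor's state, but that this state is ready before the predecessor finishes its data. The sketch below quantifies the resulting overlap under simplifying assumptions that are not from the poster: identical chains, one core per chain, a time `state_time` until a chain's outgoing state is available, and a time `data_time` until its data is done. Chain i can then start at i*state_time instead of i*data_time.

```python
def chain_schedule(num_chains: int, state_time: int,
                   data_time: int) -> list[tuple[int, int]]:
    """(start, finish) of each chain when a chain waits only for its predecessor's state.

    Assumes identical chains and one core per chain; both are simplifications.
    """
    if not 0 < state_time <= data_time:
        raise ValueError("state must be ready no later than the chain's data")
    schedule = []
    for i in range(num_chains):
        start = i * state_time             # predecessor's state is ready at this time
        schedule.append((start, start + data_time))
    return schedule


def speedup_over_serial(num_chains: int, state_time: int, data_time: int) -> float:
    serial = num_chains * data_time        # each chain waits for the previous to finish
    overlapped = chain_schedule(num_chains, state_time, data_time)[-1][1]
    return serial / overlapped
```

For six chains with `state_time=10` and `data_time=40`, serial execution takes 240 time units while the overlapped schedule finishes at 90, a speedup of about 2.7; with only four SIMD cores, a real schedule would also have to respect core availability.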

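Adding timing and arbitration to the buses is listed as ongoing work. One possible shape for that is sketched below under hypothetical assumptions, not the MuSIC bus protocol: a single shared bus, a fixed transfer latency in cycles, and a round-robin arbiter whose priority rotates to the master after the one last granted. Requests that arrive while the bus is busy wait; if nothing is ready, the bus idles until the next arrival. The function name and master names are illustrative.

```python
def arbitrate(requests: list[tuple[int, str]], masters: list[str],
              latency: int) -> list[tuple[str, int, int, int]]:
    """Round-robin arbitration of one shared bus (hypothetical model).

    `requests` holds (arrival_cycle, master) pairs. Returns one
    (master, arrival, start, end) tuple per granted transfer, in grant order.
    """
    pending = sorted(requests)
    n = len(masters)
    now, last = 0, n - 1                   # start so that masters[0] has top priority
    grants = []
    while pending:
        ready = [r for r in pending if r[0] <= now]
        if not ready:
            now = pending[0][0]            # bus idle: jump to the next arrival
            continue
        # Rotating priority: distance (in master order) after the last grant,
        # then earliest arrival among requests from the same master.
        chosen = min(ready, key=lambda r: ((masters.index(r[1]) - last - 1) % n, r[0]))
        pending.remove(chosen)
        grants.append((chosen[1], chosen[0], now, now + latency))
        now += latency                     # bus is busy for the whole transfer
        last = masters.index(chosen[1])
    return grants
```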
Ongoing Work

Future Plans: Overview
• Completion of the high-level model
  - Add timing and arbitration to the buses
  - Add communication to the mapping
• Performance annotation from a cycle-accurate simulator of the SIMD processor elements
  1. Manual annotation
  2. Trace generation from the SIMD simulator
• Create a refined model of the SIMD element and update the trace capabilities
  - Annotate static pieces (e.g. SIMD instruction execution time)
  - Simulate dynamic pieces (e.g. the memory system)

Levels of Platform Modeling
1. Abstract mapped Metropolis netlist
2. Performance back-annotated mapping netlist
3. Performance back-annotated mapping netlist with refined PE models
4. Base comparison platform (SystemC model)

Key Elements for Performance Annotation
Main question: Given that the architectural model is concerned only with timing and other costs, how much of these costs can be statically encoded in the mapping?
• Instruction latencies of the SIMD elements are highly predictable because of their time-multiplexed nature.
  - Instruction counts for basic blocks should suffice.
  - Inter-SIMD-element communication may complicate this.
• Modeling of memory and bus traffic is key:
  - Local memory for the SIMD elements
  - Caches for the control processor

Proposed Refined SIMD and Local Memory
[Figure: refined SIMD model with a μC and PEs (PE1 …), and a SIMD memory holding I$, D$, and local memories (LM1 …)]
• Simulation of the cache, local memory, and load-store controllers
• PEs driven by basic-block counts and iteration loops
• Modeling of the PE controller is still undecided

Conclusions
• Simulation of heterogeneous multiprocessors at a low level of abstraction is too slow, and it requires completed programs to evaluate performance.
• Higher-level models are necessary for performance evaluation and design space exploration. They should still be accurate, through modeling of key architectural features and performance back-annotation.
• Metropolis enables more rapid design space exploration by keeping the functionality and architecture orthogonal and by providing support for mapping.

Acknowledgments
Wolfgang Raab, Matthias Richter, Alfonso Troya, and all of the MuSIC team for their help with information about the architecture and application.
Abhijit Davare, Qi Zhu, and Guang Yang for their help with mapping and quantity managers in Metropolis.
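The key-elements discussion argues that, because SIMD instruction latencies are predictable, instruction counts per basic block should suffice for static annotation. A minimal sketch of that back-annotation, assuming a hypothetical table of per-class latencies and a profile of how often each basic block runs (for instance from loop iteration counts, manual annotation, or a trace of the cycle-accurate simulator): a block's cost is its instruction mix weighted by latency, and a process's cost is the sum of block costs times execution counts. Loads get a fixed latency here, which only holds if they hit local memory; all latency numbers and names are placeholders.

```python
from dataclasses import dataclass

# Hypothetical cycles per instruction class. Real figures would come from the
# cycle-accurate SIMD simulator or from manual annotation. Loads assume a hit
# in local memory.
LATENCY = {"alu": 1, "mul": 2, "ld": 3, "st": 1, "branch": 2}


@dataclass
class BasicBlock:
    name: str
    mix: dict[str, int]                    # instruction class -> count in block

    def cycles(self) -> int:
        """Cycles for one pass through the block."""
        return sum(LATENCY[cls] * count for cls, count in self.mix.items())


def annotate(blocks: list[BasicBlock], executions: dict[str, int]) -> int:
    """Static cost: sum over blocks of (times executed) x (cycles per pass)."""
    return sum(executions.get(b.name, 0) * b.cycles() for b in blocks)


if __name__ == "__main__":
    blocks = [
        BasicBlock("entry", {"alu": 4, "ld": 2}),
        BasicBlock("loop", {"mul": 2, "alu": 3, "ld": 1, "st": 1, "branch": 1}),
        BasicBlock("exit", {"st": 2}),
    ]
    # A 64-iteration loop body runs 64 times; entry and exit run once.
    print(annotate(blocks, {"entry": 1, "loop": 64, "exit": 1}))  # prints 844
```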

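For costs that cannot be encoded statically, the plan is to annotate static pieces and simulate dynamic ones such as the memory system. The sketch below combines the two under hypothetical assumptions: a pre-annotated static cycle count plus a toy direct-mapped cache, standing in for the control processor's cache, simulated access by access. Cache geometry, latencies, and the names `DirectMappedCache` and `estimate` are placeholders; write policy and bus contention are not modeled.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: one tag per line, filled on every miss.

    Hypothetical model only: no write policy, no bus contention.
    """

    def __init__(self, num_lines: int, line_bytes: int, hit: int, miss: int) -> None:
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.hit = hit                     # cycles for a hit
        self.miss = miss                   # cycles for a miss, including refill
        self.tags = [None] * num_lines

    def access(self, addr: int) -> int:
        """Simulate one access and return its latency in cycles."""
        block = addr // self.line_bytes
        index, tag = block % self.num_lines, block // self.num_lines
        if self.tags[index] == tag:
            return self.hit
        self.tags[index] = tag             # miss: replace whatever was there
        return self.miss


def estimate(static_cycles: int, addresses: list[int], cache: DirectMappedCache) -> int:
    """Pre-annotated static cycles plus simulated memory latency."""
    return static_cycles + sum(cache.access(a) for a in addresses)
```

With 4 lines of 16 bytes, addresses 0 and 64 map to the same line, so a final access to address 0 after touching 64 misses again; this conflict behavior is exactly what a purely static annotation cannot capture.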
Center for Hybrid and Embedded Software Systems
