The Issues
 • How much flexibility is needed and how best to include it…
 • A single system description including interaction between the analog and digital domains
 • “Realtime” SOC prototyping
 • Automated ASIC design flow

An SOC Design Flow with Prototyping

[Flow diagram, top to bottom]
 • Initial System Description (Floating point Matlab/Simulink) [side annotation: Algorithm/flexibility evaluation]
 • Determine Flexibility Requirements
 • Description with Hardware Constraints (Fixed point Simulink, FSM Control in Stateflow) [side annotation: Digital delay, area and energy estimates & effect of analog impairments]
 • Common test vectors, and hardware description of net list and modules
 • Real-time Emulation (BEE FPGA Array) and Automated ASIC Generation (Chip-in-a-Day flow)

Simulation Framework using Simulink/Stateflow (from Mathworks, Inc.)
[Figure labels: Transmitter, Channel, Analog Receiver, Digital Baseband]
 • Techniques used to decrease simulation time:
    • Baseband-equivalent modeling of RF blocks
    • Compile design using MATLAB Real-Time Workshop

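To illustrate why baseband-equivalent modeling shortens simulation time, here is a minimal Python sketch. It is not taken from the slides: the function names and the 4x oversampling rule of thumb are assumptions, and the 2.45 GHz carrier and 1 MHz bandwidth are borrowed from the narrow-band system described later in the deck. A passband model must sample well above the carrier frequency, whereas the complex-envelope model only needs a rate set by the signal bandwidth.

```python
import cmath
import math

def passband_samples(envelope, fc_hz, fs_hz):
    """Real passband signal Re{x[n] * exp(j*2*pi*fc*n/fs)} from a complex envelope."""
    return [
        (x * cmath.exp(2j * math.pi * fc_hz * n / fs_hz)).real
        for n, x in enumerate(envelope)
    ]

def sample_counts(duration_s, fc_hz, bw_hz, oversample=4):
    """Rough number of samples each modeling style needs (assumed rule of thumb)."""
    passband = round(duration_s * oversample * (fc_hz + bw_hz / 2))
    baseband = round(duration_s * oversample * bw_hz)
    return passband, baseband

# Illustrative numbers: 2.45 GHz carrier, 1 MHz bandwidth, 1 ms of signal.
pb, bb = sample_counts(1e-3, 2.45e9, 1e6)
print(f"passband: {pb} samples, baseband: {bb} samples, ratio: {pb / bb:.0f}x")
```

Under these assumptions the baseband model processes a few thousand samples where the passband model would need millions, which is where the simulation speed-up comes from.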
 • Blocks map to implementation libraries
[Figure: Time-Multiplexed FIR Filter. Labels: CONTROL block (addr, wen, reset_acc) produced by the Stateflow-VHDL translator; TAP_COEF SRAM (Black Box); MAC. Implementation targets: RTL Code, Synopsys Module Compiler, or Custom Module]
 • Implementation choices embedded in description
 • Libraries of blocks are pre-verified and re-used

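The time-multiplexed FIR filter in the figure can be sketched behaviorally as follows. This is an illustrative Python model, not the Simulink design itself: the names `addr` and `reset_acc` are borrowed from the figure labels, but their exact roles and timing here are assumptions. One multiply-accumulate unit is reused for every tap, and the fully parallel form is included as a reference.

```python
def time_multiplexed_fir(samples, taps):
    """FIR filter computed with a single MAC unit, one tap per clock cycle."""
    history = [0] * len(taps)            # delay line of the most recent inputs
    outputs = []
    for x in samples:
        history = [x] + history[:-1]     # shift the new sample in
        acc = 0
        for addr in range(len(taps)):    # controller steps through the coefficient memory
            reset_acc = addr == 0        # restart the accumulator on the first tap
            product = taps[addr] * history[addr]
            acc = product if reset_acc else acc + product
        outputs.append(acc)
    return outputs

def parallel_fir(samples, taps):
    """Reference: y[n] = sum_k taps[k] * x[n-k], all taps evaluated at once."""
    padded = [0] * (len(taps) - 1) + list(samples)
    return [
        sum(t * padded[n + len(taps) - 1 - k] for k, t in enumerate(taps))
        for n in range(len(samples))
    ]

print(time_multiplexed_fir([1, 0, 0, 0], [1, 2, 3]))  # impulse response equals the taps
```

The trade-off this shows: the time-multiplexed form needs `len(taps)` cycles per output but only one multiplier, while the parallel form produces an output every cycle at the cost of one multiplier per tap.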
Timed Dataflow Graph Specification
[Figure: Multiply / Accumulate dataflow graph. Inputs: A, B, RESET. Blocks: MULT (S12), ADD (S18), REG, MUX, CONST (S18)]
 • Simulink (from Mathworks)
 • Discrete-Time (cycle accurate)
 • Fixed-Point Types (bit true)
 • No need for RTL simulation
 • Embedded implementation choices

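A bit-true, cycle-accurate model of the multiply/accumulate graph might look like the sketch below. Several things are assumptions rather than facts from the slide: that S12 and S18 denote signed 12-bit and 18-bit types, that overflow wraps (a Simulink fixed-point block could also be configured to saturate), and that the RESET mux replaces the register feedback with the constant 0.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the signed two's-complement range of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

class MacModel:
    """MULT -> ADD -> REG, with a RESET mux selecting CONST 0 (assumed wiring)."""

    def __init__(self, mult_bits=12, acc_bits=18):
        self.mult_bits = mult_bits
        self.acc_bits = acc_bits
        self.reg = 0                      # accumulator register state

    def clock(self, a, b, reset):
        """Advance one clock cycle and return the new register value."""
        product = wrap_signed(a * b, self.mult_bits)
        feedback = 0 if reset else self.reg
        self.reg = wrap_signed(product + feedback, self.acc_bits)
        return self.reg

mac = MacModel()
print(mac.clock(3, 4, reset=True))    # 12
print(mac.clock(5, 6, reset=False))   # 42
```

Because every operation is an integer operation with an explicit word length, the same input vectors give identical results cycle by cycle, which is the property that lets the Simulink description stand in for an RTL simulation.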
Control
 • Stateflow
    • Extended Finite State Machine
    • Subset of Syntax
    • Converted to VHDL
    • Synthesized
 • VHDL
    • Synthesized directly
VHDL & Stateflow Macros map to a netlist of Standard Cells using standard synthesis

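What makes a finite state machine "extended" is the data it carries alongside the state. The sketch below is a purely illustrative Python model: the state names, the counter, and the `limit` input are assumptions, and it is neither Stateflow syntax nor the VHDL that the translator generates.

```python
def step(state, count, start, limit):
    """One clock tick of a small extended FSM.

    Returns (next_state, next_count, done). `count` is the extended-state
    variable; `done` pulses for one tick when the run completes.
    """
    if state == "IDLE":
        return ("RUN", 0, False) if start else ("IDLE", 0, False)
    if state == "RUN":
        if count + 1 == limit:
            return ("DONE", count + 1, False)
        return ("RUN", count + 1, False)
    # DONE: signal completion, then return to IDLE
    return ("IDLE", 0, True)

def run_until_done(limit, max_ticks=100):
    """Drive the FSM with a one-tick start pulse and record the visited states."""
    state, count, trace = "IDLE", 0, []
    for tick in range(max_ticks):
        state, count, done = step(state, count, start=(tick == 0), limit=limit)
        trace.append(state)
        if done:
            break
    return trace

print(run_until_done(3))
```

A controller of this shape (a few states plus a counter) is the kind of logic that would sequence addresses and accumulator resets for a time-multiplexed datapath.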
Simulink Model of Direct-Conversion Receiver

Bit true, cycle accurate digital baseband algorithms…

Directly map diagram into hardware since there is a one-for-one relationship for each of the blocks
[Figure labels: S reg, X reg, Add/Sub/Shift, Mult1, Mult2, Mac1, Mac2]
 • Results: A fully parallel architecture that can be implemented rapidly

Then do a simulation: Zero-IF Receiver
[Plot legend: pre-MUD, post-MUD]
 • 10 users (equal power)
 • 13.5dB receiver NF
 • PLL: -80dBc/Hz @ 100kHz
 • 2.5° I/Q phase mismatch
 • 82dB gain
 • 4% gain mismatch
 • IIP2 = -11dBm
 • IIP3 = -18dBm
 • 500kHz DC notch filter
 • 20MHz Butterworth LPF
 • 10-bit, 200MHz Σ-Δ ADC
Output SNR  15dB

With Analog Impairments
[Plot legend: ideal receiver, real receiver]
 • 10 users (equal power)
 • 20MHz Butterworth LPF
 • 500kHz DC notch filter
 • 13.5dB receiver NF
 • 82dB gain
 • 4% gain mismatch
 • 2.5° I/Q phase mismatch
 • IIP2 = -11dBm
 • IIP3 = -18dBm
 • PLL: -80dBc/Hz @ 100kHz
 • 10-bit, 200MHz Σ-Δ ADC

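As a worked example of how two of the listed impairments translate into a single figure of merit, the sketch below evaluates the standard textbook image-rejection-ratio expression for a quadrature receiver, IRR = (1 + 2g·cos φ + g²) / (1 − 2g·cos φ + g²), with the 4% gain mismatch and 2.5° phase mismatch from the list. The formula and the function name are not from the slides; the Simulink model accounts for many more effects than this one-line estimate.

```python
import math

def image_rejection_ratio_db(gain_mismatch, phase_mismatch_deg):
    """IRR in dB for a fractional I/Q gain mismatch and a phase mismatch in degrees."""
    g = 1.0 + gain_mismatch                   # relative gain of one branch
    phi = math.radians(phase_mismatch_deg)
    wanted = 1 + 2 * g * math.cos(phi) + g * g
    image = 1 - 2 * g * math.cos(phi) + g * g
    return 10 * math.log10(wanted / image)

irr_db = image_rejection_ratio_db(0.04, 2.5)
print(f"IRR = {irr_db:.1f} dB")
```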
 • Now to implement that description

[Flow diagram, top to bottom]
 • Initial System Description (Floating point Matlab/Simulink) [side annotation: Algorithm/flexibility evaluation]
 • Determine Flexibility Requirements
 • Description with Hardware Constraints (Fixed point Simulink, FSM Control in Stateflow) [side annotation: Digital delay, area and energy estimates & effect of analog impairments]
 • Common test vectors, and hardware description of net list and modules
 • Real-time Emulation (BEE FPGA Array) and Automated ASIC Generation (Chip-in-a-Day flow)

Berkeley Emulation Engine: Complete Design & Prototype Environment for Communication Systems

Berkeley Wireless Research Center

Chen Chang, Kimmo Kuusilinna, Brian Richards, Kevin Camera, Nathan Chan, Allen Chan, Robert W. Brodersen

What’s BEE?
 • A real-time FPGA-based hardware emulator, with speed up to 60 MHz
 • Emulation capacity of 10 Million ASIC gate-equivalents per module, corresponding to 600 Gops (16-bit adds)
 • 2400 external parallel I/O providing 192 Gbps raw bandwidth
 • Automated design flow from Simulink to FPGA emulation, integrated with INSECTA ASIC design flow

BEE Applications
 • Real-time hardware emulation:
    • Novel Communication Systems with analog front-end hardware (MCMA, UWB, 60GHz)
    • Digital signal processing systems
    • Real-time control systems
    • Neuron-like network processing
 • Hardware acceleration:
    • Large communication/signal processing system simulation
    • Hardware-in-the-loop cosimulation with software system
    • Complex parallel computing algorithms

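The BEE Design Environment
[Figure: Analog Front-end attached to the BEE Processing Unit over LVDS/LVTTL; BEE Processing Unit, Servers and Client PC on an Ethernet network; Simulink MDL enters the BEE/Insecta Design Flow, whose outputs are the FPGA Bit Stream & Conf File and the ASIC Layout]
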
BEE System Assembly
 • 20 Virtex-E 2000
 • 16 ZBT-SRAM (1MByte each)
 • 8 Riser I/O Cards
 • StrongARM Module, Linux OS
[Photo labels: Riser I/O Card, MPB, StrongARM Module]

Main Processing Board
[Figure: board diagram. Labels: Local Mesh, 48 bit buses, 96 bit inter-Xbar buses. Legend: FPGA, Xbar, SRAM, Inter-board connectors]

Hardware Performance
 • Board-level Main Clock Rate: 160MHz+
 • On Board connection speed:
    • FPGA to FPGA: 100MHz
    • XBAR to XBAR: 70MHz
 • Off board connection speed (3 ft SCSI cable loop back through riser card):
    • LVTTL: 40MHz
    • LVDS: 160MHz ~ 220MHz

Hardware Capacity
 • Reference Design:
    • 10240 tap FIR filter
    • 512 taps per FPGA
 • Slice utilization: 99% of 19200 slices
 • Max Clock Rate: 28.5MHz
 • ASIC Gate: 401K per FPGA, 8M total
 • MOPS: 583,680 total (16bit add & 12bit cmult)
 • Power: 2.5W per FPGA, 50W total

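The totals on this slide can be cross-checked from the per-FPGA figures. The short calculation below assumes that each tap performs two operations per clock cycle (one 16-bit add and one 12-bit constant multiply), which is an interpretation of the "(16bit add & 12bit cmult)" note rather than something the slide states outright; the variable names are illustrative.

```python
TAPS_PER_FPGA = 512
NUM_FPGAS = 20
CLOCK_MHZ = 28.5
OPS_PER_TAP = 2               # assumption: one add plus one constant multiply per cycle
GATES_PER_FPGA = 401_000
POWER_PER_FPGA_W = 2.5

total_taps = TAPS_PER_FPGA * NUM_FPGAS                 # 10240
total_mops = total_taps * OPS_PER_TAP * CLOCK_MHZ      # 583680.0
total_gates = GATES_PER_FPGA * NUM_FPGAS               # 8020000, about 8M
total_power_w = POWER_PER_FPGA_W * NUM_FPGAS           # 50.0

print(total_taps, total_mops, total_gates, total_power_w)
```

With that assumption the computed values match the slide: 10240 taps, 583,680 MOPS, roughly 8M gates, and 50W.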
Design Flow Goals
 • Fully automatic generation of FPGA and ASIC implementations from Simulink system level design
 • Cycle accurate bit-true functional level equivalency between ASIC & BEE implementation
 • Fast design turn-around time
    • Chip-in-a-Day
    • BEE-in-an-Hour

Design Flow: Global Perspective
[Flow diagram, top to bottom]
 • Simulink Schematics
 • BEE Compiler (System Generator), with Virtual Components and Performance Estimation
 • VHDL Netlist
 • FPGA Backend Flow (with CORE, VHDL Descriptions), producing Xilinx Bit stream
 • ASIC Flow (with MC, VHDL Descriptions), producing GDSII

Design Flow: Detailed View
[Flow diagram; box labels grouped by path]
 • Inputs: Design Specs & Test Vectors; Xilinx Blockset Library; Simulink System Design MDL
 • Performance Estimation: Design Area, Power, Speed
 • Partitioning: BEE Partition?; Manual Partition Annotation; BEE_ROUTER
 • Xilinx System Generator; BEE Post XSG Processes
 • FPGA path: BEE_ISE (MAP/Timing Report); Chip-level BitStream; BEECONFIG
 • Simulation path: VHDL Simulation Files; ModelSim
 • ASIC path: INSECTA (MC Script Library); ASIC Structural Netlist; First Encounter & Nano-Route; ASIC Layout; Nano-Sim

Virtual Component Library
 • Parameterized system level blocks
    • Bit-width
    • Pipeline stages (latency)
    • Output bits truncation
 • Customizable block set library
    • Different Architecture
    • Different Technology Target
[Figure: Virtual Components split into FPGA Implementation and ASIC Implementation]
 • FPGA Implementation
    • Tech independent Parameters: Synthesizable VHDL
    • VirtexE Dependent Parameters: Structural VHDL, FPGACore
    • Virtex II Dependent Parameters: Structural VHDL, FPGACore
 • ASIC Implementation
    • Tech independent Parameters: Synthesizable VHDL, Module Compiler Core
    • Tech Dependent Parameters for 0.18um: Structural VHDL, HardMacro Core
    • Tech Dependent Parameters for 0.13um: Structural VHDL, HardMacro Core

Basic Blocks
[Figure: block palette with legend: FPGA+ASIC Support, FPGA Support Only]
 • Shifter, VHDL, Enable, Concat, Const, Convert, Counter, Delay, Down, Mux, P to S, Register, ReInt, S to P, Slice, Sync, Up Smp
 • FIFO, DPRAM, ROM, RAM
 • Accum, AddSub, CMult, Inverter, Logical, Mult, Negate, Relat’n, Scale, Shift, Sin Cos, Thresh

Communication & DSP Blocks
[Figure: block palette with legend: FPGA+ASIC Support, FPGA Support Only]
 • DDS, CIC, Shift, FIR, Conv. Encoder, Puncture, Depuncture

Control Logic Design
 • Simulink level: StateFlow diagram, encapsulated in a subsystem with Xilinx gateways
 • RTL VHDL automatically generated by SF2VHD
 • Fully integrated with the BEE_ISE tools
[Figure: StateFlow Controller; SF2VHD generates VHDL for Black Box from StateFlow]

Run-time Data I/O Interface
 • New and improved infrastructure for transferring data to and from the BEE
    • Control all data transfers from within a local Matlab GUI
    • Accepts standard Simulink data structures for intrinsic reuse of existing test vectors
    • Library macro contains the entire hardware interface in one fully parameterized block
[Figure: Matlab Control GUI, Ethernet, Linux/StrongARM Daemon, BEE (Embedded Controller, RAM, RAM, User Design)]

Data I/O Interface: Hardware
[Figure labels: Pin Gateways, Bus Protocol Controller, Source RAM, Sink RAM]

Data I/O Interface: Software
 • Specify input source, BEE hostname, and data bus parameters in Matlab GUI
 • Utilizes a custom MEX socket library for network connectivity
 • Uses a simple packet header to distinguish control frames and byte streams
 • StrongARM (running embedded Linux) starts a persistent, lightweight server
 • Matlab clients connect via TCP and either send a data stream or read request
 • Incoming data is translated into the hardware protocol and broadcast to FPGA

    root# ./daemon
    Listening on port 2108: ok
    Waiting for connection...

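The slide mentions a simple packet header that separates control frames from byte streams but does not give its layout. The sketch below shows one way such framing could work in Python. Everything about the format is hypothetical: the 1-byte frame type, the 4-byte length field, the constant names, and the helper functions are illustrative assumptions, not the actual BEE protocol.

```python
import struct

# Hypothetical header: 1-byte frame type + 4-byte payload length, network byte order.
HEADER = struct.Struct("!BI")
FRAME_CONTROL = 0   # assumed type code for control frames
FRAME_DATA = 1      # assumed type code for raw byte streams

def pack_frame(frame_type, payload):
    """Prefix the payload with a header describing its type and length."""
    return HEADER.pack(frame_type, len(payload)) + payload

def unpack_frame(buffer):
    """Split one frame off the front of `buffer`; return (type, payload, remainder)."""
    frame_type, length = HEADER.unpack_from(buffer)
    start = HEADER.size
    payload = buffer[start:start + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return frame_type, payload, buffer[start + length:]

stream = pack_frame(FRAME_CONTROL, b"read") + pack_frame(FRAME_DATA, bytes([1, 2, 3]))
frame_type, payload, rest = unpack_frame(stream)
print(frame_type, payload, len(rest))
```

A length-prefixed header like this lets a receiver on a TCP stream find frame boundaries without scanning the payload, which is one reason the approach is common for small embedded daemons.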
ASIC Flow: INSECTA
 • Tcl/Tk code drives the flow
    • Same scripting language used by several EDA tools: First Encounter, Nanoroute, ModelSim, Synopsys
 • GUI controls technology selection, parameter selection, flow sequencing
    • A real “Push Button” flow…
    • Users can refine flow-generated scripts

ASIC Tool Flow: Placement
 • Internally developed ASIC flow:
    • First Encounter (FE)
    • Nanoroute
    • Physical Compiler
 • Timing Driven!
    • FE provides accurate wire parasitic estimates
    • Placement by FE or Physical Compiler

ASIC Flow: Routing in 130nm
 • Nanoroute: Ready for 130nm, 90nm designs
    • Stepped metal pitches
    • Minimum area rules
    • Complex VIA rules
    • Avoids antenna rule violations
    • Cross-talk avoidance: to be evaluated

ASIC Flow: Back-end
 • Using Unicad backend directly for DRC, LVS, Antenna rule checking
    • Easier to track technology updates from ST
    • Critical for evaluating internally developed technology files for FE, Nanoroute

BCJR MAP Decoder
 • E2PR4 Channel Encoder - Decoder
 • Fully enclosed design
    • Uniform RNG input vector
    • Channel encoder
    • AWGN filter
    • Channel decoder
    • BER collection mechanism
 • Part of: Full 3G Turbo Decoder

BCJR As Case Study
 • 13.2 MHz system clock
 • SNR 14dB to -1dB
 • 10^9 Samples
 • 20 minute run-time
[Figure: BER-SNR Waterfall Curve (BCJR); x-axis: SNR (dB), -1 to 14; y-axis: BER, 1 down to 0.00001, logarithmic]

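The four figures on this slide are mutually consistent under one plausible reading, sketched below. The reading is an assumption, not something the slide states: one sample is processed per clock cycle, 10^9 samples are collected at each SNR point, and the sweep covers the 16 integer SNR values from -1 dB to 14 dB shown on the plot axis.

```python
CLOCK_HZ = 13.2e6                 # system clock from the slide
SAMPLES_PER_POINT = 10**9         # assumed to be per SNR point
SNR_POINTS_DB = range(-1, 15)     # assumed sweep: -1 dB to 14 dB in 1 dB steps

total_samples = SAMPLES_PER_POINT * len(SNR_POINTS_DB)
run_time_min = total_samples / CLOCK_HZ / 60   # assumes one sample per clock cycle

print(f"{len(SNR_POINTS_DB)} SNR points, {run_time_min:.1f} minutes")
```

Under these assumptions the sweep takes roughly 20 minutes of emulation time, in line with the run-time quoted on the slide.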
FPGA Implementation of a Narrow-Band Transmission System
[Figures: Transmitter, Receiver]

Transmission System
Purpose         BEE Design Flow Development, Measurements for MCMA RF Front-End Specification
Data Rate       1 Mbit/s, 500 Kbit/s
Carrier Freq.   2.45 GHz
Bandwidth       1 MHz
Modulation      DBPSK, OOK
Frame Synch.    PN Sequence

FPGA Implementation of a Narrow-Band Transmission System
[Photo labels: Receiver, Transmitter, BEE, Transmitter Output Spectrum, Receiver Output on SCSI Connector; indicators: Frame O.K., Data Match, Data Out]

2.4GHz Base-band Transmitter
CPU time: 57 min
Core Utilization: 0.344418 (Pad limited)
Size (From SoC Encounter):
  Core Height 565.8u
  Core Width 489.54u
  Die Height 1322.66u
  Die Width 1242.3u
Synopsys estimates:
  Total Dynamic Power = 610.5163 uW (100%)
  Cell Leakage Power = 15.9364 uW
  Critical path: 9.21ns

How to get started?
 • Documentation web site
    • http://bwrc.eecs.berkeley.edu/Research/BEE
 • Tutorials
    • Lesson 1: Flow Basics
    • Lesson 2: Runtime Debug on BEE
    • Lesson 3: Control Logic Design
    • Lesson 4: Run-time Data I/O on BEE

BEE Compiler Framework
 • Increase Design Scalability
    • High-level blocks
    • Vector Signals
 • Reduce design time
    • Faster run time
    • Efficient/partial synthesis
    • Modular design reuse
 • Feature additions
    • Tri-state pads/signal support
    • Global pad assignment
    • Automatic design partition
    • Script based hardware generator
[Flow diagram labels: Design Specs & Test Vectors; Hardware Blockset Library; Matlab Simulink; BEE_COMPILER; Design Area/Speed/Power Gauge; FPGA Area / Post PAR Timing Report; Chip-level BitStream & Conf File; BEECONFIG; VHDL Simulation Files; ModelSim; Chip-level VHDL; Synopsys MC Script; INSECTA; MC Script Library; ASIC Structural Netlist; ASIC Place & Route Tools]

								