
Introduction to Reconfigurable Computing, by gregoria


Introduction to Reconfigurable Computing
•   Configurable Computing (CC) Attempts To
    Increase Performance And Silicon Utilization
    Efficiency Through Logic Recycling Using
    FPGA And FPGA-like Devices
•   Hardware Algorithms Can Be “Paged” Into/Out
    Of CC Modules Much As Operating Systems
    Perform Software Paging
•   Factors Impacting Performance
    - Logic Speed
    - Speed Of Reconfiguration
    - Flexibility Of Configuration
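
The paging analogy above can be sketched as a toy model. This is only an illustration under stated assumptions, not a description of any real CC module: the class name `ConfigurableModule`, the least-recently-used eviction policy, the slot count, and the timing values are all hypothetical. It shows how the speed of reconfiguration adds to total run time whenever a requested hardware algorithm is not already resident.

```python
from collections import OrderedDict


class ConfigurableModule:
    """Toy model (hypothetical) of a CC module that pages hardware algorithms."""

    def __init__(self, slots, reconfig_us):
        self.slots = slots              # assumed: configurations that fit at once
        self.reconfig_us = reconfig_us  # assumed: cost of loading one configuration
        self.resident = OrderedDict()   # resident algorithms, least recently used first
        self.time_us = 0.0              # accumulated reconfiguration + compute time

    def run(self, algorithm, compute_us):
        """Run one task, paging its configuration in first if it is not resident."""
        if algorithm in self.resident:
            # Hit: mark as most recently used, no reconfiguration cost.
            self.resident.move_to_end(algorithm)
        else:
            # Miss: evict the least recently used configuration if full,
            # then pay the reconfiguration cost to load the new one.
            if len(self.resident) >= self.slots:
                self.resident.popitem(last=False)
            self.resident[algorithm] = True
            self.time_us += self.reconfig_us
        self.time_us += compute_us
        return self.time_us


# Illustrative values only: 30 us per reconfiguration, 10 us per task.
module = ConfigurableModule(slots=2, reconfig_us=30.0)
for name in ["fir", "fft", "fir", "crc"]:
    module.run(name, compute_us=10.0)
print(module.time_us, list(module.resident))
```

In this sketch three of the four tasks miss and pay the reconfiguration cost, so reconfiguration time dominates the total. Faster reconfiguration or more resident slots would shift that balance.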

Resource Utilization
•   Standard Microprocessor
    - Specialized Unit For Each Essential Task
    - Unit Functionality Fixed
    - Idle Units Lower Silicon Utilization
    - Basic Algorithms Fixed
    [Figure: microprocessor block diagram with units labeled Micro Code, Address Generation, Clock Gen., ALU, Registers, FPU, and Cache and I/O]


•   Reconfigurable Processor
    - Each Unit Specialized To Fit Task
    - Unit Functionality Alterable At Run Time
    - Idle Units Reconfigured For New Tasks
    - Basic Algorithms Can Be Tailored To Application
FPGAs vs. DSPs
•   FPGAs can support multiple memory ports
•   FPGAs outperform DSPs:
    - Parallelism in the algorithm
    - Simple operations in a fixed sequence
    - FPGAs provide greater computational density using less power
    - Large data sets, low resolution (8 - 12 bits)
    - Simple control
•   DSPs outperform FPGAs:
    - MAC operations
    - Complex arithmetic
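
To make "MAC operations" concrete: a multiply-accumulate (MAC) multiplies two operands and adds the product to a running sum. The sketch below is an illustration only, with hypothetical function names (`mac`, `fir`) and a small finite impulse response filter chosen as the example kernel. It assumes integer samples, in the spirit of the low-resolution data mentioned above.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b."""
    return acc + a * b


def fir(samples, taps):
    """Filter integer samples with integer taps using one MAC per tap.

    Each output value depends only on the input samples, not on other
    outputs, so the iterations of the outer loop are independent.
    """
    out = []
    for n in range(len(taps) - 1, len(samples)):
        acc = 0
        for k, tap in enumerate(taps):
            acc = mac(acc, tap, samples[n - k])
        out.append(acc)
    return out


print(fir([1, 2, 3, 4], [1, 1]))  # prints [3, 5, 7]
```

The independence of the outer-loop iterations is one example of the algorithmic parallelism listed above. The inner loop is the fixed sequence of simple operations.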



Colt Integrated Circuit

•   Colt Prototype
    - HP 0.5 um 3 Metal, PGA-132 (MOSIS)
    - 16 FUs, XBar, DPs
    - 5.5 mm x 6.1 mm
    - 50 MHz
•   Full-scale device: Stallion
2nd Generation Processor: The Stallion

•   Successor of the Colt chip
•   Six data ports that provide basic pipelined data-flow control
•   Smart crossbar that passes programming and data words to and from
    the data ports and meshes
•   Two IFU meshes and four multipliers
•   Ready for fabrication

The Stallion Organization
[Figure: Stallion block diagram with blocks labeled Programmable Data Ports, Stream I/O, “Smart” Crossbar Network, IFU Mesh (computational), and Integer Multipliers (allocable); the diagram also carries the label Allocable Resources]
Example Sub-Mesh Mapping
•   4x4 sub-matrix of IFUs
•   Factorial computation
•   Demonstrates conditional execution capabilities
•   Configured in < 30 usec
[Figure: "Factorial" mapped onto the sub-mesh. Recoverable labels: Port 1 (input), Port 3 (Overflow), Port 4 (Result), Left and Right operands, Multiplier High/Low; IFU operations include Pass, Dec, Delay F1, Delay F2, "Load 0 if F2=1 else load valid data", "Select Y if Y is valid", "Output 1 if Y=0", and "Valid if Result >= 0"]
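
The factorial example can be modeled in software to show what the conditional execution amounts to. This is a minimal behavioral sketch, not the chip's actual IFU mapping: the function name `factorial_stream` and the 16-bit word width are assumptions made for illustration. The result and overflow outputs loosely mirror the Result and Overflow ports in the figure.

```python
def factorial_stream(y, width=16):
    """Behavioral model (hypothetical) of the factorial mapping.

    Returns (result, overflow). `width` is an assumed word size in bits.
    """
    if y < 0:
        raise ValueError("input must be non-negative")
    limit = (1 << width) - 1
    if y == 0:
        # Conditional path: output 1 directly when the input is 0.
        return 1, False
    acc = 1
    overflow = False
    while y > 0:
        acc *= y                # multiply the running product by Y
        if acc > limit:
            overflow = True     # product no longer fits in the assumed word
            acc &= limit        # keep only the low `width` bits
        y -= 1                  # decrement Y and loop while it is still positive
    return acc, overflow


print(factorial_stream(5))  # prints (120, False)
```

With the assumed 16-bit width, inputs up to 8 fit without overflow (8! = 40320) and 9 sets the overflow flag.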
System Board Layout
Features
• Each slot contains a single port
• Clusters connected using a module to bridge adjacent slots
• Bridging extendible to other system boards
• System is inherently scalable
[Figure: system board layout showing crossbars and slots]
Core Computing Component
•   Xilinx FPGA (currently used in test-bed)
•   Problem: Pipeline processing fast but not readily modified with
    current ASIC design practice
•   Solution:
•   Colt chip (fabricated and tested)
    - 0.8 um HP CMOS process fabricated by MOSIS
    - Run-time configurable
    - 50 MHz clock
•   Stallion chip (designed but not yet fabricated)
    - 0.5 um HP CMOS process
    - 64 functional units in mesh
    - Dedicated multiplier
    - Six data ports
    - 100 MHz clock

