					   An Introduction to
Reconfigurable Computing

   Mitch Sukalski and Craig Ulmer
         Dean R&D Seminar
         11 December 2003
  Reconfigurable Computing…
is computation on a platform with reconfigurable
    (i.e., modifiable at run-time) hardware capable
    of implementing application-specific algorithms
    and functionality on demand.
                      Computing Spectrum

[Figure: on the software side, a CPU pipeline (Fetch, Decode, Registers, Execute, Memory, Writeback) with an ALU offering x, /, + and xor; on the hardware side, a fixed data-flow circuit with inputs A, B, C, D and π, operators +, x, xor and z-1, and a single "result" output]

Software: General-Purpose CPU
• Easily reprogrammed
• Low cost
• Fundamental bottlenecks

Soft-Hardware: Field Programmable Gate Arrays (FPGAs)
• Reconfigurable hardware
• Medium cost
• Speedup potential

Hardware: Application-Specific Integrated Circuit (ASIC)
• Not modifiable
• High cost
• Extremely fast
                                     History

                                   1945:     Eckert, Mauchly, von Neumann: ENIAC
                                   1945:     “von Neumann architecture”
                                   1960:     Estrin: Fixed+Variable Structure Computer
                                   1970’s:   Simple PLDs
                                   1985:     Xilinx introduces first FPGA
                                   1990’s:   Custom Computing Machines (CCMs)
                                   1999:     FPGAs exceed million logic gates
                                   2002:     FPGAs include complex cores

[Figures: ENIAC; Fixed+Variable CPU (users can attach computational circuits to a fixed ALU); the Teramac CCM (multi-chip module of FPGAs); Xilinx Virtex; Xilinx Virtex II Pro (image courtesy rapidio.org)]
     Reconfigurable Computing in
            Modern HPC
• Stand-alone platforms
   – OctigaBay 12K
   – SRC-6
   – Starbridge Hypercomputer

• Accelerator cards
   – Timelogic’s DeCypher
   – Nallatech’s BenNUEY
   – Annapolis Micro Systems
     WILDSTAR II
Example: Computational Fluid Dynamics
  William Smith & Austars Schnore at GE Global Research




     From: “Towards an RCC-based Accelerator for
            Computational Fluid Dynamics,” ERSA 2003
   And now for some details…
• Field Programmable Gate Arrays (FPGAs)
• Common RC design techniques
• Reported examples
 Field-Programmable Gate Arrays (FPGAs)

• FPGAs emulate digital logic circuitry
   – Large array of configurable logic blocks
   – Internal routing through programmable interconnection network

• FPGAs hold hardware configuration in SRAM
   – Change the digital circuitry by loading new configuration


• Design approach:
   – User designs in hardware description language
   – Synthesis tools translate to logic gates
   – Mapping tools target specific FPGA
      Simplified Logic Block

[Figure: two LUTs, each feeding a 1-bit register]

• Emulates logic function
   – Thousands per chip

• Lookup Table (LUT)
   – Holds truth table
   – Inputs produce outputs

• 1-bit registers
   – Hold data between cycles

• Note: Greatly simplified
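
The logic block described above can be modelled in a few lines of software. This is only a sketch under simplifying assumptions: the class name LogicBlock and its methods are hypothetical, and a real logic block has more inputs, carry logic and multiplexers than shown here.

```python
from typing import Sequence


class LogicBlock:
    """Hypothetical model of one simplified logic block: a LUT feeding a 1-bit register."""

    def __init__(self, truth_table: Sequence[int]):
        # A k-input LUT stores one output bit per input combination: 2**k bits.
        n = len(truth_table)
        if n == 0 or n & (n - 1):
            raise ValueError("truth table length must be a power of two")
        self.truth_table = list(truth_table)
        self.num_inputs = n.bit_length() - 1
        self.register = 0  # 1-bit register, cleared at start

    def lut(self, *inputs: int) -> int:
        """Combinational output: the inputs form an address into the table."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        address = 0
        for bit in inputs:  # the first input is the most significant address bit
            address = (address << 1) | (bit & 1)
        return self.truth_table[address]

    def clock(self, *inputs: int) -> int:
        """On a clock edge, capture the LUT output in the register."""
        self.register = self.lut(*inputs)
        return self.register


# Configure a 2-input LUT as XOR and feed the register back as an input,
# so the block accumulates the parity of a bit stream across clock cycles.
parity = LogicBlock([0, 1, 1, 0])
for bit in [1, 0, 1, 1]:
    parity.clock(parity.register, bit)
print(parity.register)
```

Loading a different truth table into the same object changes the function it computes, which is the software analogue of loading a new configuration into the FPGA.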
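LUT Example: 1-bit Adder

        Truth Table
A   B   Cin   Cout   Sum
0   0   0     0      0
0   0   1     0      1
0   1   0     0      1
0   1   1     1      0
1   0   0     0      1
1   0   1     1      0
1   1   0     1      0
1   1   1     1      1

[Figure: inputs A, B, C (carry in) and a fourth input tied to 0 feed two LUTs; one LUT produces Cout and the other produces Sum, each with a register at its output]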
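
As a rough illustration of the 1-bit adder example, the sketch below fills two 4-input LUTs from Boolean functions and then reads them back. The helper names build_lut, read_lut and full_adder are made up for this example, and the fourth input is simply tied to 0 as in the slide's figure.

```python
from itertools import product


def build_lut(func, num_inputs=4):
    """Enumerate every input combination and store func's output bit."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=num_inputs)]


def read_lut(table, *bits):
    """Look up the stored output; the first bit is the most significant address bit."""
    address = 0
    for bit in bits:
        address = (address << 1) | bit
    return table[address]


# Two LUT configurations; the fourth input is unused by both functions.
SUM_LUT = build_lut(lambda a, b, cin, unused: a ^ b ^ cin)
COUT_LUT = build_lut(lambda a, b, cin, unused: (a & b) | (cin & (a ^ b)))


def full_adder(a, b, cin):
    """Return (cout, sum) by reading both LUTs with the fourth input tied to 0."""
    return read_lut(COUT_LUT, a, b, cin, 0), read_lut(SUM_LUT, a, b, cin, 0)


# Reproduce the truth table: columns are A, B, Cin, Cout, Sum.
for a, b, cin in product((0, 1), repeat=3):
    cout, s = full_adder(a, b, cin)
    print(a, b, cin, cout, s)
```

Each LUT here holds 16 bits, of which only the 8 entries with the unused input at 0 are ever read.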
 Routing Data between Logic Blocks

[Figure: a 5 x 5 grid of logic blocks (LB) with a switchbox (X) at each interior crossing]

• Need to connect logic blocks

• Wires and Switchboxes
   – LBs connect to local wires
   – Switchboxes route long connections

• Routing set at compile time
   – Performed by tools
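
To give a feel for what a routing tool has to do, here is a minimal breadth-first maze router over a grid of switchboxes. It is a textbook-style sketch, not a description of any vendor's place-and-route tool: the function route, the 4 x 4 grid and the one-net-per-switchbox rule are all assumptions made for illustration.

```python
from collections import deque


def route(grid_size, start, goal, used):
    """Breadth-first search for a shortest path through free switchboxes.

    `used` holds grid cells already claimed by other nets. Returns the
    path as a list of (row, col) cells, or None if no route exists.
    """
    rows, cols = grid_size
    previous = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back to the start
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and nxt not in used and nxt not in previous:
                previous[nxt] = cell
                frontier.append(nxt)
    return None


used = set()
net_a = route((4, 4), (0, 1), (2, 1), used)   # first net takes most of column 1
used.update(net_a)
net_b = route((4, 4), (1, 0), (1, 2), used)   # second net must detour around it
print(net_a)
print(net_b)
```

The second net ends up much longer than its straight-line distance because the first net is in the way, which hints at why routing quality depends on the order and strategy the tools use.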
                Reconfiguration

[Figure: a full configuration image overwrites the whole FPGA; a partial configuration image overwrites only one region]

• Modern FPGAs are SRAM based
   – Can be loaded with new circuitry

• Full reconfiguration
   – Few megabytes of configuration
   – Milliseconds

• Partial reconfiguration
   – Reprogram only a portion of chip
   – Reduces configuration time
   – Non-trivial, poorly supported
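
The "few megabytes" and "milliseconds" figures above are linked by simple arithmetic: configuration time is image size divided by the bandwidth of the configuration port. The numbers below (3 MB image, byte-wide port, 50 MHz clock, partial image at one tenth of the full size) are assumptions chosen for illustration, not specifications of any particular device.

```python
def config_time_ms(image_bytes, port_width_bits, clock_hz):
    """Time to stream a configuration image through a parallel port, in ms."""
    bytes_per_second = (port_width_bits / 8) * clock_hz
    return 1000.0 * image_bytes / bytes_per_second


FULL_IMAGE_BYTES = 3 * 1024 * 1024   # assumed: "a few megabytes"
PORT_WIDTH_BITS = 8                  # assumed byte-wide configuration port
CLOCK_HZ = 50_000_000                # assumed 50 MHz configuration clock

full = config_time_ms(FULL_IMAGE_BYTES, PORT_WIDTH_BITS, CLOCK_HZ)
# Assume the partial image covers one tenth of the chip and scales with area.
partial = config_time_ms(FULL_IMAGE_BYTES // 10, PORT_WIDTH_BITS, CLOCK_HZ)
print(f"full: {full:.1f} ms, partial: {partial:.1f} ms")
```

Under these assumptions a full reload costs tens of milliseconds, so it only pays off when the loaded circuit then runs for much longer than that.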
 Design Techniques

Digital logic design techniques for
          exploiting FPGAs
FPGAs as Computational Accelerators

• Use FPGAs as soft-hardware
  – Port algorithm to hardware
  – Run inside FPGA
  – Reuse hardware

• Techniques
  – Concurrency, memory, partial evaluation
                 1. Concurrency
• Load FPGA with multiple computational circuits
   – Hardware state machines are like threads, but…
   – All tasks are always running


• Raw parallelism
   – Units run in parallel
   – Example: Key breaking


• Pipelining
   – Chain units together in series
   – Example: Streaming computations, data-flow
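
The two forms of concurrency can be sketched with a small cycle-counting model. The functions key_search_cycles and run_pipeline and the three example stages are hypothetical; the model assumes every unit finishes its work in exactly one clock cycle and that no stage ever produces None.

```python
def key_search_cycles(num_keys, num_units):
    """Raw parallelism: cycles to test every key with num_units identical units."""
    return -(-num_keys // num_units)  # ceiling division


def run_pipeline(stages, inputs):
    """Pipelining: cycle-by-cycle simulation of units chained in series.

    Every stage works on every cycle. regs[i] is the register holding the
    output of stage i. Returns (outputs, cycles used).
    """
    regs = [None] * len(stages)
    outputs = []
    feed = iter(inputs)
    cycles = 0
    while len(outputs) < len(inputs):
        # Stage i consumes what stage i-1 produced on the previous cycle.
        stage_inputs = [next(feed, None)] + regs[:-1]
        regs = [None if x is None else stage(x)
                for stage, x in zip(stages, stage_inputs)]
        cycles += 1
        if regs[-1] is not None:
            outputs.append(regs[-1])
    return outputs, cycles


stages = [lambda x: x * 3, lambda x: x + 1, lambda x: x & 0xFF]
data = [1, 2, 3, 4, 100]
results, cycles = run_pipeline(stages, data)
print(results, cycles)                 # pipelined: fill latency, then one result per cycle
print(len(stages) * len(data))         # the same work done one unit-step at a time
print(key_search_cycles(2 ** 16, 64))  # 64 key testers working side by side
```

After the pipeline fills, one result leaves per cycle, so the total is stages + items - 1 cycles rather than stages x items.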
2. Custom Memory Interactions
• Most FPGA cards have multiple memory banks
  – Fetch/store multiple data values at same time
  – Predictable performance (as opposed to caches)
  – Hide address generation

[Figure: an FPGA attached to five independent SRAM banks, Bank 0 through Bank 4]
            3. Partial Evaluation
• Know data constants at design time
  – Apply to circuits and reduce hardware
  – Synthesis tools perform automatically

     Example: 4-bit Ripple-Carry Adder




   Note: FPGAs unique because we can easily generate new, optimized
         hardware configurations for each set of constants.
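
The effect of partial evaluation on the 4-bit ripple-carry adder can be approximated with a toy constant-propagation pass. This is a sketch only: the Circuit class and its gate-counting rules are invented for this example and do not describe how real synthesis tools work internally. Wires are strings, constants are the integers 0 and 1, and a gate is instantiated only when it cannot be simplified away.

```python
class Circuit:
    """Hypothetical gate builder that folds constants as gates are requested."""

    def __init__(self):
        self.gates = []  # (op, a, b) tuples for gates actually instantiated

    def _emit(self, op, a, b):
        self.gates.append((op, a, b))
        return f"n{len(self.gates)}"  # name of the new output wire

    def AND(self, a, b):
        if a == 0 or b == 0:
            return 0
        if a == 1:
            return b
        if b == 1:
            return a
        return self._emit("and", a, b)

    def OR(self, a, b):
        if a == 1 or b == 1:
            return 1
        if a == 0:
            return b
        if b == 0:
            return a
        return self._emit("or", a, b)

    def XOR(self, a, b):
        if isinstance(a, int) and isinstance(b, int):
            return a ^ b
        if a == 0:
            return b
        if b == 0:
            return a
        return self._emit("xor", a, b)  # XOR with constant 1 acts as an inverter


def ripple_carry_adder(c, a_bits, b_bits):
    """Build an adder bit by bit, least significant bit first."""
    carry, sums = 0, []
    for a, b in zip(a_bits, b_bits):
        p = c.XOR(a, b)
        sums.append(c.XOR(p, carry))
        carry = c.OR(c.AND(a, b), c.AND(p, carry))
    return sums, carry


a_wires = ["a0", "a1", "a2", "a3"]

general = Circuit()
ripple_carry_adder(general, a_wires, ["b0", "b1", "b2", "b3"])

specialised = Circuit()
ripple_carry_adder(specialised, a_wires, [1, 0, 1, 0])  # B fixed to 5, LSB first

print(len(general.gates), len(specialised.gates))
```

Fixing one operand removes roughly half of the gates in this model, and a different constant would yield a different, separately optimized circuit.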
   RC Performance Examples
• CFD: 23 GFLOPS sustained
  – “Towards an RCC-based Accelerator for
    Computational Fluid Dynamics,” Smith & Schnore,
    2003
• Adaptive beamforming: 20 GFLOPS
  – Parallel systolic array architecture
  – “20 GFLOPS QR processor on a Xilinx Virtex-E
    FPGA,” Walke et al., 2000
• Real-time holographic video display at 30fps
  – “Using field programmable gate arrays to scale up the
    speed of holographic video computation,” Nwodoh
                  In Summary
• Reconfigurable computing uses FPGAs to
  emulate application-specific hardware
   – Achieve performance gains with dedicated hardware

• It is possible to implement just about any kind of
  digital hardware in the FPGA.
   – Limited by capacity and effort
   – Resurrect application-specific hardware architectures
   – SIMD, MIMD, Systolic Processor Arrays, Data-Flow…

				