Introduction to Field Programmable Gate Arrays (FPGAs)
Bill Jason P. Tomas
Dept. of Electrical and Computer Engineering
University of Nevada Las Vegas
Field Programmable Gate Arrays
Dominant digital design implementation
Ability to re-configure FPGA to implement any
 digital logic function
   Partial re-configuration allows a portion of the FPGA
    to be continuously running while another portion is
    being re-configured
FPGAs also contain analog circuitry features,
 including programmable slew rate and drive
 strength, and differential comparators on I/O
 designed to be connected to differential signaling
 channels.
Mixed-signal FPGAs contain ADCs and DACs
 with analog signal conditioning blocks, allowing
 them to operate as a system-on-chip (SoC)
FPGA Architectures
Early FPGAs
   N x N array of unit cells (CLB + routing)
    Special routing along center axis
Next Generation FPGAs
   M x N unit cells
   Small block RAMs around edges
More recent FPGAs
   Added block RAM arrays
   Added multiplier cores
   Added processor cores
FPGA Architecture Trends
Memories
   Single & Dual-port RAMs
   FIFO (first-in first-out)
   ECC (error correcting codes)
Digital Signal Processors
   Multipliers
   Accumulators
   Arithmetic Logic Units (ALUs)
Embedded Processors
   Hardcore (dedicated processors)
     Dedicated program and data memories
     Programmable RAM in FPGA can be used in conjunction with
      the processor to provide program and data memories
   Soft core (synthesized from a HDL)
               Basic FPGA Architecture




•More recent FPGA architectures have small block RAM arrays (usually
placed in center column), multipliers, processor cores, DSP cores w/
multipliers, and I/O cells along columns for BGAs.
FPGA Operation
       User writes configuration memory
       which defines the function of the system.
       This includes: the connectivity between
       the CLBs and the I/O cells, the logic to
       be implemented onto the CLBs, and the
       I/O blocks.


       By changing the data in the
       configuration memory, the function of
       the system changes as well. This change
       in data can be implemented at any time
       during FPGA operation (run-time
       configuration).
 Configurable Logic Blocks (CLBs)
 Architecture
 CLBs consist of:
   Look-up Tables (LUTs), which implement the entries of a logic
    function's truth table
     Some FPGAs can use LUTs to implement small Random Access
      Memory (RAM)
   Carry and Control Logic
     Implements fast arithmetic operations (adders/subtractors)
     Can also be configured for additional operations (Built-In Self-Test
      iterative-OR chain)
   Memory Elements
     Configurable Flip-Flops (FFs)/Latches (programmable clock edges,
      set/reset, and clock enable)
     These memory elements can usually be configured as shift
      registers
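
As a rough illustration of how a LUT and the carry logic listed above can cooperate in an adder, the sketch below models one bit per LUT. The per-bit arrangement (LUT computes a propagate signal, a carry MUX forwards the carry, an XOR gate forms the sum) is one common scheme and is an assumption here, not taken from a specific device. The name `carry_chain_add` and its parameters are hypothetical.

```python
def carry_chain_add(a: int, b: int, width: int = 8, carry_in: int = 0):
    """Add two unsigned integers bit by bit through a modeled carry chain.

    Assumed per-bit structure (illustrative only):
      propagate = a_i XOR b_i                    (computed in the LUT)
      sum_i     = propagate XOR carry            (dedicated XOR gate)
      carry_out = carry if propagate else a_i    (dedicated carry MUX)
    Returns (sum truncated to `width` bits, final carry-out).
    """
    carry = carry_in & 1
    total = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        propagate = a_i ^ b_i
        total |= (propagate ^ carry) << i
        # When propagate is 0 both operand bits are equal, so a_i is the carry.
        carry = carry if propagate else a_i
    return total, carry


def carry_chain_sub(a: int, b: int, width: int = 8):
    """Subtract using the same chain: a + (NOT b) + 1 in two's complement."""
    mask = (1 << width) - 1
    return carry_chain_add(a, ~b & mask, width, carry_in=1)
```

Subtraction reuses the adder by inverting one operand and setting the carry-in, which is why a single carry chain can serve both operations.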
  Configurable Logic Blocks

A CLB is made up of
several slices.
Xilinx Virtex-5 CLBs
(right) have two slices:
SLICEL (logic) and
SLICEM (memory).

In addition to the basic
CLB architecture, the
Virtex-5 contains
wide-function MUXes, which
can implement:
- 4:1 MUX using 1 LUT
- 8:1 MUX using 2 LUTs
- 16:1 MUX using 4 LUTs
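
The LUT counts above follow from the input budget: a 4:1 MUX needs four data inputs plus two select lines, which is exactly six LUT inputs. The sketch below, a minimal model and not a vendor primitive, fills a 6-input LUT with the 64 bits of a 4:1 MUX and then builds an 8:1 MUX from two such LUTs plus one 2:1 MUX. The input ordering and the function names (`mux4_lut_bits`, `lut6`, `mux8`) are assumptions for illustration.

```python
def mux4_lut_bits():
    """Return the 64 configuration bits that make a 6-input LUT act as a 4:1 MUX.

    Assumed input ordering (illustrative): address bits 0-3 carry the data
    inputs d0..d3 and address bits 4-5 carry the select lines s0 and s1.
    """
    bits = []
    for addr in range(64):
        data = [(addr >> i) & 1 for i in range(4)]
        sel = (addr >> 4) & 0b11
        bits.append(data[sel])
    return bits


def lut6(bits, inputs):
    """Evaluate a 6-input LUT: the six inputs form an address into the 64 bits."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return bits[addr]


def mux8(data, sel):
    """8:1 MUX from two 4:1 LUTs plus a 2:1 MUX driven by the top select bit."""
    bits = mux4_lut_bits()
    s0, s1, s2 = sel & 1, (sel >> 1) & 1, (sel >> 2) & 1
    low = lut6(bits, list(data[0:4]) + [s0, s1])
    high = lut6(bits, list(data[4:8]) + [s0, s1])
    return high if s2 else low
```

A 16:1 MUX extends the same idea with four LUTs and one more level of 2:1 selection.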
   Look-up Tables (2:1 MUX Example)
Configuration memory holds the output values of the
 truth table entries
Internal signals connect to the control signals of the
 MUXes to select a value from the truth table for
 any given input signals
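
A small software model may make the LUT idea concrete: the configuration bits are the truth table, and the inputs steer a tree of 2:1 MUXes down to one stored bit. This is a sketch under assumed conventions (first input is the least significant address bit); the class name `LookupTable` and the constant `MUX2_BITS` are hypothetical.

```python
class LookupTable:
    """A k-input LUT: configuration memory stores one output bit per truth-table row."""

    def __init__(self, num_inputs, config_bits):
        if len(config_bits) != 2 ** num_inputs:
            raise ValueError("need 2**num_inputs configuration bits")
        self.num_inputs = num_inputs
        self.config_bits = list(config_bits)

    def evaluate(self, *inputs):
        """Select a stored bit with a tree of 2:1 MUXes, one level per input."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        level = self.config_bits
        for bit in inputs:  # inputs[0] is the least significant address bit
            level = [level[i + 1] if bit else level[i]
                     for i in range(0, len(level), 2)]
        return level[0]


# Truth table for a 2:1 MUX with inputs (a, b, sel): output = b if sel else a.
# Row index = a + 2*b + 4*sel.
MUX2_BITS = [0, 1, 0, 1, 0, 0, 1, 1]
mux2 = LookupTable(3, MUX2_BITS)
```

Loading a different list of configuration bits into the same structure yields a different logic function, which is the sense in which the LUT is programmable.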
              LUT Based Ram
Normal LUT mode
 performs read
 operation
Address decoders
 with WE
 generate clock
 signals to latches
 for write operation
Smaller RAMs can
 be combined to
 create larger
 RAMs (up to 64-
 bit in Virtex-5)
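
The read and write behavior described above can be sketched as follows. Reads work exactly like normal LUT evaluation, while a write is gated by the decoded address and WE. The sizes used here are arbitrary illustrative values and are not tied to any particular device; `LutRam` and `WideRam` are hypothetical names.

```python
class LutRam:
    """Model of a LUT used as a small RAM (depth x 1 bit)."""

    def __init__(self, depth=16):
        self.cells = [0] * depth  # the LUT's storage bits

    def read(self, address):
        """Normal LUT mode: the address inputs select one stored bit."""
        return self.cells[address]

    def clock(self, address, data_in, write_enable):
        """On a clock edge, the decoded address plus WE lets one cell capture data_in."""
        if write_enable:
            self.cells[address] = data_in & 1


class WideRam:
    """Several LUT RAMs combined: upper address bits pick the LUT, lower bits the cell."""

    def __init__(self, num_luts=4, depth=16):
        self.depth = depth
        self.luts = [LutRam(depth) for _ in range(num_luts)]

    def read(self, address):
        return self.luts[address // self.depth].read(address % self.depth)

    def clock(self, address, data_in, write_enable):
        self.luts[address // self.depth].clock(
            address % self.depth, data_in, write_enable)
```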
FPGA Programmable Interconnection Network
 Horizontal and vertical mesh of wire segments interconnected by
  programmable switches called programmable interconnect points (PIPs).
  These PIPs are implemented using a transmission gate controlled by a
  memory bit from the configuration memory.
 Consists of global routing connecting PLBs to I/O buffers, non-adjacent PLBs,
  and other embedded components. Local routing connects PLBs to other
  adjacent PLBs and PLBs to global routing (done through a switch matrix)

 Several types of PIPs are used
      Cross-point = connects vertical or horizontal wire segments, allowing turns
      Breakpoint = connects or isolates 2 wire segments
      Decoded MUX = group of 2^n cross-points connected to a single output, configured by n configuration
       bits
      Non-decoded MUX = n wire segments, each with its own configuration bit (n bits for n segments)
      Compound cross-point = 6 break-point PIPs (can route two isolated signal nets)
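
The difference between the decoded and non-decoded MUX PIPs is mostly a trade-off in configuration bits, which the sketch below tries to show. It is a behavioral model only; the function names and the choice to return `None` for an undriven output are assumptions.

```python
def decoded_mux(segments, config_bits):
    """Decoded MUX PIP: n configuration bits select one of 2**n wire segments."""
    if len(segments) != 2 ** len(config_bits):
        raise ValueError("decoded MUX needs 2**n segments for n configuration bits")
    index = sum(bit << i for i, bit in enumerate(config_bits))
    return segments[index]


def non_decoded_mux(segments, config_bits):
    """Non-decoded MUX PIP: one configuration bit per wire segment.

    At most one bit is expected to be set. Returns None when no segment is
    connected (output left undriven in this model).
    """
    if len(segments) != len(config_bits):
        raise ValueError("non-decoded MUX needs one configuration bit per segment")
    selected = [value for value, bit in zip(segments, config_bits) if bit]
    if len(selected) > 1:
        raise ValueError("more than one segment enabled: contention")
    return selected[0] if selected else None


def config_bits_needed(num_segments):
    """Configuration bits used by each style for the same number of segments."""
    decoded = max(1, (num_segments - 1).bit_length())
    return {"decoded": decoded, "non_decoded": num_segments}
```

For eight segments the decoded form needs 3 bits where the non-decoded form needs 8, at the cost of the decoding logic.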
Programmable Input/Output Cells
Bi-directional Buffers
     Programmable for inputs or outputs
     Tri-state controls bi-directional operation
     Pull-up/down resistors
     FFs/Latches are used to improve timing
      Set-up and hold times
      Clock-to-out delay
Routing Resources
   Connections to core of array
Programmable I/O voltage and current levels
                                              Boundary Scan Access
FPGA Configuration Interfaces
Master (Serial or Parallel)
   FPGA retrieves configuration from ROM at initial
    power-up
Slave (Serial or Parallel)
   FPGA configured by an external source (e.g.,
    a microprocessor or another FPGA)
   Used for dynamic partial re-configuration
Boundary Scan
   4-wire IEEE standard serial interface used for
    testing
   Write and read access to configuration memory
   Interfaces to FPGA core internal routing network
Boundary Scan Configuration
[Figure: Multi-FPGA emulation framework to support NoC design and
verification (UNLV NSIL), shown in a daisy chain configuration]

Developed to test
interconnect between
chips on PCB

Test Access Port
(TAP) controller
composed of a 16-state
FSM
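
For reference, the 16-state TAP controller can be written as a next-state table indexed by the TMS value sampled on each TCK edge. The state names below follow the usual IEEE 1149.1 naming; the helper names `tap_step` and `tap_run` are hypothetical and the code is a behavioral sketch, not a model of any particular device.

```python
# state -> (next state if TMS = 0, next state if TMS = 1)
TAP_NEXT = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE":    ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_DR_SCAN":   ("CAPTURE_DR", "SELECT_IR_SCAN"),
    "CAPTURE_DR":       ("SHIFT_DR", "EXIT1_DR"),
    "SHIFT_DR":         ("SHIFT_DR", "EXIT1_DR"),
    "EXIT1_DR":         ("PAUSE_DR", "UPDATE_DR"),
    "PAUSE_DR":         ("PAUSE_DR", "EXIT2_DR"),
    "EXIT2_DR":         ("SHIFT_DR", "UPDATE_DR"),
    "UPDATE_DR":        ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_IR_SCAN":   ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR":       ("SHIFT_IR", "EXIT1_IR"),
    "SHIFT_IR":         ("SHIFT_IR", "EXIT1_IR"),
    "EXIT1_IR":         ("PAUSE_IR", "UPDATE_IR"),
    "PAUSE_IR":         ("PAUSE_IR", "EXIT2_IR"),
    "EXIT2_IR":         ("SHIFT_IR", "UPDATE_IR"),
    "UPDATE_IR":        ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
}


def tap_step(state, tms):
    """Advance the TAP controller by one TCK edge."""
    return TAP_NEXT[state][1 if tms else 0]


def tap_run(state, tms_sequence):
    """Apply a sequence of TMS values and return the final state."""
    for tms in tms_sequence:
        state = tap_step(state, tms)
    return state
```

One property worth noting is that holding TMS high for five clock edges returns the controller to the reset state from any starting state.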
FPGA Configuration Techniques
Full configuration and readback
   Simple configuration interface
     Automatic internal calculation of frame address
   Larger FPGAs have a longer download time
Compressed configuration
   Requires multiple frame write capability
     Identical frames of configuration data are written to multiple
      frame addresses
   Extension of partial re-configuration interface capabilities
     Frame address is much smaller than frame of configuration
      data
   Reduces download time for initial configuration depending
    on regularity of system function and the array percent that
    is utilized
Partial re-configuration and readback
   Only change portions of configuration memory with respect
    to reference design
     Reduces download time for re-configuration
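
The two download-time savings described above can be sketched in a few lines. Frames are modeled as `bytes` objects in a list indexed by frame address, which is a simplification, and both function names are hypothetical. Partial re-configuration sends only the frames that differ from a reference design, while compressed configuration sends each distinct frame once together with all addresses that share it.

```python
from collections import defaultdict


def partial_reconfiguration_frames(reference, target):
    """Return {frame_address: data} for frames that differ from the reference design."""
    if len(reference) != len(target):
        raise ValueError("designs must cover the same frame addresses")
    return {addr: data
            for addr, data in enumerate(target)
            if data != reference[addr]}


def compressed_writes(frames):
    """Group identical frames so each distinct frame is written once to many addresses."""
    groups = defaultdict(list)
    for addr, data in enumerate(frames):
        groups[data].append(addr)
    return dict(groups)
```

Because a frame address is much smaller than a frame of data, the grouped form pays off most when the design is regular or much of the array is unused.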
       Xilinx Virtex-5 FPGAs




Multi-FPGA-based emulation framework for NoC design and
verification (UNLV Networking and System Integration
Laboratory)
               Virtex-5 FPGA Platforms
Five Virtex-5 Platforms
1. LX - general logic applications
2. LXT - logic with advanced serial connectivity
3. SXT - signal processing applications with advanced serial connectivity
4. TXT - high performance systems with double density advanced serial connectivity
5. FXT - high performance embedded systems with advanced serial connectivity

•Over 320,000 PLBs on the largest Virtex-5

•ExpressFabric interconnect structure and 12 levels of metal interconnect,
allowing implementation of complex logic functions and connections to
neighboring PLBs in fewer hops than Virtex-4

•Each PLB contains 8 LUTs, 8 configurable memory elements (can be
configured as RAM/ROM/shift register)

•Enhanced DSP functions on 25 x 18-bit multipliers (ability to be cascaded)

•Clock management contains one PLL and two clock managers, which can
drive global clocks
Virtex-5 CLB

               A single CLB in Virtex-5 consists of two slices:
               SLICEL (logic) and SLICEM (memory). Each
               CLB is connected to a switch matrix, which provides
               access to the general routing (global) matrix.


                            Every slice contains four
                            LUTs, wide-function MUXes,
                            carry logic, and configurable
                            memory elements. SLICEM also
                            supports storing data using
                            distributed RAM and shifting
                            data with 32-bit shift
                            registers.
SLICEL
SLICEM
FPGA Design Comparison: Virtex-5,
Virtex-6, and Spartan-6


                      Virtex-6 CLBs have the same setup
                      as Virtex-5 (SLICEL & SLICEM)

                      Virtex-6 devices add four
                      additional storage elements which
                      can only be configured as edge-
                      triggered D-FFs. The D inputs are
                      driven by the output of the LUTs or
                      bypass slice inputs AX-DX
FPGA Design Comparison: Virtex-5,
Virtex-6, and Spartan-6



                      Spartan-6 CLBs are
                      split into two columns: one
                      column for a new SLICEX and one
                      column alternating SLICEL
                      and SLICEM. SLICEX is a basic
                      slice without any carry logic
                      added
Back to Virtex-5 CLB LUT
Up to 207,360 LUTs (6-input) with greater than
 13 million configuration bits.
Can be configured as dual-output 5-input LUTs.
 In single 6-input LUT mode, O6 is the primary output.
[Figure: a 6-input LUT drawn as two LUTs (LUT 1 and LUT 2). Labels: inputs to
LUT 1 and select lines, inputs to LUT 2, output MUX (A6), output, A5]
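
The dual-output arrangement can be modeled as one 64-bit configuration word read two ways. The bit ordering below (A1 as the least significant address bit, O5 reading the lower 32 bits, A6 choosing between the halves for O6) is an assumption made for illustration, and `lut6_outputs` is a hypothetical name.

```python
def lut6_outputs(init, a):
    """Model a 6-input LUT with a 64-bit configuration word `init`.

    `a` is a list [A1, ..., A6] of input bits. Assumed behavior (illustrative):
      O6 = configuration bit addressed by all six inputs
      O5 = configuration bit addressed by A1..A5 within the lower 32 bits
    Returns the tuple (O6, O5).
    """
    addr5 = sum(bit << i for i, bit in enumerate(a[:5]))
    o5 = (init >> addr5) & 1
    o6 = (init >> (addr5 | (a[5] << 5))) & 1
    return o6, o5


# Lower half: 5-input AND (only address 31 is 1).
# Upper half: 5-input OR (every address except 0 is 1).
AND5_OR5_INIT = (0xFFFFFFFE << 32) | (1 << 31)
```

With A6 held at 1, O6 reads the upper half and O5 the lower half, so the single LUT delivers two different 5-input functions of the same inputs.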
LUT Schematic Simulation

                Logical AND




               Logical OR
Virtex-5 Programmable I/O

The I/O cells in Virtex-5 have output logic blocks (OLOGIC), input logic
blocks (ILOGIC), I/O delay blocks, and a bidirectional I/O buffer.

OLOGIC implements registers to improve system clock-to-output timing and
supports single data-rate (SDR) and double data-rate (DDR) transmission of
data. It can also perform parallel-to-serial conversion of output data (2 & 6
bits) in Serializer/Deserializer (SerDes) mode.

ILOGIC implements registers to improve set-up and hold times and supports
SDR and DDR reception of data. It can perform serial-to-parallel conversion
of input data (2 & 6 bits) when in SerDes mode.

Two I/O cells are grouped to form a single I/O tile. In master/slave mode,
two I/O cells in the same I/O tile are connected via dedicated shift
routing to support larger data widths.
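
The two SerDes conversions are inverses of each other, which the short sketch below illustrates. Bit order (least significant bit first) and the function names are assumptions; real devices define their own ordering and framing.

```python
def serialize(word, width):
    """Parallel-to-serial: emit `width` bits of `word`, least significant bit first."""
    return [(word >> i) & 1 for i in range(width)]


def deserialize(bits, width):
    """Serial-to-parallel: regroup a bit stream into `width`-bit words (LSB first).

    Trailing bits that do not fill a whole word are ignored.
    """
    words = []
    for start in range(0, len(bits) - width + 1, width):
        chunk = bits[start:start + width]
        words.append(sum(bit << i for i, bit in enumerate(chunk)))
    return words
```

Supporting a larger data width, as in the master/slave mode mentioned above, amounts to using a larger `width` than a single cell could handle on its own.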
Virtex-5 FPGA Interconnection Network
Global routing consists of
   Long lines = routing has three connections: beginning,
    middle, and end. Double lines have five connections into a
    switch matrix between beginning and end, and can source
    in all four directions of the FPGA from a switch matrix.
    Every direction has 10 BEGs, MIDs, and ENDs (all
    bidirectional) for a total of 240 wire segments per switch
    matrix. Spans 24 rows/columns of components, with a
    switch matrix connection at every sixth component
   Double lines = resources span three columns/rows of
    components, with a connection to the switch matrix for
    each component.
   Hex lines = three connections into a switch matrix, similar
    to long lines. Source in all four directions from the switch
    matrix. Spans six rows or columns of components
Virtex-5 FPGA Interconnection Network




  PIPs
Hands On Demonstration
 Future FPGA Development

Moore’s law states
that the number of
transistors on an
integrated circuit doubles
roughly every two years.

How to continue
with the trend
stated by Moore?

3D Integrated
Circuitry
2D Integrated Circuit


                  Metal layer 6



                  Metal layer 3

                  Metal layer 2

                  Metal layer 1

              Active device layer

               Si Substrate
Transistors no longer dominate; metal
interconnections took over
Design Costs increase as technology gets smaller
IC Designs Decrease
FPGAs see diminishing benefits with
scaling
90% of FPGA logic area is programmable
 interconnect
Performance and power penalties are a direct result
 of the area (70% Virtex-2)
Interconnect needs to increase faster than
 number of gates to keep up (Rent's rule)
Dynamic Power in Virtex-2 (Shang FPGA’02):
   Interconnect   60%
   Logic          16%
   Clocking       14%
   IOB            10%
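
Rent's rule, mentioned on this slide, is usually written as T = t * g^p, where T is the number of external terminals of a block, g is the number of gates in it, t is the average number of terminals per gate, and p is the Rent exponent. The sketch below simply evaluates that relation; the default values of `t` and `p` are illustrative assumptions, not measurements from any FPGA, and wiring-demand estimates derived from the rule are outside its scope.

```python
def rent_terminals(gates, t=4.0, p=0.6):
    """Evaluate Rent's rule T = t * g**p.

    gates: number of gates g in the block
    t:     average terminals per gate (assumed value)
    p:     Rent exponent (assumed value; depends on the design style)
    """
    if gates <= 0:
        raise ValueError("gates must be positive")
    return t * gates ** p


def terminal_growth(scale, p=0.6):
    """Factor by which the terminal count grows when the gate count is scaled."""
    return scale ** p
```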
Cross-talk increases as technology gets
smaller
3D Integrated Circuits
•More functionality in a smaller space → extends Moore’s Law
•More transistors in a package → larger designs
•Shorter interconnects → less RC delay → better chip performance
•Power decrease → shorter wires reduce power consumption by producing
less capacitance (also less inductance)
•Bandwidth → a large number of vertical vias between layers allows construction
of wide-bandwidth buses between functional blocks in different layers
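
The link between shorter wires and lower RC delay can be illustrated with the common first-order estimate for a distributed RC line, delay ≈ 0.38 * R * C, where both total resistance and total capacitance grow with length. The per-millimeter values below are placeholders chosen only to make the sketch runnable, and `wire_delay` is a hypothetical name.

```python
def wire_delay(length_mm, r_per_mm=100.0, c_per_mm=0.2e-12):
    """Approximate 50% delay (seconds) of a distributed RC wire.

    Uses delay ~= 0.38 * (r * L) * (c * L), so delay scales with length squared.
    r_per_mm (ohms/mm) and c_per_mm (farads/mm) are assumed placeholder values.
    """
    total_r = r_per_mm * length_mm
    total_c = c_per_mm * length_mm
    return 0.38 * total_r * total_c


def delay_ratio(short_mm, long_mm):
    """How much faster the shorter wire is, independent of the assumed r and c."""
    return wire_delay(short_mm) / wire_delay(long_mm)
```

Because the estimate is quadratic in length, halving a wire by routing it vertically between layers cuts its RC delay to roughly a quarter under this model.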
3D Integrated Circuit




                  Metal layers

               Device layer 2

                  Metal layers

              Device layer 1

               Si Substrate
Young-Su KWON (MIT) 2005
  NuPGA® Architecture (achieve the same
  densities as an ASIC design?)




Uses a graphite-based memory process for creating reprogrammable
memory elements, which is now being used for anti-fuses in 3D FPGAs.

Anti-fuses start as an open circuit, but can be programmed to create a
low-resistance connection by applying a high voltage. Since the anti-fuses lie above the logic,
the interconnection density can rival ASICs. The problem is that the high-voltage
programming transistors take up a lot of area, negating the
density boost. NuPGA claims to have solved that problem by burying
the programming transistors in a 3D foundation layer beneath the FPGA.
Questions?

				