					FPGA Embedded

Vijaykrishnan Narayanan
System Options
          Embedded System Design

              Hard Processor
                      • A processor built from
                        dedicated silicon is
                        referred to as a "hard"
                        processor.
                         – Such is the case for
                           the ARM922T inside
                           the Altera Excalibur.
                         – Another example is the
                           PowerPC 405 inside the
                           Xilinx Virtex-II Pro and
                           Virtex-4.
Source: Xilinx Inc.
               Soft Cores
• A "soft" processor is built using the FPGA’s
  general-purpose logic.
• The soft processor is typically described in
  a Hardware Description Language (HDL)
  or netlist.
• Unlike the hard processor, a soft
  processor must be synthesized and fit into
  the FPGA fabric.
• Xilinx MicroBlaze
FPGA Cores – Hard vs Soft
           Peripheral Logic
• In both soft and hard processor systems,
  the local memory, processor buses,
  internal peripherals, peripheral controllers,
  and memory controllers must be built from
  the FPGA’s general-purpose logic.
     Advantages of Embedded
       Processors in FPGA
• An FPGA embedded processor system
  offers several advantages over typical
  off-the-shelf microprocessors:
  1) customization
  2) obsolescence mitigation
  3) component and cost reduction
  4) hardware acceleration
            #1 Customization
• The designer of an FPGA embedded processor
  system has complete flexibility to select any
  combination of peripherals and controllers.
• In fact, the designer can invent new, unique
  peripherals that can be connected directly to the
  processor’s bus.
• If a designer has a non-standard requirement for
  a peripheral set, this can be met easily with an
  FPGA embedded processor system.
  – For example, a designer would not easily find an off-
    the-shelf processor with ten UARTs. However, in an
    FPGA, this configuration is very easily accomplished.
   #2 Obsolescence Mitigation
• Some companies, in particular those supporting
  military contracts, have a design requirement to
  ensure a product lifespan that is much longer
  than the lifespan of a standard electronics
  product.
  – Component obsolescence mitigation is a difficult
    issue.
• FPGA soft-processors are an excellent solution
  in this case since the source HDL for the soft-
  processor can be purchased.
  – Ownership of the processor’s HDL code may fulfill the
    requirement for product lifespan guarantee.
       #3 Component and cost reduction
• With the versatility of the FPGA, previous
  systems that required multiple components can
  be replaced with a single FPGA.
• Certainly this is the case when an auxiliary I/O
  chip or a co-processor is required next to an off-
  the-shelf processor.
• By reducing the component count in a design, a
  company can reduce board size and inventory
  management, both of which will save design
  time and cost.
      #4 Hardware acceleration
•   Perhaps the most compelling
    reason to choose an FPGA
    embedded processor is the ability
    to make tradeoffs between
    hardware and software to
    maximize efficiency and
    performance.
•   If an algorithm is identified as a
    software bottleneck, a custom co-
    processing engine can be
    designed in the FPGA specifically
    for that algorithm.
•   This co-processor can be attached
    to the FPGA embedded processor
    through special, low-latency
    channels, and custom instructions
    can be defined to exercise the
    co-processor.
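As a rough way to judge whether a co-processor is worth building, Amdahl's law relates the share of run time spent in the bottleneck to the overall gain. The sketch below is not from the original slides; the 60% share and the 10x kernel speedup are hypothetical numbers chosen only for illustration.

```python
def overall_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the run time is accelerated."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)


# Hypothetical profile: the bottleneck takes 60% of the run time in software,
# and the co-processor executes that part 10x faster.
print(round(overall_speedup(0.6, 10.0), 2))  # about 2.17
```

Even a very fast co-processor gives a limited overall gain when the accelerated code is a small share of the run time, which is why the bottleneck should be identified by profiling first.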
• Unlike an off-the-shelf processor, the hardware platform
  for the FPGA embedded processor must be designed.
• The embedded designer becomes the hardware
  processor system designer when an FPGA solution is
  selected.
• Because of the integration of the hardware and software
  platform design, the design tools are more complex.
• The increased tool complexity and design methodology
  require more attention from the embedded designer.
• Since FPGA embedded processor software design is
  relatively new compared to software design for standard
  processors, the software design tools are likewise
  relatively immature, although workable.
      Peripherals and memory
• To facilitate FPGA embedded processor
  design, both Xilinx and Altera offer
  extensive libraries of intellectual property
  (IP) in the form of peripherals and memory
  controllers (e.g. UART, DMA, PCI-X).
• This IP is included in the embedded
  processor toolsets provided by these
  manufacturers.
Altera Embedded Processors
Xilinx Embedded Processors
Types of Buses
Memory Interface
                Memory Usage
• The fastest possible memory option is to put everything
  in local memory.
• Xilinx local memory is made up of large FPGA memory
  blocks called BlockRAM (BRAM). Embedded processor
  accesses to BRAM happen in a single bus cycle.
• Since the processor and bus run at the same frequency
  in MicroBlaze, instructions stored in BRAM are executed
  at the full MicroBlaze processor frequency.
   – In a MicroBlaze system, BRAM is essentially equivalent in
     performance to a Level 1 (L1) cache.
• The PowerPC can run at frequencies greater than the
  bus and has true, built-in L1 cache.
   – Therefore, BRAM in a PowerPC system is equivalent in
     performance to a Level 2 (L2) cache.
                 BRAM Sizes
• Xilinx FPGA BRAM quantities differ by device.
• For example, the 1.5 million gate Spartan-3 device
  (XC3S1500) has a total capacity of 64KB, whereas the
  400,000 gate Spartan-3 device (XC3S400) has half as
  much at 32KB.
• An embedded designer using FPGAs should refer to the
  device family datasheet to review a specific chip’s BRAM
  capacity.
• If the designer’s program fits entirely within local
  memory, then the designer achieves optimal memory
  performance.
• However, many embedded programs exceed this
  capacity.
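The fit check described above can be sketched in a few lines. This is only an illustration: the two capacities are the ones quoted on this slide, while the function name and the `reserved_bytes` parameter (BRAM set aside for other uses, such as caches or FIFOs) are hypothetical.

```python
KB = 1024

# BRAM capacities quoted on the slide; other devices would come from the datasheet.
BRAM_BYTES = {"XC3S1500": 64 * KB, "XC3S400": 32 * KB}


def fits_in_local_memory(device: str, program_bytes: int, reserved_bytes: int = 0) -> bool:
    """Return True if the program plus any reserved BRAM fits in the device's BRAM."""
    return program_bytes + reserved_bytes <= BRAM_BYTES[device]


# Hypothetical 48 KB program image.
print(fits_in_local_memory("XC3S1500", 48 * KB))  # True
print(fits_in_local_memory("XC3S400", 48 * KB))   # False
```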
     External Memory Interface
• Xilinx provides several memory controllers that interface
  with a variety of external memory devices.
   – These memory controllers are connected to the processor
     peripheral bus.
   – The three types of volatile memory supported by Xilinx are:
      • SRAM
      • single-data-rate SDRAM
      • double-data-rate (DDR) SDRAM.
• The SRAM controller is the smallest and simplest inside the
  FPGA, but SRAM is the most expensive of the three
  memory types.
• The DDR controller is the largest and most complex
  inside the FPGA, but fewer FPGA pins are required, and
  DDR is the least expensive per megabyte.
    Should you cache external
            memory?
• A design in Spartan-3 enables 8 KB of data
  cache and designates 32 MB of external
  memory to be cached.
  – This cache requires 12 address tag bits.
  – This configuration consumes 124 logic cells and 6
    BRAMs.
• Only 4 BlockRAMs are required in Spartan-3 to
  achieve 8 KB of local memory.
• In this case, cache is 50% more expensive in
  terms of BRAM usage than local memory.
  – The 2 extra BRAMs are used to store address tag
    bits.
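The 12 tag bits and the 50% figure follow from simple arithmetic, sketched below. The sketch assumes a direct-mapped cache, so the tag only has to tell apart the 8 KB-sized regions of the cacheable range; the function and variable names are illustrative and are not part of any Xilinx tool.

```python
KB = 1024
MB = 1024 * KB


def tag_bits(cacheable_bytes: int, cache_bytes: int) -> int:
    """Tag width for a direct-mapped cache (assumed): log2(cacheable / cache size)."""
    regions = cacheable_bytes // cache_bytes
    return regions.bit_length() - 1  # exact log2 for powers of two


bits = tag_bits(32 * MB, 8 * KB)  # 2**25 / 2**13 = 2**12 regions

cache_brams = 6  # from the slide: data storage plus tag storage
local_brams = 4  # from the slide: 8 KB of plain local memory
overhead = (cache_brams - local_brams) / local_brams

print(bits, f"{overhead:.0%}")  # 12 50%
```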
           Frequency Problems
• Additionally, the achievable system frequency may be
  reduced when the cache is enabled.
   – without any cache - 75 MHz;
   – with cache - 60 MHz.
• Cache controller
   – adds logic and complexity to the design,
   – decreasing the achieved system frequency during FPGA place
     and route.
• The cache also consumes FPGA BRAM resources that could
  otherwise have been used to increase local memory size.
• Cache implementation may also cause the overall
  system frequency to decrease.
          Some example designs
• Considering these cautions, enabling the MicroBlaze cache,
  especially the instruction cache, may improve performance, even
  when the system must run at a lower frequency.
• A 60 MHz system with instruction cache enabled has a 150%
  advantage over a 75 MHz system without instruction cache (both
  systems store entire program in external memory).
• When both instruction and data caches are enabled, the 60 MHz
  system outperforms the 75 MHz system by 308%.
    – This example is not the most practical since the entire DMIPS
      benchmark program fits in the cache.
    – A more realistic experiment is to use an application that is larger than
      the cache.
• Another precaution is regarding applications that frequently jump
  beyond the size of the cache.
    – Multiple cache misses degrade the performance, sometimes making a
      cached external memory worse than the external memory without
      cache.
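A simple average-access-time model shows both effects: a good hit rate more than pays for the lower clock, while a poor hit rate makes the cached system slower. This is a sketch under stated assumptions, not the measurement behind the figures above: the 75 MHz and 60 MHz clocks come from the slides, but the 8-cycle external access, the 1-cycle hit, and the miss cost of one lookup cycle plus the external access are all assumed.

```python
def fetch_time_ns(freq_mhz, ext_cycles, hit_rate=None):
    """Average time per fetch; hit_rate=None models a system without cache."""
    if hit_rate is None:
        cycles = ext_cycles  # every fetch goes to external memory
    else:
        # Assumed costs: 1 cycle on a hit, lookup plus external access on a miss.
        cycles = hit_rate * 1 + (1 - hit_rate) * (ext_cycles + 1)
    return cycles * 1000.0 / freq_mhz


EXT_CYCLES = 8  # assumed external-memory access cost in bus cycles

uncached = fetch_time_ns(75, EXT_CYCLES)
good_cache = fetch_time_ns(60, EXT_CYCLES, hit_rate=0.9)
thrashing = fetch_time_ns(60, EXT_CYCLES, hit_rate=0.0)

print(good_cache < uncached)  # True: the slower clock is outweighed by hits
print(thrashing > uncached)   # True: constant misses make the cache a net loss
```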
             MicroBlaze Memory
• For MicroBlaze, perhaps the optimal memory configuration is to
  wisely partition the program code, maximizing the system frequency
  and local memory size.
• Critical data, instructions, and stack are placed in local memory.
• Data cache is not used, allowing for a larger local memory bank.
• If the local memory is not large enough to contain all instructions,
  the designer should consider enabling the instruction cache for the
  address range in external memory used for instructions.
• By not consuming BRAM in data cache, the local memory can be
  increased to contain more space.
• An instruction cache for the instructions assigned to external
  memory can be very effective.
• Experimentation or profiling shows which code items are most
  heavily accessed; assigning these items to local memory provides a
  greater performance improvement than caching.
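One way to turn profiling data into a partition is a greedy pass that places the most heavily accessed items, per byte, into local memory first. The sketch below is only an illustration of that idea: the item names, sizes, and access counts are hypothetical, and a real design would make the actual assignment through the linker script.

```python
def partition(items, local_bytes):
    """Greedy split of (name, size_bytes, access_count) items by accesses per byte."""
    local, external = [], []
    free = local_bytes
    for name, size, accesses in sorted(items, key=lambda it: it[2] / it[1], reverse=True):
        if size <= free:
            local.append(name)
            free -= size
        else:
            external.append(name)
    return local, external


KB = 1024
# Hypothetical profile of four code sections.
profile = [
    ("isr", 2 * KB, 900_000),
    ("fir_filter", 6 * KB, 600_000),
    ("init", 4 * KB, 10),
    ("logging", 10 * KB, 5_000),
]

local, external = partition(profile, local_bytes=8 * KB)
print(local)     # ['isr', 'fir_filter']
print(external)  # ['logging', 'init']
```

The items left in external memory are the natural candidates for the instruction cache address range.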
          Peripheral Buses
• In addition to the memory access time, the
  peripheral bus also incurs some latency.
• In MicroBlaze, the memory controllers are
  attached to the On-chip Peripheral Bus
  (OPB).
• For example, the OPB SDRAM controller
  requires a four to six cycle latency for a
  write and eight to ten cycle latency for a
  read (depending on bus clock frequency).
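To put these cycle counts in absolute terms, they can be converted to nanoseconds for a given bus clock. The cycle ranges below are the ones quoted above; the 75 MHz bus clock is an assumption for illustration.

```python
def cycles_to_ns(cycles: int, bus_mhz: float) -> float:
    """Convert a latency in bus cycles to nanoseconds."""
    return cycles * 1000.0 / bus_mhz


BUS_MHZ = 75.0  # assumed bus clock

write_ns = (cycles_to_ns(4, BUS_MHZ), cycles_to_ns(6, BUS_MHZ))
read_ns = (cycles_to_ns(8, BUS_MHZ), cycles_to_ns(10, BUS_MHZ))

print([round(t, 1) for t in write_ns])  # [53.3, 80.0]
print([round(t, 1) for t in read_ns])   # [106.7, 133.3]
```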
PowerPC Block
Common Acronyms
The PPC405x3 provides the following set of interfaces that support the
  attachment of cores and user logic:

Processor local bus interface

The processor local bus (PLB) interface provides a 32-bit address and
  three 64-bit data buses attached to the instruction-cache and data-
  cache units.

Two of the 64-bit buses are attached to the data-cache unit, one
  supporting read operations and the other supporting write
  operations.

The third 64-bit bus is attached to the instruction-cache unit to support
instruction fetching.
Device control register interface
The device control register (DCR) bus
 interface supports the attachment of on-
 chip registers for device control.

Software can access these registers using
 the mfdcr and mtdcr instructions.
                   Other Interfaces
• The clock and power-management interface
    – supports several methods of clock distribution and power management.
• JTAG port interface
    – The JTAG port interface supports the attachment of external debug
      tools.
    – Using the JTAG test-access port, a debug tool can single-step the
      processor and examine internal-processor state to facilitate software
      debugging.
• On-chip interrupt controller
    – combines asynchronous interrupt inputs from on-chip and off-chip
      sources and presents them to the core using a pair of interrupt signals
      (critical and noncritical).
    – Asynchronous interrupt sources can include external signals, the JTAG
      and debug units, and any other on-chip peripherals.
• On-chip memory controller interface
    – Supports attachment of additional memory to the instruction and data
      caches.
