Reconfigurable Computing - Memory in FPGAs


   John Morris
   Chung-Ang University
   The University of Auckland




                                ‘Iolanthe’ at 13 knots on Cockburn Sound, Western Australia
Memory Needs
 Many applications require memory
    ‘Table-Driven’ code
         Storage for tables
    Signal Processing
         Storage for coefficients
    Image Processing
         Storage for ‘windows’ in images
     Text processing
         Storage for dictionaries
    …
 Unfortunately, memory tends to be a critical resource in FPGA
  implementations!




Memory Organizations
 Random Access
    Includes SRAM, DRAM, SDRAM, etc
    All these are physical implementations of random access memory!
    Classified by number of ports
        Single
           • One reader or one writer at any time
        Dual
           • Two ports – simultaneous read or write
               • Conflicts can occur – software usually responsible for
                 ensuring that operation is ‘safe’
           • Used for interfacing between two separate systems
               eg communication between two processors
[Figure: single-port RAM (add, data, R/~W) and dual-port RAM (two sets of add, data, R/~W)]
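
A dual-port organisation can be modelled in the same style as the single-port models in these slides. What follows is only a sketch under stated assumptions, not any vendor's block: the entity name RAM_dual_port, the port names and the rule that port A wins when both ports write the same address in one cycle are hypothetical choices made for illustration.

```vhdl
LIBRARY ieee; USE ieee.std_logic_1164.ALL;

-- Hypothetical dual-port RAM model (names and conflict rule are assumptions)
ENTITY RAM_dual_port IS
         GENERIC( n_bits : POSITIVE := 8; n_words : POSITIVE := 256 );
         PORT( clk : IN std_ulogic;
               -- Port A
               add_a : IN natural RANGE 0 TO n_words-1;
               we_a : IN std_ulogic;
               din_a : IN std_logic_vector( 0 TO n_bits-1 );
               dout_a : OUT std_logic_vector( 0 TO n_bits-1 );
               -- Port B
               add_b : IN natural RANGE 0 TO n_words-1;
               we_b : IN std_ulogic;
               din_b : IN std_logic_vector( 0 TO n_bits-1 );
               dout_b : OUT std_logic_vector( 0 TO n_bits-1 ) );
END ENTITY RAM_dual_port;

ARCHITECTURE a OF RAM_dual_port IS
         TYPE data_array IS ARRAY( 0 TO n_words-1 ) OF std_logic_vector( 0 TO n_bits-1 );
         SIGNAL mem : data_array;
         BEGIN
         PROCESS( clk )
                  BEGIN
                  IF clk'EVENT AND clk = '1' THEN
                           -- Both ports read the contents as they were before this edge
                           dout_a <= mem( add_a );
                           dout_b <= mem( add_b );
                           -- Port B is assigned first, so a write from port A to the
                           -- same address overrides it (assumed conflict rule)
                           IF we_b = '1' THEN mem( add_b ) <= din_b; END IF;
                           IF we_a = '1' THEN mem( add_a ) <= din_a; END IF;
                  END IF;
         END PROCESS;
END a;
```

A real dual-port block may leave the result of a write conflict undefined, which is why the slide leaves ‘safe’ operation to software.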
RAM (synchronous) Model
 LIBRARY ieee; USE ieee.std_logic_1164.ALL;
 USE work.app_types.ALL;

 ENTITY RAM_synch IS
          PORT( add : IN address; data : INOUT word;
                   read, clk, out_en : IN std_ulogic );
 END ENTITY RAM_synch;

 Application Types package - extended
-- app_types.vhd
-- Package of types and constants for ... application

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

PACKAGE app_types IS
  CONSTANT n_bits : POSITIVE := 8;
  SUBTYPE word IS std_logic_vector( 0 TO n_bits-1 );
  -- Note that
  -- (a) you can now use word'RANGE instead of the clumsier 0 TO n_bits-1
  --     eg FOR j IN word'RANGE LOOP ... END LOOP;
  --        FOR j IN word'RANGE GENERATE ... END GENERATE;
  -- (b) you also have word'LOW and word'HIGH for 0 and n_bits-1
  --     eg Special cases for the first and last bits
  --        IF (j = word'LOW) THEN ... END IF;
  --        IF (j = word'HIGH) THEN ... END IF;
  --     In all cases, word'HIGH is probably more readable than n_bits-1
  CONSTANT highZ_word : word := ( OTHERS => 'Z' );
  SUBTYPE address IS natural RANGE 0 TO 255;
END PACKAGE app_types;
RAM (synchronous) Model - Architecture
 ARCHITECTURE a OF RAM_synch IS
          TYPE data_array IS ARRAY( address'LOW TO address'HIGH ) OF word;
          SIGNAL mem: data_array;
          BEGIN
          PROCESS( clk )
                   BEGIN
                   IF clk'EVENT AND clk = '1' THEN
                            IF read = '0' THEN
                                     -- Write: store the word and release the bus
                                     mem( add ) <= data;
                                     data <= highZ_word;
                            ELSIF out_en = '1' THEN
                                     -- Read: drive the addressed word onto the bus
                                     data <= mem( add );
                            ELSE
                                     data <= highZ_word;
                            END IF;
                   END IF;
          END PROCESS;
 END a;
Memory Organizations
 Shift Registers
    Synchronous
    Stores n words
    For each word input, one word is output


[Figure: shift register block with data input, data output and clk; exactly n words stored internally]
Shift Register Model
  LIBRARY ieee; USE ieee.std_logic_1164.ALL; USE work.app_types.ALL;
  ENTITY shift_register IS
           GENERIC( n : POSITIVE := 8 );
           PORT( data_in : IN word; data_out : OUT word;
                    clk : IN std_ulogic );
  END ENTITY shift_register;

  ARCHITECTURE a OF shift_register IS
           TYPE data_array IS ARRAY( 0 TO n-1 ) OF word;
           SIGNAL mem: data_array;
           BEGIN
           PROCESS( clk )
                    BEGIN
                    IF clk'EVENT AND clk = '1' THEN
                             mem( mem'LOW ) <= data_in;
                             data_out <= mem( mem'HIGH );
                             FOR j IN 1 TO n-1 LOOP
                                       mem( j ) <= mem( j-1 );
                             END LOOP;
                    END IF;
           END PROCESS;
  END a;
Memory Organizations
 First-In First-Out (FIFO)
    Stores n words
    Independent R and W ports
    Used for matching data rates
    Variants
        Synchronous
        Asynchronous
    Need full and empty flags
        Also commonly provide ‘almost full’, ‘almost empty’ flags
        These allow a ‘busy’ response to be sent to the provider (input)
         several cycles before the FIFO actually becomes full,
         eg we used it over the network in Achilles:
         the receiving FIFO sends ‘busy’ back through the net to the
         sender when it is almost full. The signal may take several cycles
         to arrive, but the sender can safely continue to send meanwhile.
        Similarly, ‘almost empty’ can tell the provider to ‘wake up’
[Figure: FIFO block with data and write inputs, data output and read control; up to n words stored internally]
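
The flag behaviour described above reduces to comparisons on the number of words currently stored. The sketch below assumes that an occupancy count is available (as in a counter-based FIFO); the entity name FIFO_flags, the count port and the margin generic are hypothetical. In practice the margin would be chosen to cover the number of cycles the ‘busy’ response takes to reach the sender.

```vhdl
LIBRARY ieee; USE ieee.std_logic_1164.ALL;

-- Hypothetical flag generator for an n-word FIFO
ENTITY FIFO_flags IS
         GENERIC( n : POSITIVE := 8;         -- FIFO depth in words
                  margin : NATURAL := 2 );   -- slack (in words) before full / empty
         PORT( count : IN natural RANGE 0 TO n;   -- words currently stored
               full, empty : OUT std_ulogic;
               almost_full, almost_empty : OUT std_ulogic );
END ENTITY FIFO_flags;

ARCHITECTURE a OF FIFO_flags IS
         BEGIN
         full  <= '1' WHEN count = n ELSE '0';
         empty <= '1' WHEN count = 0 ELSE '0';
         -- Raised once no more than 'margin' free locations remain
         almost_full  <= '1' WHEN count + margin >= n ELSE '0';
         -- Raised once no more than 'margin' words remain to be read
         almost_empty <= '1' WHEN count <= margin ELSE '0';
END a;
```

Writing the test as count + margin >= n rather than count >= n - margin avoids a negative intermediate value if margin is set larger than n.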
FIFO (synchronous) Model
LIBRARY ieee; USE ieee.std_logic_1164.ALL;
USE work.app_types.ALL;

ENTITY FIFO_synch IS
         GENERIC( n : POSITIVE := 8 );
         PORT( data_in : IN word; data_out : OUT word;
                  clk, rd, wr, reset : IN std_ulogic;
                  full, empty : OUT std_ulogic );
END ENTITY FIFO_synch;

FIFO (synchronous) Model - Architecture
ARCHITECTURE a OF FIFO_synch IS
         SUBTYPE index IS natural RANGE 0 TO n-1;
         TYPE data_array IS ARRAY( index'LOW TO index'HIGH ) OF word;
         SIGNAL mem: data_array;
         SIGNAL read_ix, write_ix : index;
         BEGIN
         PROCESS( clk, reset )
                  -- count tells full from empty: read_ix = write_ix in both cases
                  VARIABLE count : natural RANGE 0 TO n := 0;
                  BEGIN
                  IF reset = '1' THEN
                           read_ix <= 0; write_ix <= 0; count := 0;
                           empty <= '1'; full <= '0';
                  ELSIF clk'EVENT AND clk = '1' THEN
                           IF rd = '1' AND count > 0 THEN
                                    -- Read the oldest word
                                    data_out <= mem( read_ix );
                                    read_ix <= ( read_ix + 1 ) MOD n;
                                    count := count - 1;
                           END IF;
                           IF wr = '1' AND count < n THEN
                                    -- Store the new word at the write index
                                    mem( write_ix ) <= data_in;
                                    write_ix <= ( write_ix + 1 ) MOD n;
                                    count := count + 1;
                           END IF;
                           IF count = 0 THEN empty <= '1'; ELSE empty <= '0'; END IF;
                           IF count = n THEN full <= '1'; ELSE full <= '0'; END IF;
                  END IF;
         END PROCESS;
END a;
FIFO (asynchronous) Model
LIBRARY ieee; USE ieee.std_logic_1164.ALL;
USE work.app_types.ALL;

ENTITY FIFO_asynch IS
         GENERIC( n : POSITIVE := 8 );
         PORT( data_in : IN word; data_out : OUT word;
                  rd, wr, reset : std_ulogic;
                  full, empty : OUT std_ulogic );
END ENTITY FIFO_asynch;

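FIFO (asynchronous) Model - Architecture
ARCHITECTURE a OF FIFO_asynch IS
         SUBTYPE index IS natural RANGE 0 TO n-1;
         TYPE data_array IS ARRAY( index'LOW TO index'HIGH ) OF word;
         SIGNAL mem: data_array;
         SIGNAL read_ix, write_ix : index;
         BEGIN
         -- Behavioural model: rd and wr act as independent strobes
         PROCESS( rd, wr, reset )
                  -- count tells full from empty: read_ix = write_ix in both cases
                  VARIABLE count : natural RANGE 0 TO n := 0;
                  BEGIN
                  IF reset = '1' THEN
                           read_ix <= 0; write_ix <= 0; count := 0;
                  ELSE
                           IF rd'EVENT AND rd = '1' AND count > 0 THEN
                                    -- Read the oldest word on the rising edge of rd
                                    data_out <= mem( read_ix );
                                    read_ix <= ( read_ix + 1 ) MOD n;
                                    count := count - 1;
                           END IF;
                           IF wr'EVENT AND wr = '1' AND count < n THEN
                                    -- Store the new word on the rising edge of wr
                                    mem( write_ix ) <= data_in;
                                    write_ix <= ( write_ix + 1 ) MOD n;
                                    count := count + 1;
                           END IF;
                  END IF;
                  IF count = 0 THEN empty <= '1'; ELSE empty <= '0'; END IF;
                  IF count = n THEN full <= '1'; ELSE full <= '0'; END IF;
         END PROCESS;
END a;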
Memory Organizations
 Content Addressable memory
    ‘Dictionary’ style applications
        Does the memory contain a given word?
        Ordinary memory requires O(n) time
           • Each word is checked in turn
        Binary and tree searches can reduce this to O( log n )
    Input : a search ‘key’ – one word of data
     Each location of memory is searched in parallel
        Indicates whether or not a match occurred in O( 1 ) time!
           • Returns
               Either
               • True / false = key found / not found
               or
               • Index of (one) match
               • Allows lookup of data in accompanying data table
        Expensive to implement
           • Requires O(n) comparators for an n-word store
[Figure: CAM block with key input, match output and search/~add control; up to n words stored internally]
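
The ‘index of (one) match’ variant can be sketched as a VHDL function. This is an illustration under assumptions rather than part of the original model: the package name cam_search, the types cam_word and cam_word_array, the function match_index and its n_valid parameter (the number of entries stored so far) are all hypothetical, the array is assumed to have an ascending index range, and -1 is an arbitrary choice for ‘not found’.

```vhdl
LIBRARY ieee; USE ieee.std_logic_1164.ALL;

PACKAGE cam_search IS
         SUBTYPE cam_word IS std_logic_vector( 0 TO 7 );
         TYPE cam_word_array IS ARRAY( natural RANGE <> ) OF cam_word;
         -- Returns the lowest index whose entry equals key, or -1 if there is none
         FUNCTION match_index( mem : cam_word_array; n_valid : natural;
                               key : cam_word ) RETURN integer;
END PACKAGE cam_search;

PACKAGE BODY cam_search IS
         FUNCTION match_index( mem : cam_word_array; n_valid : natural;
                               key : cam_word ) RETURN integer IS
                  BEGIN
                  -- Written as a loop, but each iteration is an independent
                  -- comparison, so hardware can perform them all in parallel
                  FOR j IN mem'RANGE LOOP
                           -- Only the first n_valid entries hold stored keys
                           IF j < mem'LOW + n_valid AND mem( j ) = key THEN
                                    RETURN j;
                           END IF;
                  END LOOP;
                  RETURN -1;
         END FUNCTION match_index;
END PACKAGE BODY cam_search;
```

The returned index can then address an accompanying data table, for example data_table( match_index( mem, top, key ) ) once the result has been checked against -1.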
Content Addressable Memory - Model
 LIBRARY ieee; USE ieee.std_logic_1164.ALL; USE ieee.std_logic_arith.ALL;
 USE work.app_types.ALL;

 ENTITY CAM IS
          GENERIC( n : POSITIVE := 8 );
          PORT( key : IN word; found : OUT std_ulogic;
                   search : IN std_ulogic; full : OUT std_ulogic;
                   reset, clk : std_ulogic );
 END ENTITY CAM;

Content Addressable Memory - Architecture
 ARCHITECTURE a OF CAM IS
          TYPE data_array IS ARRAY( 0 TO n-1 ) OF word;
          SIGNAL mem: data_array;
          SIGNAL top : natural RANGE 0 TO n;  -- number of entries stored
          BEGIN
          PROCESS( clk, reset )
                   VARIABLE match : BOOLEAN;
                   BEGIN
                   IF reset = '1' THEN
                            top <= 0; full <= '0'; found <= '0';
                   ELSIF clk'EVENT AND clk = '1' THEN
                            IF search = '1' THEN -- Search mode
                                     match := FALSE;
                                     -- Compare the key with every stored entry
                                     FOR j IN data_array'RANGE LOOP
                                              IF j < top AND key = mem( j ) THEN
                                                       match := TRUE;
                                              END IF;
                                     END LOOP;
                                     IF match THEN found <= '1';
                                     ELSE          found <= '0';
                                     END IF;
                            ELSE -- Add a new entry
                                     IF top < n THEN
                                              mem( top ) <= key;
                                              top <= top + 1;
                                     END IF;
                                     -- Full once the last free location has been used
                                     IF top + 1 >= n THEN full <= '1'; END IF;
                            END IF;
                   END IF;
          END PROCESS;
 END a;
Early FPGAs
 Memory provided by flip-flops of logic cells
    1-8 bits of memory / logic cell

Manufacturer  Type                 ‘Gates’                Logic Cells  FFs      LUT memory
Actel         AX2000               2x10^6 / 1.06x10^6     21504        21504    Logic only
Lattice       ispXGA1200           1.25x10^6              3844         30752    246K
QuickLogic    QL6600               583008                 4032         9105     Logic only
Altera        Stratix EP1SGX40G                           41250        41250
Xilinx        Virtex-II XC2VP125                          125,136      250,272

 2003 data
    Using largest variant from each manufacturer (?)
 Clearly, these capacities will often be too low for practical designs
    (They’re bits not bytes too!)
Memory Limitations
 Clearly these amounts are ‘small’ for modern demands
   Example:
    Image processing
        A ‘low’ resolution image
           • 1000x1000 = 10^6 pixels
           • 8 Mbits in B&W (8 bits/pixel)
           • 24 Mbits in colour (24 bits/pixel)
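
The arithmetic behind these figures can be written out as a VHDL package of constants. The package and constant names are hypothetical, and 8 and 24 bits per pixel are assumed because they reproduce the 8 Mbit and 24 Mbit totals quoted above. The last constant compares the B&W image with the 30752 logic flip-flops of the ispXGA1200 listed in the 2003 table.

```vhdl
-- Hypothetical package: memory needed for one 'low' resolution image
PACKAGE image_memory IS
         CONSTANT width  : POSITIVE := 1000;
         CONSTANT height : POSITIVE := 1000;
         CONSTANT n_pixels    : POSITIVE := width * height;   -- 10^6 pixels
         CONSTANT bits_bw     : POSITIVE := n_pixels * 8;     -- 8 x 10^6 bits
         CONSTANT bits_colour : POSITIVE := n_pixels * 24;    -- 24 x 10^6 bits
         -- Whole multiples of the ispXGA1200's 30752 logic flip-flops that
         -- fit into one B&W image (integer division)
         CONSTANT devices_for_bw : POSITIVE := bits_bw / 30752;   -- 260
END PACKAGE image_memory;
```

Even the B&W image is therefore more than two orders of magnitude larger than the flip-flop storage of that device.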



 All large FPGAs now provide ‘conventional’ memory blocks

    Some examples …
Lattice Semiconductor
 ispXGA 1200
    1.25 x 10^6 ‘system’ gates
    3844 PFUs (logic cells)
        8 Flip-flops/PFU
        30752 Logic flip-flops
    ‘Distributed’ memory
        Uses memory of LUTs
        64 bits / PFU
        64 x 1 memory
        Organizations
           • Single port
           • Dual port
           • Shift register
               • 1-8 bits
    Total 246K bits
Lattice Semiconductor
 ispXGA 1200
    Block Memory or ‘SysMEM’ 414K bits
       90 Blocks of ‘regular’ memory distributed throughout the device
       Each block (EBR)
       Dual port
          • 256 x 36
          • 1024 x 9
       Quad port
          • 512 x 18
          • 1024 x 18 (using 2 blocks)
       FIFO
          • 256 x 36
          • 512 x 18
          • 1024 x 9
       Content Addressable memory
QuickLogic
 Eclipse QL6600
    583008 gates
        Small, but claims to be very fast!
        450MHz 32-bit counter
    72 x 56 = 4032 logic cells
        2 Flip-flops/cell
        9105 Logic flip-flops
    ‘Distributed’ memory
        None
        Logic cell is logic!
         ie it doesn’t use an LUT
QuickLogic
 Eclipse QL6600
    Block Memory (Embedded SRAM) 82900 bits
       36 Blocks of ‘regular’ memory distributed throughout the device
       Each block – 2304 bits
       Dual port
          • 128 x 18
          • 256 x 9
          • 512 x 4
          • 1024 x 2
       FIFO
          • Same?




Actel
 AX2000
    2 x 10^6 equivalent system gates
    1.06 x 10^6 “typical” gates
        350+ MHz system performance
        500+ MHz internal performance
    Two types of logic cells
        10752 Register cells
        21504 Combinatorial cells
        21504 Flip-flops
    ‘Distributed’ memory
        None
        Combinatorial cell is logic?
         ie it doesn’t use an LUT
Actel
 Actel AX2000
    Block Memory (Embedded SRAM) 294912 bits
       64 Blocks of ‘regular’ memory distributed throughout the device
       Each block – 4608 bits
       Dual port
          • 128 x 36
          • 256 x 18
          • 512 x 9
          • 1024 x 4
          • 2048 x 2
       FIFO
          • Synchronous
          • Full, Empty
          • Almost full, Almost Empty

Altera
 EP1SGX40D
    41,250 logic elements
        41,250 Flip-flops
    ‘Distributed’ memory
        None?




Altera
 EP1SGX40D
    Block Memory (Embedded SRAM) 3423744 bits
       3 block sizes (‘TriMatrix’ memory)
       384 M512 blocks
          • 32 x 18 bits
       183 M4K blocks
          • 128 x 36
       4 M-RAM blocks
          • 4K x 144
       ‘True’ dual-port memory
       FIFO
          • Synchronous
          • Full, Empty
          • Almost full, Almost Empty
    DSP blocks contain 18-bit shift registers
Xilinx
 XC2VP125
    111232 LUTs (2 per CLB slice, 55,616 slices)
        111232 Flip-flops
    ‘Distributed’ memory (Distributed SelectRAM+)
        Each LUT
           • 16x1 synchronous RAM
        Within a CLB
           • Single port 16x8, 32x4, 64x2, 128x1
           • Dual port 16x4, 32x2, 64x1
        Total 1,779,712 bits




Xilinx
 XC2VP125
    Block Memory (Block SelectRAM+) 10,248,192 bits
        556 Blocks of ‘regular’ memory distributed throughout the
         device
        Each block – 18K bits (18,432 bits; 556 x 18,432 = 10,248,192)
        Single, Dual port
           • 16K x 1, 8K x 2, 4K x 4, 2K x 9, 1K x 18, 512 x 36
        FIFO
           • Synchronous
           • Full, Empty
           • Almost full, Almost Empty




Using the RAM blocks from VHDL
 Use
    Manufacturer’s library components
      eg Altera LPM library :
        lpm_fifo, lpm_shiftreg, lpm_ram_dq, lpm_rom

    Memory generator programs
       Altera’s mega-function wizard
       Xilinx’ CoreGen
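
As an illustration of the library-component route, the sketch below instantiates lpm_ram_dq as a 256 x 8 RAM. It is written from the LPM component declaration as commonly distributed with Altera's tools (generics LPM_WIDTH, LPM_WIDTHAD, LPM_OUTDATA; ports data, address, we, inclock, q) and should be checked against the vendor documentation for the tool version in use. The wrapper entity ram_256x8 and its port names are hypothetical.

```vhdl
LIBRARY ieee; USE ieee.std_logic_1164.ALL;
LIBRARY lpm;  USE lpm.lpm_components.ALL;   -- assumes the vendor's LPM library is installed

-- Hypothetical wrapper around the lpm_ram_dq library component
ENTITY ram_256x8 IS
         PORT( clk, we : IN std_logic;
               address : IN std_logic_vector( 7 DOWNTO 0 );
               data : IN std_logic_vector( 7 DOWNTO 0 );
               q : OUT std_logic_vector( 7 DOWNTO 0 ) );
END ENTITY ram_256x8;

ARCHITECTURE a OF ram_256x8 IS
         BEGIN
         ram : lpm_ram_dq
                  GENERIC MAP( LPM_WIDTH => 8,     -- bits per word
                               LPM_WIDTHAD => 8,   -- address bits: 2^8 = 256 words
                               LPM_OUTDATA => "UNREGISTERED" )  -- no output register
                  PORT MAP( data => data, address => address, we => we,
                            inclock => clk, q => q );
END a;
```

The memory generator programs listed above produce a wrapper of this general kind automatically, which is usually the less error-prone route.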




Not enough embedded RAM?
 External Memory
    Large FPGAs have 200+ I/O pins
       All configurable as IN or OUT
       Interfacing with external devices is straightforward
          • Static RAM and FIFOs are particularly easy!
          • Some manufacturers provide ‘cores’ (packaged solutions)
             for SDRAM, DDR SDRAM, etc
    Only problem
       With external memory chips you don’t have quite the same flexibility in
          • Size
          • Data width

				