       VHDL for FPGA: A Primer
Kolin Paul / Sanjay Rajopadhye
       Designing for FPGAs
• This is for synthesis, so put your
  synthesis hats on
• You will build actual circuits: run them,
  time them, do exciting stuff
• Frustrations: yes, it is different
  from programming in C, and
  for those from the DSD class, timing
  plays a really cool part
                 The Model
•   Host-PCI card Model
     •   You build a coprocessor (add-on accelerator)
•   Overview of the Annapolis System
       Our Typical System
[Figure: one Virtex 1000 FPGA (“1”) with six 1M memories around it, connected to the Host over the LAD Bus]

1 Virtex 1000 FPGA, 6 Memories (6 MB)
               So VHDL
• Simulation
  – Defined in the LRM
  – IEEE Standard is universally accepted
• Synthesis
  – Specific to each vendor
  – And this is what goes onto the chip
    (ASIC/FPGA)
 Hardware Description Language
• Hardware --- not software
• Description --- structure
• Language – strong syntax and type
  declarations
• START with A BLOCK DIAGRAM
             VHDL Overview
• Programming Model
  – Essentially CSP

• Combinational/Sequential
     • Process … is it always sequential ?



• Understanding how VHDL is converted into
  hardware gives us insight into what the
  synthesized circuit will look like
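
On the question "Process … is it always sequential?": the statements inside a process execute in order, but the hardware it describes need not contain any storage. A minimal sketch, assuming a hypothetical entity proc_demo (all names here are illustrative, not from the course files), with one process that should synthesize to a plain AND gate and one that should synthesize to a flip-flop:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- proc_demo and its port names are hypothetical, for illustration only
entity proc_demo is
  port ( clk, a, b : in  std_logic;
         y_comb    : out std_logic;
         y_reg     : out std_logic );
end proc_demo;

architecture rtl of proc_demo is
begin
  -- Combinational process: every input is in the sensitivity list and
  -- the output is assigned on every path, so no storage is implied.
  comb : process (a, b)
  begin
    y_comb <= a and b;
  end process;

  -- Clocked process: the output only changes on a rising clock edge,
  -- so a flip-flop is implied.
  reg : process (clk)
  begin
    if rising_edge(clk) then
      y_reg <= a and b;
    end if;
  end process;
end rtl;
```

The only difference between the two is what triggers the assignment, which is why the sensitivity list and the clock-edge test deserve attention when reading or writing a process.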
                        Key Ideas
•   Entity
•   Architecture
•   Port
•   Process
•   Signal and types
•   Variable
•   Conditionals
•   Component and port map
•   Generate
•   Concurrency
•   Sequential
•   Sensitivity Lists
                                  An Example
Entity (shared by both architectures)

  entity rsff is
    port ( set, reset : in bit;
           q, qb      : inout bit );
  end rsff;

Combinational (netlist of two NAND gates)

  architecture netlist of rsff is
    component nand2
      port ( a, b : in bit;
             c    : out bit );
    end component;
  begin
    U1 : nand2 port map (set, qb, q);
    U2 : nand2 port map (reset, q, qb);
  end netlist;

Sequential (behavioral process)

  architecture sequential of rsff is
  begin
    process (set, reset)
    begin
      if set = '1' and reset = '0' then
        q  <= '0' after 2 ns;
        qb <= '1' after 4 ns;
      elsif set = '0' and reset = '1' then
        q  <= '1' after 4 ns;
        qb <= '0' after 2 ns;
      elsif set = '0' and reset = '0' then
        q  <= '1' after 2 ns;
        qb <= '1' after 2 ns;
      end if;
    end process;
  end sequential;

Truth table

  Set  Reset  Q   Qb
   1     0    0   1
   0     1    1   0
   0     0    1   1
   1     1    ?   ?
                       Synthesis Example
  library ieee;
  use ieee.std_logic_1164.all;

  entity add3 is
    port ( a, b, c : in std_logic;
           z       : out std_logic );
  end add3;

  architecture first of add3 is
  begin
    z <= a and b and c;
  end first;

  architecture second of add3 is
  begin
    process (a, b, c)
      variable temp : std_logic;
    begin
      temp := b and c;
      z <= a and temp;
    end process;
  end second;

[Figure: for each architecture, a gate with inputs a, b, c and output z]
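                      Synthesis Example
 • A concurrent signal assignment that requires sequential
   hardware

  library ieee;
  use ieee.std_logic_1164.all;

  entity sel3 is
    port ( a, b, c : in std_logic;
           z       : buffer std_logic );  -- buffer: z is read back in the last branch
  end sel3;

  architecture cond of sel3 is
  begin
    z <= a when b = '1'
         else b when c = '1'
         else z;  -- holding the old value is what implies the latch
  end cond;

[Figure: the synthesized circuit, a latch (pins D, C, R, Q) fed from a, b, c, with output z]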
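
For contrast, here is a minimal sketch of the same selector without the latch. It assumes it is acceptable for z to fall back to '0' when neither b nor c is '1'; that default and the entity name sel_nolatch are illustrative assumptions, not part of the slides. Because no branch depends on the previous value of z, nothing has to be remembered and the assignment stays combinational.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- sel_nolatch is a hypothetical name used only for this sketch
entity sel_nolatch is
  port ( a, b, c : in  std_logic;
         z       : out std_logic );
end sel_nolatch;

architecture cond of sel_nolatch is
begin
  -- Every branch assigns a value that does not depend on z itself,
  -- so the synthesizer has no reason to infer a storage element.
  z <= a when b = '1'
       else b when c = '1'
       else '0';   -- assumed default value
end cond;
```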
         So The Application
• A systolic multiplier to perform a
  matrix-vector multiply
• Simple application
• 3x3 matrix times a 3x1 vector
• Naïve code, to get all of us on board
    • The improvements are there for everyone to
      see, and we shall tackle them as class
      assignments
        So let's do the VHDL
• So what is the application ….
• Design steps
    • Please be modular in your approach; it makes
      debugging easier
    • Also, I prefer the bottom-up approach in
      coding, and top-down while designing
• A simple uncomplicated design is the
  following
            The Algorithm –
• Systolic ….
     • You have an idea of what that means
• Introduced by Kung and Leiserson
• I hope you have noticed the feed
  patterns and when the outputs are
  available
• So the Block diagram of the
  architecture would look like …
      Well this could be a possible
            Implementation
• Sanjay’s Slide
         A Simple Implementation
                 Y = A*X

[Figure: block diagram of the implementation, with labels Driver, A, X, Y and nil]
               So my Components are
•Shifter -- to feed in the data at the right time
•Systole -- essentially a multiply-accumulate
•The systolic multiplier -- performs the actual MVM
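
To make the "multiply-accumulate" idea concrete, here is a minimal sketch of what a Systole cell could look like. It is not the course's actual code: the port names, the synchronous reset, the unsigned arithmetic and the truncation of the product to its low 16 bits are all assumptions made for illustration. The 16-bit width follows the word size mentioned under the design issues.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical multiply-accumulate cell: y_out = y_in + a_in * x_in
entity systole is
  port ( clk   : in  std_logic;
         reset : in  std_logic;                      -- assumed synchronous, active high
         a_in  : in  std_logic_vector(15 downto 0);  -- matrix element
         x_in  : in  std_logic_vector(15 downto 0);  -- vector element
         y_in  : in  std_logic_vector(15 downto 0);  -- partial sum from the neighbour
         y_out : out std_logic_vector(15 downto 0) );
end systole;

architecture rtl of systole is
begin
  process (clk)
    variable product : unsigned(31 downto 0);
  begin
    if rising_edge(clk) then
      if reset = '1' then
        y_out <= (others => '0');
      else
        -- 16 x 16 bit multiply gives a 32-bit product
        product := unsigned(a_in) * unsigned(x_in);
        -- keep the low 16 bits (assumption: overflow is ignored)
        y_out <= std_logic_vector(unsigned(y_in) + product(15 downto 0));
      end if;
    end if;
  end process;
end rtl;
```

Registering the output in every cell is what lets partial sums move one step per clock through the array.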
           The example codes
• Shifter
• Systole
• Multiplier
       Ok … so this is the circuit
• And this is the compilation ….
• Open vsim
    • Make a project file … saves you a lot of typing
      …
    • Make scripts .. Make modules .. Make
      libraries … these are good coding practices …
                  Compile
• Compilation
    • Please remember to check the VHDL93
      option
    • You can manually edit the mpf and the compile
      scripts (which are typically .do files)
    • An introductory ModelSim tutorial is on my
      home page (www.cs.colostate.edu/~kolin and
      then follow the Misc link)
            How do We Simulate …..
 •Force files … nah … too cumbersome … definitely not
 elegant
 •A small snippet of a force file
                  Force File
    force D 00101100             -- force the 8-bit signal D to 00101100
    force clk 0 0, 1 50 -r 100   -- force clk to 0 at 0 ns, then to 1 at 50 ns,
                                 -- and repeat every 100 ns.
                                 -- This produces a clock with a period of 100 ns
                                 -- and a low time of 50 ns.
•We will use a testbench to drive our architecture
        This is what a test bench looks like …
                Testbench
•This is behavioral VHDL …
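
As a rough illustration of the idea, the force-file snippet above can be expressed as a small behavioral testbench. This is only a sketch: the entity name tb and the process labels are made up, and no design is instantiated, so in practice the component under test would be added to this architecture and connected to clk and d.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- A testbench has no ports; tb is a hypothetical name
entity tb is
end tb;

architecture behavioral of tb is
  signal clk : std_logic := '0';
  signal d   : std_logic_vector(7 downto 0) := (others => '0');
begin
  -- Same clock as the force file: low for 50 ns, high for 50 ns,
  -- so the period is 100 ns.
  clock_gen : process
  begin
    clk <= '0';
    wait for 50 ns;
    clk <= '1';
    wait for 50 ns;
  end process;

  -- Drive the 8-bit signal and check it at the first rising edge.
  stimulus : process
  begin
    d <= "00101100";
    wait until rising_edge(clk);
    assert d = "00101100"
      report "d was not driven as expected" severity error;
    wait;  -- stop this process; the clock keeps running
  end process;
end behavioral;
```

Because the clock process never stops, the simulation should be run for a fixed time rather than until it goes idle.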
               Simulation
• Simulation
    • A small observation: VHDL is not case
      sensitive, but commands in ModelSim are
 So my circuit simulates correctly
• Great the circuit simulates …
• So now how do I get to the FPGA …
• Key issues …
  – Remember the slide on where your
    application will be with respect to the board ….
  – And where will the data be ….
  – And clock …
          The FPGA board stuff
• So now we need to simulate the entire
  thing ….
• Remove the testbench … that just showed
  that our application design is correct ….
• We connect the appropriate inputs of the
  application to the VHDL model of the
  board …
     •   Remember, this is the portion that will not be
         synthesized; it is already present. We need to
         simulate it because our app needs data and control
         signals from these hardwired portions of the board
      StarFire Board (Simplified)

[Figure: one Virtex 1000 FPGA (“1”) with six 1M memories around it, connected to the Host over the LAD Bus]


1 Virtex 1000 FPGA, 6 Memories (6 MB)
[Figure: Starfire board layout, with labels PE0, PE1 and PE2 Right Mezz, PE1 and PE2 Left Mezz, Right, Left, STUFF and LAD]
Starfire Board
          Clocks – 4 of them!?
• K, M, P, U
  –   KClock   LAD Transactions (K?)
  –   MClock   Memory Transactions
  –   PClock   Processing Clock
  –   UClock   User Clock


• Okay, but why? What are they?
              KClock – LAD
• PE <-> Host
• 33MHz or 66MHz
  – 33MHz – Easy to Place and Route
  – 66MHz – 2X Host Bandwidth
  – Host and Chip must agree!!
     • Set in VHDL and Host Code
  – Clock is actually based on PCI Clock
     • Varies per host
     • Ours is approx. 33.23MHz / 66.46MHz
• Asynchronous to all other clocks
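
Since KClock is asynchronous to the other clocks, a signal that passes from the LAD side into another clock domain normally has to be synchronized first. A minimal sketch of the common two-flip-flop synchronizer for a single-bit level signal follows. The entity and port names are hypothetical and are not taken from the Annapolis files, and this approach is not suitable for multi-bit buses.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- sync2 and its port names are hypothetical, for illustration only
entity sync2 is
  port ( dst_clk  : in  std_logic;   -- clock of the receiving domain
         async_in : in  std_logic;   -- level signal from the other domain
         sync_out : out std_logic );
end sync2;

architecture rtl of sync2 is
  signal meta   : std_logic := '0';  -- first stage, may go metastable
  signal stable : std_logic := '0';  -- second stage, safe to use
begin
  process (dst_clk)
  begin
    if rising_edge(dst_clk) then
      meta   <= async_in;
      stable <= meta;
    end if;
  end process;

  sync_out <= stable;
end rtl;
```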
         MClock – Memory
• Speed of Memory IO
  – Both Local & Mezzanine
• User Selectable
  – 25MHz – 133MHz Wildstar
  – 25MHz – 100MHz Starfire
                Inside the Chips
[Figure: block diagram of the chip, with blocks Your Application, Memory, Mem Mux, LAD-Mem Bridge, RegFile, LAD Mux, LAD, Reset and Clocks; the legend distinguishes Annapolis-provided from user-provided blocks]
               Register File
• Provides host access to 1-D array of
  32-bit registers
  – Size must be a power of 2
• Can be used for:
  –   Ready – “The host says I can go now”
  –   Done – “Hey Host, I am done!”
  –   Small 32-bit IO – “The answer is 42!”
  –   Run time parameters – “Threshold is 63”
                       ModelSim
• VHDL Simulation tool
• Annapolis provides
   – Host simulation components
   – VHDL Description of the WHOLE board
      •   LAD
      •   Memories (Local & Mezzanine)
      •   Busses
      •   Etc
• You provide
   – VHDL to run inside the chip
     (May contain Annapolis components as well)
• Please simulate, simulate, simulate!
                 The Files
• So where do we add our application
  component ….
    • The Top Level File
• And how do we simulate ….
    • Here are the steps ….
    • Please make changes where it is indicated ..
      Do not experiment without understanding
      what you are doing …..
            Simulation ….
• Now I suppose you realize why the project
  file is so important ….
• Simulate ….
• Simulate until you are convinced that the
  application is behaving correctly
• The app_go and app_done signals are a must
  ….
• The host and the board depend on them …..
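
A minimal sketch of how an application might honour app_go and app_done. Only the two signal names come from the slides. The active-high polarity, the synchronous reset, the state names and the ten-cycle counter standing in for real work are all assumptions, so check the board's top-level file for the actual protocol.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- app_ctrl is a hypothetical wrapper; only app_go/app_done are from the slides
entity app_ctrl is
  port ( clk      : in  std_logic;
         reset    : in  std_logic;   -- assumed synchronous, active high
         app_go   : in  std_logic;   -- host says the application may start
         app_done : out std_logic ); -- application tells the host it finished
end app_ctrl;

architecture rtl of app_ctrl is
  type state_t is (idle, running, finished);
  signal state : state_t := idle;
  signal count : unsigned(3 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        state <= idle;
        count <= (others => '0');
      else
        case state is
          when idle =>
            if app_go = '1' then
              count <= (others => '0');
              state <= running;
            end if;
          when running =>
            -- placeholder for the real computation: wait ten cycles
            if count = 9 then
              state <= finished;
            else
              count <= count + 1;
            end if;
          when finished =>
            null;  -- hold app_done until reset
        end case;
      end if;
    end if;
  end process;

  app_done <= '1' when state = finished else '0';
end rtl;
```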
                    So Synthesis
• Convinced that the simulation is Correct
• So now make changes in the makefile
     •   Makefile
• Type make ….
• See *.srr
          – Remove all warnings that are in your component. If you
            understand what a warning means and can account for
            it in the VHDL, you can live with it. OTHERWISE YOU MUST
            REMOVE ALL WARNINGS THAT YOU DON’T HAVE A
            CLUE ABOUT
               Synplicity
• Synplicity Inc.
• Converts VHDL (or Verilog) into an
  EDIF
  – EDIF = description of your design in
    terms of Virtex parts (4-input LUTs,
    flip-flops, block RAMs, etc.)
• Fast
  – 1-30 minutes
             Place and Route
•   Maps to lower level components
•   Lays them out
•   Routes between them
•   Slow
    – 10 minutes – 2 days
• Provides a bitstream (.bit file)
    – directly converted to .x86 for config
            Place and Route
• The makefile automatically does the place
  and route ….
• Check the *.par file to see that the timing
  constraints have been met ….
• Else check critical paths …
• Once satisfied, please use the estimated
  frequency in your *.c file and recompile ….
• A *.bit file ought to be produced ……
             That’s about it
• Logon to the specified machine
• ftp your code ….
       – The executable corresponding to the *.c file
       – The *.bit file
       – The *.dat files which contain your data

• Run
• And enjoy ….
• Questions
• Tutorials Online on my Homepage
• Comments and suggestions
         – kolin@cs.colostate.edu
         – paulkolin@yahoo.com

• ....
          Issues to take care
• Data type
    • Use std_logic_vector (makes interface
      simpler)
    • Bit width --- (word size) … I will use 16 bits
• How to do I/O? A major question; we will see
  later
• And what is the circuit going to look
  like …

								