VHDL Coding
Exercise 4: FIR Filter
Where to start?
   [Figure: design flow from Algorithm to Architecture to RTL block diagram to VHDL code, with the labels "Design space exploration", "Optimization" and "Feedback"]
Algorithm
• High-Level System Diagram
   Context of the design
      Inputs and Outputs
      Throughput/rates
      Algorithmic requirements
• Algorithm Description
   Mathematical Description
      $y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$
      [Figure: block "FIR" with input x(k) and output y(k)]
   Performance Criteria
      Accuracy
      Optimization constraints
   Implementation constraints
      Area
      Speed
Architecture (1)
• Isomorphic Architecture:
   Straight forward implementation of the algorithm

   [Figure: filter structure with input x(k), coefficients b0, b1, b2, …, bN−2, bN−1, bN and output y(k)]
Architecture (2)
• Pipelining/Retiming:
   Improve timing
   [Figure: filter structure with input x(k), coefficients b0, b1, b2, …, bN−2, bN−1, bN and output y(k)]
   Insert register(s) at the inputs or outputs
      Increases Latency
   Perform Retiming:
      Move registers through the logic without changing functionality
      [Figure: two small panels labelled "Backwards" and "Forward"]
Architecture (3)
• Retiming and simple transformation:
   Optimization
   [Figure: filter structure with input x(k), coefficients b0, b1, b2, …, bN−2, bN−1, bN and output y(k)]
   Reverse the adder chain
   Perform Retiming
Architecture (4)
• More pipelining:
   Add one pipelining stage to the retimed circuit

   [Figure: retimed filter structure with input x(k), coefficients b0, b1, b2, …, bN−2, bN−1, bN and output y(k)]
   The longest path is given by the multiplier
        Unbalanced: The delay from input to the first pipeline stage is
         much longer than the delay from the first to the second stage
Architecture (5)
• More pipelining:
   Add one pipelining stage to the retimed circuit

   [Figure: retimed filter structure with input x(k), coefficients b0, b1, b2, …, bN−2, bN−1, bN and output y(k)]
   Move the pipeline registers into the multiplier:
        Paths between pipeline stages are balanced
        Improved timing
   Tclock = (Tadd + Tmult)/2 + Treg
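As a rough worked example, the delay values below are assumed purely for illustration and do not come from the slides. The expression for the unbalanced case also assumes that the extra pipeline register of Architecture (4) sits between the multiplier and the adder, which is what the remark "the longest path is given by the multiplier" suggests.

```latex
\begin{align*}
T_{add} &= 2\,\text{ns}, \quad T_{mult} = 6\,\text{ns}, \quad T_{reg} = 1\,\text{ns}
  && \text{(assumed values)} \\
T_{clock}^{(4)} &= \max(T_{mult},\, T_{add}) + T_{reg} = 6 + 1 = 7\,\text{ns}
  && \text{(unbalanced stages)} \\
T_{clock}^{(5)} &= \frac{T_{add} + T_{mult}}{2} + T_{reg} = \frac{2 + 6}{2} + 1 = 5\,\text{ns}
  && \text{(balanced stages)}
\end{align*}
```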
Architecture (6)
• Iterative Decomposition:
   Reuse Hardware
   [Figure: filter structure with input x(k), coefficients b0, b1, b2, …, bN−2, bN−1, bN and output y(k)]
   Identify regularity and reusable hardware components
   Add control
      multiplexers
      storage elements
      Control
   Increases Cycles/Sample
   [Figure: decomposed datapath with input x(k), coefficients b0 … bN, a constant 0 and output y(k)]
RTL-Design
• Choose an architecture under the following constraints:
   It meets ALL timing specifications/constraints:
      Throughput
      Latency
   It consumes the smallest possible area
   It requires the least possible amount of power
   [Figure label next to the constraints: "Iterative Decomposition"]
• Decide which additional functions are needed and
  how they can be implemented efficiently:
   Storage of samples x(k) => MEMORY
   Storage of coefficients bi => LUT
   Address generators for MEMORY and LUT => COUNTERS
   Control => FSM
   [Figure: decomposed datapath with input x(k), coefficients b0 … bN, a constant 0 and output y(k)]
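The coefficient LUT mentioned above could be sketched as a small constant array that is addressed by one of the counters. This is only a minimal sketch: the entity name, the port names, the word widths and the four coefficient values are all assumptions made for illustration, not taken from the exercise.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical coefficient LUT for a filter with four taps (N = 3).
entity coeff_lut is
  port (
    AddrxDI  : in  unsigned(1 downto 0);   -- tap index i from the address counter
    CoeffxDO : out signed(7 downto 0)      -- coefficient b_i
  );
end entity coeff_lut;

architecture rtl of coeff_lut is
  type coeff_array_t is array (0 to 3) of signed(7 downto 0);
  -- Placeholder values, chosen only so that the example is complete.
  constant C_COEFF : coeff_array_t := (
    to_signed(1, 8), to_signed(3, 8), to_signed(3, 8), to_signed(1, 8));
begin
  -- Purely combinatorial read: the address selects one constant.
  CoeffxDO <= C_COEFF(to_integer(AddrxDI));
end architecture rtl;
```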
RTL-Design
• RTL Block-diagram:
    Datapath: $y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$
    [Figure: datapath with input x(k), coefficients b0 … bN, a constant 0 and output y(k)]
• FSM:
    Interface protocols
    Datapath control
RTL-Design
• How it works:
    IDLE:
         Wait for new sample
         Store to input register
    NEW DATA:
         Store new sample to memory
    RUN:
         Compute $y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$
         Store result to output register
    DATA OUT:
         Output result / Wait for ACK
    IDLE: …
Translation into VHDL
• Some basic VHDL building blocks:
   Signal Assignments:
     Outside a process:
       [Figure: AxD assigned to YxD]
       [Figure: AxD and BxD both assigned to YxD]
          • This is NOT allowed !!!
     Within a process (sequential execution):
       [Figure: AxD and BxD assigned to YxD]
          • Sequential execution
          • The last assignment is kept when the process terminates
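The difference between the two cases can be sketched as follows. The signal names AxD, BxD and YxD follow the slide; the entity name and the second output ZxD are assumptions added so that both cases fit into one compilable example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity assign_demo is
  port (
    AxD : in  std_logic;
    BxD : in  std_logic;
    YxD : out std_logic;   -- driven by a concurrent assignment
    ZxD : out std_logic    -- hypothetical extra output, driven from a process
  );
end entity assign_demo;

architecture rtl of assign_demo is
begin
  -- Outside a process: exactly one concurrent assignment per signal.
  YxD <= AxD;
  -- YxD <= BxD;  -- a second concurrent driver for YxD would NOT be allowed

  -- Within a process: statements are evaluated in order, and only the
  -- last assignment to ZxD takes effect when the process suspends.
  p_seq : process (AxD, BxD)
  begin
    ZxD <= AxD;
    ZxD <= BxD;   -- overrides the line above, so ZxD follows BxD
  end process p_seq;
end architecture rtl;
```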
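Translation into VHDL
• Some basic VHDL building blocks:
   Multiplexer:
     [Figure: multiplexer with inputs AxD, BxD, CxD, select SELxS and output YxD; annotation "Default Assignment"]
   Conditional Statements:
     [Figure: AxD and BxD selected by SelAxS, CxD and DxD selected by SelBxS, both combined under STATExDP into OUTxD]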
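A possible VHDL form of these two blocks is sketched below. The signal names come from the slide; the 8-bit width, the encoding of SELxS and the one-bit STATExDP are assumptions made to keep the example short.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mux_demo is
  port (
    AxD, BxD, CxD, DxD : in  std_logic_vector(7 downto 0);
    SELxS              : in  std_logic_vector(1 downto 0);
    SelAxS, SelBxS     : in  std_logic;
    STATExDP           : in  std_logic;  -- assumed one-bit state for brevity
    YxD                : out std_logic_vector(7 downto 0);
    OUTxD              : out std_logic_vector(7 downto 0)
  );
end entity mux_demo;

architecture rtl of mux_demo is
begin
  -- Multiplexer with a default assignment at the top of the process.
  p_mux : process (AxD, BxD, CxD, SELxS)
  begin
    YxD <= AxD;                      -- default assignment
    case SELxS is
      when "01"   => YxD <= BxD;
      when "10"   => YxD <= CxD;
      when others => null;           -- the default above applies
    end case;
  end process p_mux;

  -- Nested conditional statements: every IF has an ELSE, so OUTxD is
  -- assigned on every path through the process.
  p_cond : process (AxD, BxD, CxD, DxD, SelAxS, SelBxS, STATExDP)
  begin
    if STATExDP = '0' then
      if SelAxS = '1' then
        OUTxD <= AxD;
      else
        OUTxD <= BxD;
      end if;
    else
      if SelBxS = '1' then
        OUTxD <= CxD;
      else
        OUTxD <= DxD;
      end if;
    end if;
  end process p_cond;
end architecture rtl;
```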
Translation into VHDL
• Common mistakes with conditional statements:
   Example:
     [Figure: AxD selected by SelAxS and BxD selected by SelBxS, combined under STATExDP into OUTxD; the remaining inputs are marked "??"]
        • NO default assignment
        • NO else statement
• ASSIGNING NOTHING TO A SIGNAL IS NOT A WAY TO KEEP ITS VALUE !!!!! => Use FlipFlops !!!
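The mistake and one way to avoid it might look like this. The example is simplified (the state signal is left out), the entity name and the 8-bit width are assumptions, and the problematic version is kept as a comment so that the block itself stays free of unintended storage.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity no_latch_demo is
  port (
    AxD, BxD       : in  std_logic_vector(7 downto 0);
    SelAxS, SelBxS : in  std_logic;
    OUTxD          : out std_logic_vector(7 downto 0)
  );
end entity no_latch_demo;

architecture rtl of no_latch_demo is
begin
  -- Problematic version (comment only): no default assignment and no
  -- ELSE, so OUTxD is not assigned when both selects are '0'.
  --   if SelAxS = '1' then
  --     OUTxD <= AxD;
  --   elsif SelBxS = '1' then
  --     OUTxD <= BxD;
  --   end if;

  p_comb : process (AxD, BxD, SelAxS, SelBxS)
  begin
    OUTxD <= (others => '0');   -- default assignment covers every path
    if SelAxS = '1' then
      OUTxD <= AxD;
    elsif SelBxS = '1' then
      OUTxD <= BxD;
    end if;
  end process p_comb;
end architecture rtl;
```

If the old value really has to be kept, it belongs in a flip-flop with separate present-state and next-state signals rather than in an incomplete combinatorial process.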
Translation into VHDL
• Some basic VHDL building blocks:
   Register:
     [Figure: register with input DataREGxDN and output DataREGxDP]
   Register with ENABLE:
     [Figure: two variants of a register with input DataREGxDN and output DataREGxDP]
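A register with enable could be written as below, with the enable handled in the next-state logic. DataREGxDN, DataREGxDP, DataRegENxS and CLKxCI are names from the slides; the entity name, the data ports, the 8-bit width and the active-low asynchronous reset RSTxRBI are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg_demo is
  port (
    CLKxCI      : in  std_logic;
    RSTxRBI     : in  std_logic;  -- assumed active-low asynchronous reset
    DataRegENxS : in  std_logic;
    DataxDI     : in  std_logic_vector(7 downto 0);
    DataxDO     : out std_logic_vector(7 downto 0)
  );
end entity reg_demo;

architecture rtl of reg_demo is
  -- Separate signals for the next state (DN) and the present state (DP).
  signal DataREGxDN, DataREGxDP : std_logic_vector(7 downto 0);
begin
  -- Next-state logic: the enable selects between new data and hold.
  DataREGxDN <= DataxDI when DataRegENxS = '1' else DataREGxDP;

  -- Flip-flops: only clock and reset appear in the sensitivity list.
  p_reg : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      DataREGxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      DataREGxDP <= DataREGxDN;
    end if;
  end process p_reg;

  DataxDO <= DataREGxDP;
end architecture rtl;
```

Without the enable, the next-state assignment reduces to `DataREGxDN <= DataxDI;` and the clocked process stays unchanged.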
Translation into VHDL
• Common mistakes with sequential processes:
   [Figure: register DataREGxDN / DataREGxDP with CLKxCI and DataRegENxS]
      • Cannot be translated into hardware and is NOT allowed
   [Figure: register DataREGxDN / DataREGxDP with the labels 0 and 1]
      • Clocks are NEVER generated within any logic
   [Figure: register DataREGxDN / DataREGxDP with CLKxCI and DataRegENxS]
      • Gated clocks are more complicated than this
      • Avoid them !!!
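One common alternative to a gated clock is to leave the clock untouched and test the enable inside the clocked process. The sketch below assumes 8-bit data and a hypothetical entity name; the gated-clock variant appears only as a comment, and the name GatedCLKxC in it is made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg_en_demo is
  port (
    CLKxCI      : in  std_logic;
    DataRegENxS : in  std_logic;
    DataREGxDN  : in  std_logic_vector(7 downto 0);
    DataREGxDP  : out std_logic_vector(7 downto 0)
  );
end entity reg_en_demo;

architecture rtl of reg_en_demo is
begin
  -- Avoid (comment only): deriving a clock from logic, for example
  --   GatedCLKxC <= CLKxCI and DataRegENxS;
  -- and then clocking the register with GatedCLKxC.

  -- Preferred: the global clock drives the flip-flops directly and the
  -- enable decides whether a new value is loaded.
  p_reg : process (CLKxCI)
  begin
    if rising_edge(CLKxCI) then
      if DataRegENxS = '1' then
        DataREGxDP <= DataREGxDN;
      end if;
    end if;
  end process p_reg;
end architecture rtl;
```

The missing ELSE is acceptable here because the assignment sits inside the clocked branch, so the flip-flop itself holds the value.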
Translation into VHDL
• Some basic rules:
   Sequential processes (FlipFlops)
      Only CLOCK and RESET in the sensitivity list
      Logic signals are NEVER used as clock signals
   Combinatorial processes
      Multiple assignments to the same signal are ONLY possible within the
       same process => ONLY the last assignment is valid
      Something must be assigned to each signal in any case OR
       There MUST be an ELSE for every IF statement
• More rules that help to avoid problems and surprises:
   Use separate signals for the PRESENT state and the
    NEXT state of every FlipFlop in your design.
   Use variables ONLY to store intermediate results or even
    avoid them whenever possible in an RTL design.
Translation into VHDL
• Write the ENTITY definition of your design to specify:
   Inputs, Outputs and Generics
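An entity for the filter might look like the sketch below. All generic and port names apart from CLKxCI, as well as the default values, are assumptions; the request/acknowledge ports are only one possible way to realise the "wait for new sample" and "wait for ACK" steps of the FSM.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fir is
  generic (
    G_DATA_WIDTH : natural := 16;  -- assumed word width of x(k) and y(k)
    G_ORDER      : natural := 15   -- assumed filter order N (N + 1 coefficients)
  );
  port (
    CLKxCI        : in  std_logic;
    RSTxRBI       : in  std_logic;  -- assumed active-low asynchronous reset
    -- Input side: a new sample x(k) is offered together with a request.
    DataInxDI     : in  std_logic_vector(G_DATA_WIDTH - 1 downto 0);
    DataInReqxSI  : in  std_logic;
    -- Output side: the result y(k) is held until it is acknowledged.
    DataOutxDO    : out std_logic_vector(G_DATA_WIDTH - 1 downto 0);
    DataOutReqxSO : out std_logic;
    DataOutAckxSI : in  std_logic
  );
end entity fir;
```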
Translation into VHDL
• Describe the functional units in your block diagram
  one after another in the architecture section:
   Register with ENABLE (two instances)
   Register with CLEAR
   Counter (two instances)
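Two of these functional units, a register with clear used as accumulator and an address counter, could be described as follows. This is a sketch under assumed names and widths (16-bit products, 20-bit accumulator, 4-bit counter, shared clear and enable signals), not the code from the exercise.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity units_demo is
  port (
    CLKxCI   : in  std_logic;
    RSTxRBI  : in  std_logic;             -- assumed active-low async. reset
    ClearxS  : in  std_logic;             -- clears accumulator and counter
    EnablexS : in  std_logic;             -- one multiply-accumulate step
    ProdxD   : in  signed(15 downto 0);   -- product b_i * x(k-i), assumed width
    AccxDO   : out signed(19 downto 0);
    AddrxDO  : out unsigned(3 downto 0)
  );
end entity units_demo;

architecture rtl of units_demo is
  signal AccxDN, AccxDP : signed(19 downto 0);
  signal CntxDN, CntxDP : unsigned(3 downto 0);
begin
  -- Register with CLEAR: next-state logic of the accumulator.
  p_acc : process (AccxDP, ProdxD, ClearxS, EnablexS)
  begin
    AccxDN <= AccxDP;                       -- default: hold
    if ClearxS = '1' then
      AccxDN <= (others => '0');
    elsif EnablexS = '1' then
      AccxDN <= AccxDP + resize(ProdxD, AccxDP'length);
    end if;
  end process p_acc;

  -- Counter: next-state logic of the address generator (wraps around).
  p_cnt : process (CntxDP, ClearxS, EnablexS)
  begin
    CntxDN <= CntxDP;                       -- default: hold
    if ClearxS = '1' then
      CntxDN <= (others => '0');
    elsif EnablexS = '1' then
      CntxDN <= CntxDP + 1;
    end if;
  end process p_cnt;

  -- Flip-flops for both units.
  p_reg : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      AccxDP <= (others => '0');
      CntxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      AccxDP <= AccxDN;
      CntxDP <= CntxDN;
    end if;
  end process p_reg;

  AccxDO  <= AccxDP;
  AddrxDO <= CntxDP;
end architecture rtl;
```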
Translation into VHDL
• The FSM is described with one sequential process
  and one combinatorial process
   [Code figures with the annotation "MEALY"]
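A two-process FSM for the IDLE / NEW DATA / RUN / DATA OUT sequence could be sketched as below. The state names follow the slides; every port name except CLKxCI is hypothetical, and the exact cycle in which each control signal must be active depends on the datapath, so the timing here is only indicative.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fir_fsm is
  port (
    CLKxCI       : in  std_logic;
    RSTxRBI      : in  std_logic;  -- assumed active-low asynchronous reset
    NewSamplexSI : in  std_logic;  -- hypothetical: a new sample is offered
    LastTapxSI   : in  std_logic;  -- hypothetical: address counter reached N
    AckxSI       : in  std_logic;  -- hypothetical: receiver took the result
    InRegENxSO   : out std_logic;  -- load input register
    MemWExSO     : out std_logic;  -- store sample to memory
    AccClearxSO  : out std_logic;  -- clear accumulator and counters
    AccENxSO     : out std_logic;  -- multiply-accumulate step
    OutRegENxSO  : out std_logic;  -- load output register
    OutValidxSO  : out std_logic   -- result available
  );
end entity fir_fsm;

architecture rtl of fir_fsm is
  type state_t is (IDLE, NEW_DATA, RUN, DATA_OUT);
  signal STATExDN, STATExDP : state_t;
begin
  -- Combinatorial process: next state and outputs.
  p_comb : process (STATExDP, NewSamplexSI, LastTapxSI, AckxSI)
  begin
    -- Default assignments, so every signal is assigned on every path.
    STATExDN    <= STATExDP;
    InRegENxSO  <= '0';
    MemWExSO    <= '0';
    AccClearxSO <= '0';
    AccENxSO    <= '0';
    OutRegENxSO <= '0';
    OutValidxSO <= '0';

    case STATExDP is
      when IDLE =>
        if NewSamplexSI = '1' then
          InRegENxSO <= '1';     -- MEALY: depends on state and input
          STATExDN   <= NEW_DATA;
        end if;
      when NEW_DATA =>
        MemWExSO    <= '1';
        AccClearxSO <= '1';
        STATExDN    <= RUN;
      when RUN =>
        AccENxSO <= '1';
        if LastTapxSI = '1' then
          OutRegENxSO <= '1';    -- MEALY: depends on state and input
          STATExDN    <= DATA_OUT;
        end if;
      when DATA_OUT =>
        OutValidxSO <= '1';
        if AckxSI = '1' then
          STATExDN <= IDLE;
        end if;
    end case;
  end process p_comb;

  -- Sequential process: state register only.
  p_reg : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      STATExDP <= IDLE;
    elsif rising_edge(CLKxCI) then
      STATExDP <= STATExDN;
    end if;
  end process p_reg;
end architecture rtl;
```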
Translation into VHDL
• Complete and check the code:
   Declare the signals and components

   Check and complete the sensitivity lists of ALL
    combinatorial processes with ALL signals that are:
      used as condition in any IF or CASE statement
      being assigned to any other signal
      used in any operation with any other signal


   Check that the sensitivity lists of ALL sequential processes contain
      ONLY one global clock and one global async. reset signal
      no other signals
Other Good Ideas
• Keep things simple
• Partition the design (Divide et Impera):
   Example:
    Start processing the next sample, while the previous
    result is waiting in the output register:
      Just add a FIFO at the output of your filter
• Do NOT try to optimize each Gate or FlipFlop
• Do not try to save cycles if not necessary
• VHDL code
   Is usually long and that is good !!
   Is just a representation of your block diagram
   Does not mind hierarchy

				