


16-Bit Viterbi Decoder Processor
Ashkan Borna, Mojtaba Mehrara, Robert Mullenix, and Brian Pietras
 Features
 Viterbi Algorithm
 Viterbi Implementation
 Testability
 Conclusion
 16-Bit RISC Processor
 (3,7,5) Viterbi Decoding with 3-bit soft
  input data
 105MHz Clock Frequency
 16 Word “RAM”
 Implemented instructions that use three registers
 Several new instructions to facilitate
  Viterbi decoding
Basic Concept
Convolutional Coding
 A class of error-correcting codes widely used as channel codes
  in today’s digital communication systems
 The input is fed into a shift register and the outputs are the results
  of the modulo-2 addition of different registers’ outputs and the
  primary input.

[Figure: convolutional encoder with two delay elements (D) and modulo-2 adders (+) producing outputs X and Y]
Definition of terms I
    Constraint length: number of registers + 1
    Code rate: 1/(number of encoder outputs per input)
    Generator polynomials: when read in binary, they correspond to the
    shift register connections to the XOR gates.
    For the above encoder, the constraint length is 3 and the
    generator polynomials are 7 (111) and 5 (101)
[Figure: the same encoder with input u, two delay elements (D), and modulo-2 adders (+) producing outputs X and Y]
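The shift-register description above can be sketched in a few lines of Python. This is a minimal illustration, not the encoder used to test the chip: the function name `conv_encode` and the bit ordering (newest bit in the most significant position) are assumptions made for the example.

```python
def conv_encode(bits, polys=(0b111, 0b101), constraint_length=3):
    """Rate-1/len(polys) convolutional encoder (hypothetical helper)."""
    state = 0  # contents of the K-1 delay elements, newest bit on the left
    out = []
    for u in bits:
        # Window = current input followed by the register contents.
        window = (u << (constraint_length - 1)) | state
        for g in polys:
            # Modulo-2 sum of the taps selected by the generator polynomial.
            out.append(bin(window & g).count("1") % 2)
        # Shift: the input enters the first register, the oldest bit drops out.
        state = window >> 1
    return out


print(conv_encode([1, 0, 1, 1]))  # X,Y pairs: 11 10 00 01
```

With polynomials 7 and 5, X is the modulo-2 sum of the input and both registers, and Y is the sum of the input and the second register, matching the tap description above.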
Hard vs. soft data
 If the received channel symbols are
  quantized to one-bit precision (<
  Vmax/2 = 0, > Vmax/2 = 1), the result is
  called hard-decision data
 If they are quantized to more than one
  bit the result is called soft-decision data
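As a rough illustration of the difference, the sketch below quantizes a received voltage both ways. The uniform 3-bit mapping onto 0..7, the clipping, and the names `hard_decision` and `soft_decision` are assumptions for the example; the slides do not specify the quantizer.

```python
def hard_decision(v, vmax=1.0):
    """One-bit quantization: below Vmax/2 is 0, above is 1."""
    return 1 if v > vmax / 2 else 0


def soft_decision(v, vmax=1.0, bits=3):
    """Uniform multi-bit quantization (assumed mapping) onto 0..2^bits - 1."""
    levels = (1 << bits) - 1             # 7 for 3-bit soft data
    clipped = min(max(v, 0.0), vmax)     # keep the sample inside [0, Vmax]
    return int(clipped / vmax * levels + 0.5)  # round to the nearest level


for v in (0.1, 0.45, 0.6, 0.95):
    print(v, hard_decision(v), soft_decision(v))
```

A sample at 0.45 and one at 0.1 both become 0 under hard decision, while the soft values (3 and 1) keep the information that the first one is much less reliable.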
The Viterbi Algorithm
Trellis Diagram

[Figure: trellis diagram for a (4,13,17) decoder]

    Number of states = 2^(constraint length – 1)
    Each node corresponds to an individual state at a given time
     and indicates a possible pattern of recently received data bits
    Each branch indicates the transition to a new state at the next
     timing cycle
The Viterbi Algorithm
    Trellis Legend
       Defines the transition from each stage to the next

       Legend is constructed according to the structure of the encoder
[Figure: trellis legend in our design. States 00, 01, 10 and 11 appear on both sides; the branch labels (input/output) that survive are 0/00, 0/11, 0/01, 1/01 and 1/10]
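Since the legend follows from the encoder structure, it can be regenerated programmatically. The sketch below does this for the (3,7,5) code. The helper name `trellis_legend` and the state ordering (most recent bit on the left) are assumptions; with that ordering the output agrees with the labels that survive from the figure.

```python
def trellis_legend(polys=(0b111, 0b101), constraint_length=3):
    """Map (state, input bit) -> (next state, (x, y)) for every branch."""
    n_states = 1 << (constraint_length - 1)   # 2^(constraint length - 1)
    legend = {}
    for state in range(n_states):
        for u in (0, 1):
            window = (u << (constraint_length - 1)) | state
            output = tuple(bin(window & g).count("1") % 2 for g in polys)
            legend[(state, u)] = (window >> 1, output)
    return legend


for (state, u), (nxt, (x, y)) in sorted(trellis_legend().items()):
    print(f"{state:02b} --{u}/{x}{y}--> {nxt:02b}")
```

Each state has exactly two outgoing and two incoming branches, which is why the path metric update later needs only one two-way comparison per state.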

The Viterbi Algorithm
Branch Metric and Path Metric
   Branch Metrics: The "distance" between what we received and
    all of the possible channel symbol pairs we could have received
   Euclidean distance for soft input data = (B0 – x)^2 + (B1 – y)^2

   Our Simplification to the algorithm (Actually led to a slight BER
   Branch metric = |B0 – x| + |B1 – y|

   The path metric is the accumulation of the branch metrics along
    the maximum-likelihood path arriving at a specific state
   Metric at each state = Min{(PMin0+BM0),(PMin1+BM1)}
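The two formulas above can be sketched as follows. This assumes the simplified metric is a sum of absolute differences and that the ideal symbols for bits 0 and 1 sit at soft values 0 and 7; both are assumptions, as are the function names.

```python
MAX_SOFT = 7  # largest 3-bit soft value (assumed ideal level for a 1)


def branch_metrics(b0, b1):
    """Absolute-difference metric against each ideal pair (x, y)."""
    return {
        (x, y): abs(b0 - x * MAX_SOFT) + abs(b1 - y * MAX_SOFT)
        for x in (0, 1)
        for y in (0, 1)
    }


def add_compare_select(pm_in0, bm0, pm_in1, bm1):
    """Min{(PMin0+BM0),(PMin1+BM1)} plus the index of the surviving branch."""
    cand0, cand1 = pm_in0 + bm0, pm_in1 + bm1
    return (cand0, 0) if cand0 <= cand1 else (cand1, 1)


bm = branch_metrics(6, 1)            # received pair looks like "10"
print(bm)                            # smallest metric is at (1, 0)
print(add_compare_select(3, bm[(0, 0)], 1, bm[(1, 0)]))
```

Under these assumptions each metric is at most 14, so it fits in four bits, whereas the squared Euclidean form can reach 98.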
The Viterbi Algorithm
 After building up the trellis, two approaches
  called 'trace back' and 'register exchange'
  may be used to decode the data.
 Register exchange: A register is assigned to
  each state and it records the decoded output
  sequence along the path from the initial state
  to the final state. At the last stage, the
  decoded output sequence is the one stored in
  the survivor path register assigned to the
  state with the minimum path metric
The Viterbi Algorithm
 Trace back needs less computation
  than register exchange
 Register exchange is faster and
  requires less memory
 We have chosen register exchange for
  our decoder.
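A software sketch of register-exchange decoding for the (3,7,5) code is given below. It is a behavioral model only, not the chip's datapath: the absolute-difference metric, the ideal soft levels 0 and 7, the known all-zero start state, and all names are assumptions made for the example.

```python
POLYS = (0b111, 0b101)   # generator polynomials 7 and 5
K = 3                    # constraint length
N_STATES = 1 << (K - 1)
MAX_SOFT = 7             # 3-bit soft inputs in 0..7 (assumed ideal levels 0 and 7)
INF = float("inf")


def step(state, u):
    """Next state and ideal output pair for input bit u."""
    window = (u << (K - 1)) | state
    out = tuple(bin(window & g).count("1") % 2 for g in POLYS)
    return window >> 1, out


def decode_register_exchange(symbols):
    """symbols: list of (b0, b1) soft pairs; returns the decoded bits."""
    metrics = [0] + [INF] * (N_STATES - 1)      # encoder assumed to start in state 0
    registers = [[] for _ in range(N_STATES)]   # one survivor register per state
    for b0, b1 in symbols:
        new_metrics = [INF] * N_STATES
        new_registers = [None] * N_STATES
        for state in range(N_STATES):
            if metrics[state] == INF:
                continue                        # state not reachable yet
            for u in (0, 1):
                nxt, (x, y) = step(state, u)
                bm = abs(b0 - x * MAX_SOFT) + abs(b1 - y * MAX_SOFT)
                cand = metrics[state] + bm
                if cand < new_metrics[nxt]:
                    new_metrics[nxt] = cand
                    # Register exchange: copy the survivor's history, append u.
                    new_registers[nxt] = registers[state] + [u]
        metrics, registers = new_metrics, new_registers
    best = min(range(N_STATES), key=lambda s: metrics[s])
    return registers[best]


# Codeword for input 1 0 1 1 0 0, with one bit of the third pair flipped.
received = [(7, 7), (7, 0), (7, 0), (0, 7), (0, 7), (7, 7)]
print(decode_register_exchange(received))
```

No backward pass over stored decisions is needed: the answer is read directly from the register of the state with the minimum path metric, at the cost of copying every register at every stage.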
Results of Matlab simulations for 1 million samples
Implementation of the algorithm
   New instructions (some of them use 3 registers):
       BMU Rdest: computes four 4-bit branch metrics and stores them in
        Rdest
       PMU Rsrc1,Rsrc2,Rdest: adds the proper branch metrics in Rsrc1
        to the previous path metrics in Rsrc2, compares them and puts the
        minimum in Rdest. Each PMU instruction computes two 8-bit path
        metrics
       Swap1(2) Rsrc1,Rsrc2,Rdest: copies one of the computed path
        metrics from each of Rsrc1 and Rsrc2 and puts them in Rdest to
        prepare the path metrics for the next stage computation
       Hs Rsrc,Rdest (half shift): shifts the higher and lower 8 bits of Rsrc
        separately and copies the result into Rdest. Used for normalization
        (division by two) of the path metrics
       Hcmp1(2,3) Rsrc,Rdest: compares the higher and lower 8-bit
        values in Rsrc and copies the minimum into Rdest. Used for finding
        the minimum metric at each stage to send out data from the
        corresponding path register
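To make the packed 8-bit operations concrete, here is a small behavioral model of the half shift and half compare on a 16-bit register value. The bit-level details are assumptions inferred from the descriptions above (a logical right shift by one on each half, and an unsigned minimum). The function names are hypothetical, and the Hcmp1/2/3 variants are not distinguished.

```python
def half_shift(rsrc):
    """Model of Hs: halve the two packed 8-bit path metrics independently."""
    hi, lo = (rsrc >> 8) & 0xFF, rsrc & 0xFF
    return ((hi >> 1) << 8) | (lo >> 1)


def half_compare(rsrc):
    """Model of Hcmp: the smaller of the two packed 8-bit path metrics."""
    hi, lo = (rsrc >> 8) & 0xFF, rsrc & 0xFF
    return min(hi, lo)


packed = 0x0B06                       # path metrics 11 (high) and 6 (low)
print(hex(half_shift(packed)))        # each half is halved separately
print(hex(packed >> 1))               # a plain 16-bit shift leaks a bit into the low half
print(half_compare(packed))
```

The comparison with a plain 16-bit shift shows why a dedicated instruction helps: without it, the two metrics would have to be unpacked, shifted and repacked with several ordinary instructions.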
Benefits of our implementation
   BMU does four additions

   PMU does three additions and two compares

   Half Shift does two shifts

   Swap does three moves

   Half compare does compare, move, (branch)

   Save about twenty Load/Store Instructions
Implementation (cont’d)
 Interface asserts reset and do_viterbi to
 Interface provides data on opx_serial
  and opy_serial, asserts data_in_valid
 Chip asserts send_data after each BMU
 Chip outputs data and asserts
 Controller stalls if it reaches BMU and
  BMU_ready signal is low
    Testing done through Scan Chains.

 Scan_In → PSR → IR → PC → Scan_Out

Scan Chains also allow us to bring data in and out
serially, through MOVI, LUI, and ADD instructions,
albeit extremely inefficiently
 Square root carry select ALU
 Funnel shifter
 6T muxes
 73.5 lambda bit slice width
 Core dimensions: 5100 x 4275 lambda
   Register Access: .674ns read delay
   ALU Operation: 2.415ns rise time
   Critical Instruction: Branch
       Has to be ready by Negative Edge of Clock
          2.45ns for ALU control and data signals
          2.42ns for ALU operation
          ~100ps to write into Next PC

   Minimum clock period = 9.5 ns
   Viterbi decoding throughput ~ 2.5 Mbps
   Device count ~ 20k
   Number of pins = 26
   Number of IOs = 12
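The clock period and throughput figures above can be cross-checked with a short calculation. The cycles-per-bit figure is derived here from the two quoted numbers and is not stated in the slides.

```python
clock_period_ns = 9.5      # minimum clock period from the summary
throughput_mbps = 2.5      # approximate Viterbi decoding throughput

clock_mhz = 1e3 / clock_period_ns            # 1 / 9.5 ns expressed in MHz
cycles_per_bit = clock_mhz / throughput_mbps

print(f"{clock_mhz:.1f} MHz, about {cycles_per_bit:.0f} cycles per decoded bit")
```

The result, roughly 105 MHz, agrees with the clock frequency quoted in the feature list, and implies on the order of 42 clock cycles of instruction execution per decoded bit.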
