# Description of the Viterbi Algorithm

Description of the Algorithms (Part 2)

Performing Viterbi Decoding

The Viterbi decoder itself is the primary focus of this tutorial. Perhaps the single most
important concept to aid in understanding the Viterbi algorithm is the trellis diagram.
The figure below shows the trellis diagram for our example rate 1/2 K = 3
convolutional encoder, for a 15-bit message:

The four possible states of the encoder are depicted as four rows of horizontal dots.
There is one column of four dots for the initial state of the encoder and one for each
time instant during the message. For a 15-bit message with two encoder memory
flushing bits, there are 17 time instants in addition to t = 0, which represents the initial
condition of the encoder. The solid lines connecting dots in the diagram represent
state transitions when the input bit is a one. The dotted lines represent state transitions
when the input bit is a zero. Notice the correspondence between the arrows in the
trellis diagram and the state transition table discussed above. Also notice that since the
initial condition of the encoder is State 00₂, and the two memory flushing bits are
zeroes, the arrows start out at State 00₂ and end up at the same state.

The following diagram shows the states of the trellis that are actually reached during
the encoding of our example 15-bit message:

The encoder input bits and output symbols are shown at the bottom of the diagram.
Notice the correspondence between the encoder output symbols and the output
table discussed above. Let's look at that in more detail, using the expanded version of
the transition from one time instant to the next shown below:

The two-bit numbers labeling the lines are the corresponding convolutional encoder
channel symbol outputs. Remember that dotted lines represent cases where the
encoder input is a zero, and solid lines represent cases where the encoder input is a
one. (In the figure above, the two-bit binary numbers labeling dotted lines are on the
left, and the two-bit binary numbers labeling solid lines are on the right.)

OK, now let's start looking at how the Viterbi decoding algorithm actually works. For
our example, we're going to use hard-decision symbol inputs to keep things simple.
(The example source code uses soft-decision inputs to achieve better performance.)
Suppose we receive the above encoded message with a couple of bit errors:

Each time we receive a pair of channel symbols, we're going to compute a metric to
measure the "distance" between what we received and all of the possible channel
symbol pairs we could have received. Going from t = 0 to t = 1, there are only two
possible channel symbol pairs we could have received: 00₂ and 11₂. That's because
we know the convolutional encoder was initialized to the all-zeroes state, and given
one input bit = one or zero, there are only two states we could transition to and two
possible outputs of the encoder. These possible outputs of the encoder are 00₂ and
11₂.

The metric we're going to use for now is the Hamming distance between the received
channel symbol pair and the possible channel symbol pairs. The Hamming distance
is computed by simply counting how many bits are different between the
received channel symbol pair and the possible channel symbol pairs. The results
can only be zero, one, or two. The Hamming distance (or other metric) values we
compute at each time instant for the paths between the states at the previous time
instant and the states at the current time instant are called branch metrics. For the
first time instant, we're going to save these results as "accumulated error metric"
values, associated with states. For the second time instant on, the accumulated error
metrics will be computed by adding the previous accumulated error metrics to the
current branch metrics.
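As a sketch, the hard-decision branch metric described above can be written as a small helper (the function name and the tuple representation of a channel symbol pair are illustrative choices, not taken from the tutorial's example source code):

```python
def hamming_distance(pair_a, pair_b):
    """Count differing bits between two 2-bit channel symbol pairs,
    each given as a tuple of ints, e.g. (0, 0) or (1, 1)."""
    return sum(a != b for a, b in zip(pair_a, pair_b))

# Branch metrics for a received pair against both possible pairs at t = 1:
received = (0, 0)
print(hamming_distance(received, (0, 0)))  # distance to 00 -> 0
print(hamming_distance(received, (1, 1)))  # distance to 11 -> 2
```

As the text notes, for 2-bit symbol pairs the result can only be zero, one, or two.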

At t = 1, we received 00₂. The only possible channel symbol pairs we could have
received are 00₂ and 11₂. The Hamming distance between 00₂ and 00₂ is zero. The
Hamming distance between 00₂ and 11₂ is two. Therefore, the branch metric value for
the branch from State 00₂ to State 00₂ is zero, and for the branch from State 00₂ to
State 10₂ it's two. Since the previous accumulated error metric values are equal to
zero, the accumulated metric values for State 00₂ and for State 10₂ are equal to the
branch metric values. The accumulated error metric values for the other two states are
undefined. The figure below illustrates the results at t = 1:

Note that the solid lines between states at t = 1 and the state at t = 0 illustrate the
predecessor-successor relationship between the states at t = 1 and the state at t = 0.
This information is shown graphically in the figure, but is stored numerically in the
actual implementation. To be more specific: at each time instant t, we will store the
number of the predecessor state that led to each of the current states at t.

Now let's look at what happens at t = 2. We received a 11₂ channel symbol pair. The
possible channel symbol pairs we could have received in going from t = 1 to t = 2 are
00₂ going from State 00₂ to State 00₂, 11₂ going from State 00₂ to State 10₂, 10₂ going
from State 10₂ to State 01₂, and 01₂ going from State 10₂ to State 11₂. The Hamming
distance between 00₂ and 11₂ is two, between 11₂ and 11₂ is zero, and between 10₂ or
01₂ and 11₂ is one. We add these branch metric values to the previous accumulated
error metric values associated with each state that we came from to get to the current
states. At t = 1, we could only be at State 00₂ or State 10₂. The accumulated error
metric values associated with those states were 0 and 2 respectively. The figure below
shows the calculation of the accumulated error metric associated with each state at
t = 2.

That's all the computation for t = 2. What we carry forward to t = 3 will be the
accumulated error metrics for each state, and the predecessor states for each of the
four states at t = 2, corresponding to the state relationships shown by the solid lines in
the illustration of the trellis.

Now look at the figure for t = 3. Things get a bit more complicated here, since there
are now two different ways to reach each of the four states that are valid at t = 3 from
the states that were valid at t = 2. So how do we handle that? The
answer is, we compare the accumulated error metrics associated with each branch, and
discard the larger one of each pair of branches leading into a given state. If the
members of a pair of accumulated error metrics going into a particular state are equal,
we just save that value. The other thing that's affected is the predecessor-successor
history we're keeping. For each state, the predecessor that survives is the one with the
lower accumulated error metric. If the two accumulated error metrics are equal, some
people use a fair coin toss to choose the surviving predecessor state. Others simply
pick one of them consistently, e.g. always the upper branch or always the lower branch.
It probably doesn't matter which method you use. The operation of adding the previous
accumulated error metrics to the new branch metrics, comparing the results, and
selecting the smallest accumulated error metric to be retained for the next time instant
is called the add-compare-select operation. The figure below shows the results of
processing t = 3:
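A minimal hard-decision sketch of this add-compare-select step (the state numbering, the infinity sentinel for unreached states, and the keep-first tie policy are illustrative choices, not taken from the example source code):

```python
INF = float("inf")  # marks states whose accumulated error metric is undefined

def acs_step(prev_metrics, branches):
    """One add-compare-select step.

    prev_metrics: accumulated error metric per state at time t-1.
    branches: list of (prev_state, next_state, branch_metric) for every
              allowed trellis transition at this time instant.
    Returns (new_metrics, predecessor), where predecessor[s] is the
    surviving previous state for each state s.  Ties keep the first
    branch seen, one of the consistent policies mentioned above.
    """
    n = len(prev_metrics)
    new_metrics = [INF] * n
    predecessor = [None] * n
    for prev, nxt, bm in branches:
        candidate = prev_metrics[prev] + bm       # add
        if candidate < new_metrics[nxt]:          # compare, select smaller
            new_metrics[nxt] = candidate
            predecessor[nxt] = prev
    return new_metrics, predecessor

# t = 2 in the example: states 00 and 10 were reachable with metrics 0 and 2,
# and the received pair 11 gives branch metrics 2, 0, 1, 1 on the four branches.
metrics, preds = acs_step(
    [0, INF, 2, INF],
    [(0, 0, 2), (0, 2, 0), (2, 1, 1), (2, 3, 1)],
)
print(metrics)  # [2, 3, 0, 3] -- matches the accumulated metrics at t = 2
```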

Note that the third channel symbol pair we received had a one-symbol error. The
smallest accumulated error metric is a one, and there are two of these.

Let's see what happens now at t = 4. The processing is the same as it was for t = 3.
The results are shown in the figure:

Notice that at t = 4, the path through the trellis of the actual transmitted message,
shown in bold, is again associated with the smallest accumulated error metric. Let's
look at t = 5:

At t = 5, the path through the trellis corresponding to the actual message, shown in
bold, is still associated with the smallest accumulated error metric. This is the thing
that the Viterbi decoder exploits to recover the original message.

Perhaps you're getting tired of stepping through the trellis. I know I am. Let's skip to
the end.

At t = 17, the trellis looks like this, with the clutter of the intermediate state history
removed:

The decoding process begins with building the accumulated error metric for some
number of received channel symbol pairs, and the history of what states preceded the
states at each time instant t with the smallest accumulated error metric. Once this
information is built up, the Viterbi decoder is ready to recreate the sequence of bits
that were input to the convolutional encoder when the message was encoded for
transmission. This is accomplished by the following steps:

   First, select the state having the smallest accumulated error metric and save the
state number of that state.
   Iteratively perform the following step until the beginning of the trellis is
reached: Working backward through the state history table, for the selected
state, select a new state which is listed in the state history table as being the
predecessor to that state. Save the state number of each selected state. This step
is called traceback.
   Now work forward through the list of selected states saved in the previous
steps. Look up what input bit corresponds to a transition from each predecessor
state to its successor state. That is the bit that must have been encoded by the
convolutional encoder.
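The three steps above can be sketched as follows (the table layouts are illustrative assumptions: state_history[t][s] holds the predecessor of state s at time t, and input_table[prev][nxt] holds the input bit for that transition, with None for impossible ones):

```python
def traceback(state_history, final_metrics, input_table):
    """Recover the encoder input bits from the survivor history."""
    # Step 1: start from the state with the smallest accumulated metric.
    state = min(range(len(final_metrics)), key=final_metrics.__getitem__)
    # Step 2: walk backward through the survivor table, saving each state.
    states = [state]
    for t in range(len(state_history) - 1, 0, -1):
        state = state_history[t][state]
        states.append(state)
    states.reverse()
    # Step 3: walk forward, mapping each transition to its input bit.
    return [input_table[p][n] for p, n in zip(states, states[1:])]

# Tiny synthetic example: true inputs 1, 0 drive state 0 -> 2 -> 1.
input_table = [
    [0, None, 1, None],
    [0, None, 1, None],
    [None, 0, None, 1],
    [None, 0, None, 1],
]
state_history = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 2, 0, 2]]
print(traceback(state_history, [5, 0, 5, 5], input_table))  # [1, 0]
```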

The following table shows the accumulated metric for the full 15-bit (plus two
flushing bits) example message at each time t:

t =        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
State 00₂      0   2   3   3   3   3   4   1   3   4   3   3   2   2   4   5   2
State 01₂          3   1   2   2   3   1   4   4   1   4   2   3   4   4   2
State 10₂      2   0   2   1   3   3   4   3   1   4   1   4   3   3   2
State 11₂          3   1   2   1   1   3   4   4   3   4   2   3   4   4

It is interesting to note that for this hard-decision-input Viterbi decoder example, the
smallest accumulated error metric in the final state indicates how many channel
symbol errors occurred.

The following state history table shows the surviving predecessor states for each state
at each time t:

t =        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
State 00₂  0   0   0   1   0   1   1   0   1   0   0   1   0   1   0   0   0   1
State 01₂  0   0   2   2   3   3   2   3   3   2   2   3   2   3   2   2   2   0
State 10₂  0   0   0   0   1   1   1   0   1   0   0   1   1   0   1   0   0   0
State 11₂  0   0   2   2   3   2   3   2   3   2   2   3   2   3   2   2   0   0

The following table shows the states selected when tracing the path back through the
survivor state table shown above:

t =   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
      0   0   2   1   2   3   3   1   0   2   1   2   1   0   0   2   1   0

Using a table that maps state transitions to the inputs that caused them, we can now
recreate the original message. Here is what this table looks like for our example rate
1/2 K = 3 convolutional code:

Input was, Given Next State =

Current State    00₂ = 0   01₂ = 1   10₂ = 2   11₂ = 3

00₂ = 0          0         x         1         x
01₂ = 1          0         x         1         x
10₂ = 2          x         0         x         1
11₂ = 3          x         0         x         1

Note: In the above table, x denotes an impossible transition from one state to another
state.
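A sketch of how such an input table can be derived programmatically from the next-state relation (for this K = 3 code the next state is the input bit followed by the leftmost bit of the current state; the x entries become None here, an illustrative choice):

```python
def build_input_table(num_states=4):
    """input_table[current][next] = input bit causing the transition,
    or None for impossible transitions (the 'x' entries above)."""
    table = [[None] * num_states for _ in range(num_states)]
    for state in range(num_states):
        for bit in (0, 1):
            # Shift the new bit in from the left, drop the oldest bit.
            next_state = (bit << 1) | (state >> 1)
            table[state][next_state] = bit
    return table

for row in build_input_table():
    print(row)
```

The printed rows reproduce the table above, with None standing in for x.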

So now we have all the tools required to recreate the original message from the
message we received:

t =   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
      0   1   0   1   1   1   0   0   1   0   1   0   0   0   1

The two flushing bits are discarded.

Here's an insight into how the traceback algorithm eventually finds its way onto the
right path even if it started out choosing the wrong initial state. This could happen if
more than one state had the smallest accumulated error metric, for example. I'll use
the figure for the trellis at t = 3 again to illustrate this point:

See how at t = 3, both States 01₂ and 11₂ had an accumulated error metric of 1. The
correct path goes to State 01₂; notice that the bold line showing the actual message
path goes into this state. But suppose we choose State 11₂ to start our traceback. The
predecessor state for State 11₂, which is State 10₂, is the same as the predecessor
state for State 01₂! This is because at t = 2, State 10₂ had the smallest accumulated
error metric. So after a false start, we are almost immediately back on the correct path.

For the example 15-bit message, we built up the trellis for the entire message before
starting traceback. For longer messages, or continuous data, this is neither practical nor
desirable, due to memory constraints and decoder delay. Research has shown that a
traceback depth of K × 5 is sufficient for Viterbi decoding with the type of codes we
have been discussing. Any deeper traceback increases decoding delay and decoder
memory requirements, while not significantly improving the performance of the
decoder. The exception is punctured codes, which I'll describe later. They require
deeper traceback to reach their final performance limits.

To implement a Viterbi decoder in software, the first step is to build some data
structures around which the decoder algorithm will be implemented. These data
structures are best implemented as arrays. The primary six arrays that we need for the
Viterbi decoder are as follows:

   A copy of the convolutional encoder next state table, the state transition table
of the encoder. The dimensions of this table (rows × columns) are 2^(K - 1) × 2^k.
This array needs to be initialized before starting the decoding process.
   A copy of the convolutional encoder output table. The dimensions of this table
are 2^(K - 1) × 2^k. This array needs to be initialized before starting the decoding
process.
   An array (table) showing, for each convolutional encoder current state and next
state, what input value (0 or 1) would produce the next state, given the current
state. We'll call this array the input table. Its dimensions are 2^(K - 1) × 2^(K - 1).
This array needs to be initialized before starting the decoding process.
   An array to store state predecessor history for each encoder state for up to K × 5
+ 1 received channel symbol pairs. We'll call this table the state history table.
The dimensions of this array are 2^(K - 1) × (K × 5 + 1). This array does not need
to be initialized before starting the decoding process.
   An array to store the accumulated error metrics for each state computed using
the add-compare-select operation. This array will be called the accumulated
error metric array. The dimensions of this array are 2^(K - 1) × 2. This array does
not need to be initialized before starting the decoding process.
   An array to store a list of states determined during traceback. It is called the
state sequence array. The dimensions of this array are (K × 5) + 1. This array
does not need to be initialized before starting the decoding process.
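As an illustration of initializing the first two arrays, here is a sketch for a rate 1/2, K = 3 encoder, assuming the generator polynomials 7 and 5 (octal); this choice is consistent with the trellis branch labels shown earlier, but only POLYS would change for a different encoder:

```python
K = 3                      # constraint length
POLYS = (0b111, 0b101)     # generator polynomials (7 and 5 octal), an assumption
NUM_STATES = 1 << (K - 1)  # 2^(K-1) = 4 states

def parity(x):
    """Modulo-2 sum of the bits of x."""
    return bin(x).count("1") & 1

# next_state_table[state][input_bit] and output_table[state][input_bit]:
# each has 2^(K-1) rows and 2^k columns (k = 1 input bit per cycle here).
next_state_table = [[0] * 2 for _ in range(NUM_STATES)]
output_table = [[0] * 2 for _ in range(NUM_STATES)]
for state in range(NUM_STATES):
    for bit in (0, 1):
        register = (bit << (K - 1)) | state        # newest bit on the left
        next_state_table[state][bit] = register >> 1
        output_table[state][bit] = [parity(register & p) for p in POLYS]

print(next_state_table[0])   # [0, 2]: State 00 goes to 00 or 10
print(output_table[0][1])    # [1, 1]: the 11 branch label out of State 00
```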

Before getting into the example source code, for purposes of completeness, I want to
talk briefly about other rates of convolutional codes that can be decoded with Viterbi
decoders. Earlier, I mentioned punctured codes, which are a common way of
achieving higher code rates, i.e. larger ratios of k to n. Punctured codes are created by
first encoding data using a rate 1/n encoder such as the example encoder described in
this tutorial, and then deleting some of the channel symbols at the output of the
encoder. The process of deleting some of the channel output symbols is called
puncturing. For example, to create a rate 3/4 code from the rate 1/2 code described in
this tutorial, one would simply delete channel symbols in accordance with the
following puncturing pattern:

1 0 1
1 1 0

where a one indicates that a channel symbol is to be transmitted, and a zero indicates
that a channel symbol is to be deleted. To see how this makes the rate 3/4, think of
each column of the above table as corresponding to a bit input to the encoder, and
each one in the table as corresponding to an output channel symbol. There are three
columns in the table, and four ones. You can even create a rate 2/3 code using a rate
1/2 encoder with the following puncturing pattern:

1 1
1 0

which has two columns and three ones.
To decode a punctured code, one must substitute null symbols for the deleted symbols
at the input to the Viterbi decoder. Null symbols can be symbols quantized to levels
corresponding to weak ones or weak zeroes, or better, can be special flag symbols that
when processed by the ACS circuits in the decoder, result in no change to the
accumulated error metric from the previous state.
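A sketch of puncturing and the corresponding null-symbol substitution at the decoder input, using the rate 3/4 pattern above (the None flag standing in for a special flag symbol, and the function names, are illustrative choices):

```python
# One column of the pattern per input bit; the two rows correspond to the
# two encoder output branches: columns are 1/1, 0/1, 1/0.
PATTERN = [(1, 1), (0, 1), (1, 0)]
ERASE = None  # null symbol: should contribute nothing to the error metric

def puncture(symbol_pairs):
    """Drop channel symbols wherever the pattern holds a zero."""
    out = []
    for i, (s1, s2) in enumerate(symbol_pairs):
        keep1, keep2 = PATTERN[i % len(PATTERN)]
        if keep1:
            out.append(s1)
        if keep2:
            out.append(s2)
    return out

def depuncture(channel_symbols, num_pairs):
    """Re-insert null symbols so the decoder sees full pairs again."""
    it = iter(channel_symbols)
    pairs = []
    for i in range(num_pairs):
        keep1, keep2 = PATTERN[i % len(PATTERN)]
        s1 = next(it) if keep1 else ERASE
        s2 = next(it) if keep2 else ERASE
        pairs.append((s1, s2))
    return pairs

# Three input bits produce six encoder symbols, punctured down to four: rate 3/4.
print(puncture([(0, 0), (1, 1), (0, 1)]))      # [0, 0, 1, 0]
print(depuncture([0, 0, 1, 0], 3))             # [(0, 0), (None, 1), (0, None)]
```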

Of course, n does not have to be equal to two. For example, a rate 1/3, K = 3, (7, 7, 5)
code can be encoded using the encoder shown below:

This encoder has three modulo-two adders, so for each input bit, it can produce three
channel symbol outputs. Of course, with suitable puncturing patterns, you can create
higher-rate codes using this encoder as well.

I don't have good data to share with you right now about the traceback depth
requirements for Viterbi decoders for punctured codes. I have been told that instead of
K x 5, depths of K x 7, K x 9, or even more are required to reach the point of
diminishing returns. This would be a good topic around which to design some
experiments using a modified version of the example simulation code I provide.

Convolutional code
In telecommunication, a convolutional code is a type of error-correcting code in which

   each m-bit information symbol (each m-bit string) to be encoded is transformed into an n-bit symbol,
where m/n is the code rate (n ≥ m) and

   the transformation is a function of the last k information symbols, where k is the constraint length of the
code.
Where convolutional codes are used
Convolutional codes are used extensively in numerous applications in order to achieve reliable data
transfer, including digital video, radio, mobile communication, and satellite communication. These codes
are often implemented in concatenation with a hard-decision code, particularly Reed-Solomon. Prior
to turbo codes, such constructions were the most efficient, coming closest to the Shannon limit.

Convolutional encoding
To convolutionally encode data, start with k memory registers, each holding 1 input bit. Unless otherwise
specified, all memory registers start with a value of 0. The encoder has n modulo-2 adders (a modulo 2
adder can be implemented with a single Boolean XOR gate, where the logic is: 0+0 = 0, 0+1 = 1, 1+0 = 1,
1+1 = 0), and n generator polynomials — one for each adder (see figure below). An input bit m1 is fed
into the leftmost register. Using the generator polynomials and the existing values in the remaining
registers, the encoder outputs n bits. Now bit shift all register values to the right (m1 moves
to m0, m0 moves to m-1) and wait for the next input bit. If there are no remaining input bits, the encoder
continues output until all registers have returned to the zero state.

The figure below is a rate 1/3 (m/n) encoder with constraint length (k) of 3. Generator polynomials
are G1 = (1,1,1), G2 = (0,1,1), and G3 = (1,0,1). Therefore, output bits are calculated (modulo 2) as
follows:

n1 = m1 + m0 + m-1
n2 = m0 + m-1
n3 = m1 + m-1.

Img.1. Rate 1/3 non-recursive, non-systematic convolutional encoder with constraint length 3
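The three output equations above can be turned directly into code; this sketch encodes a bitstream with the G1 = (1,1,1), G2 = (0,1,1), G3 = (1,0,1) generators (the function and variable names are illustrative):

```python
def encode_rate_third(bits):
    """Rate 1/3 convolutional encoder with constraint length 3.
    Registers m0 and m-1 start at zero; two flushing zeroes are
    appended so the encoder returns to the all-zeroes state."""
    m0 = m_minus1 = 0
    out = []
    for m1 in bits + [0, 0]:           # flush with two zeroes
        n1 = (m1 + m0 + m_minus1) % 2  # G1 = (1,1,1)
        n2 = (m0 + m_minus1) % 2       # G2 = (0,1,1)
        n3 = (m1 + m_minus1) % 2       # G3 = (1,0,1)
        out.append((n1, n2, n3))
        m0, m_minus1 = m1, m0          # bit shift to the right
    return out

print(encode_rate_third([1, 0]))
# -> [(1, 0, 1), (1, 1, 0), (1, 1, 1), (0, 0, 0)]
```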

Recursive and non-recursive codes
The encoder on the picture above is a non-recursive encoder. Here's an example of a
recursive one:

Img.2. Rate 1/2 recursive, systematic convolutional encoder with constraint length 4

One can see that the input being encoded is included in the output sequence too (look at the
output 2). Such codes are referred to as systematic; otherwise the code is called non-systematic.

Recursive codes are almost always systematic and, conversely, non-recursive codes are
non-systematic. It isn't a strict requirement, but a common practice.

Impulse response, transfer function, and constraint length
A convolutional encoder is called so because it performs a convolution of the input stream
with the encoder's impulse responses:

$$y_i^j = \sum_{k=0}^{\infty} h_k^j\, x_{i-k},$$

where $x$ is the input sequence, $y^j$ is the sequence from output $j$, and $h^j$ is the impulse
response for output $j$.

A convolutional encoder is a discrete linear time-invariant system. Every output of an
encoder can be described by its own transfer function, which is closely related to a
generator polynomial. An impulse response is connected with a transfer function
through Z-transform.

Transfer functions for the first (non-recursive) encoder are:

$$H_1(z) = 1 + z^{-1} + z^{-2}, \qquad H_2(z) = z^{-1} + z^{-2}, \qquad H_3(z) = 1 + z^{-2}.$$

Transfer functions for the second (recursive) encoder are of the form

$$H_1(z) = \frac{P_1(z)}{Q(z)}, \qquad H_2(z) = 1,$$

where the systematic output has transfer function 1, and the polynomials $P_1$ and $Q$
(of degree at most 3) are read off the feedforward and feedback taps shown in Img.2.

Define $m$ by

$$m = \max_j \deg H_j(z),$$

where, for any rational function $f(z) = P(z)/Q(z)$,

$$\deg f = \max(\deg P, \deg Q).$$

Then $m$ is the maximum of the polynomial degrees of the $H_j(z)$, and
the constraint length is defined as $K = m + 1$. For instance, in the first
example the constraint length is 3, and in the second the constraint length is 4.

Trellis diagram
A convolutional encoder is a finite state machine. An encoder with n binary
cells will have 2^n states.

Imagine that the encoder (shown on Img.1, above) has '1' in the left memory
cell (m0), and '0' in the right one (m-1). (m1 is not really a memory cell because it
represents a current value). We will designate such a state as "10". According
to an input bit the encoder at the next turn can convert either to the "01" state or
the "11" state. One can see that not all transitions are possible (e.g., an encoder
can't convert from the "10" state to "00" or even stay in the "10" state).

All possible transitions can be shown as below:

Img.3. A trellis diagram for the encoder on Img.1. A path through the trellis is shown as a red
line. The solid lines indicate transitions where a "0" is input and the dashed lines where a "1"
is input.

An actual encoded sequence can be represented as a path on this graph. One
valid path is shown in red as an example.

This diagram gives us an idea about decoding: if a received sequence doesn't
fit this graph, then it was received with errors, and we must choose the
nearest correct (fitting the graph) sequence. The real decoding algorithms
exploit this idea.
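The permissible transitions can be enumerated mechanically; this sketch builds the transition graph for the two-cell encoder state (state string = m0 m-1) and confirms the "10" example above (function name and string representation are illustrative):

```python
def transitions(num_cells=2):
    """Map each state string to the states reachable in one step.
    The new input bit becomes m0; the old m0 shifts into m-1."""
    graph = {}
    for s in range(1 << num_cells):
        state = format(s, f"0{num_cells}b")
        graph[state] = sorted(
            format((b << (num_cells - 1)) | (s >> 1), f"0{num_cells}b")
            for b in (0, 1)
        )
    return graph

g = transitions()
print(g["10"])  # ['01', '11'] -- cannot reach '00' or stay in '10'
```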

Free distance and error distribution
The free distance (d) is the minimal Hamming distance between different
encoded sequences. The correcting capability (t) of a convolutional code is the
number of errors that can be corrected by the code. It can be calculated as

$$t = \left\lfloor \frac{d - 1}{2} \right\rfloor.$$
Since a convolutional code doesn't use blocks, processing instead a
continuous bitstream, the value of t applies to a quantity of errors located
relatively near to each other. That is, multiple groups of t errors can usually
be fixed when they are relatively far apart.
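As a quick worked example, assuming the standard relation between free distance and correcting capability, the unpunctured rate 1/2 NASA standard K = 7 code (free distance 10, per the puncturing table later in this section) gives:

```latex
t = \left\lfloor \frac{d - 1}{2} \right\rfloor
  = \left\lfloor \frac{10 - 1}{2} \right\rfloor
  = 4
```

so each sufficiently isolated group of up to four channel errors can be corrected.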

Free distance can be interpreted as the minimal length of an erroneous
"burst" at the output of a convolutional decoder. The fact that errors appear
as "bursts" should be accounted for when designing a concatenated
code with an inner convolutional code. The popular solution for this
problem is to interleave data before convolutional encoding, so that the
outer block (usually Reed-Solomon) code can correct most of the errors.

Decoding convolutional codes
Several algorithms exist for decoding convolutional codes. For relatively
small values of k, the Viterbi algorithm is universally used as it
provides maximum likelihood performance and is highly parallelizable.
Viterbi decoders are thus easy to implement in VLSI hardware and in
software on CPUs with SIMD instruction sets.

Longer constraint length codes are more practically decoded with any of
several sequential decoding algorithms, of which the Fano algorithm is the
best known. Unlike Viterbi decoding, sequential decoding is not maximum
likelihood but its complexity increases only slightly with constraint length,
allowing the use of strong, long-constraint-length codes. Such codes were
used in the Pioneer program of the early 1970s to Jupiter and Saturn, but
gave way to shorter, Viterbi-decoded codes, usually concatenated with
large Reed-Solomon error correction codes that steepen the overall bit-
error-rate curve and produce extremely low residual undetected error rates.

Both Viterbi and sequential decoding algorithms return hard-decisions: the
bits that form the most likely codeword. An approximate confidence
measure can be added to each bit by use of the Soft output Viterbi
algorithm. Maximum a posteriori (MAP) soft-decisions for each bit can be
obtained by use of the BCJR algorithm.

Popular convolutional codes

An especially popular Viterbi-decoded convolutional code, used at least
since the Voyager program, has a constraint length k of 7 and a rate r of
1/2.

   Longer constraint lengths produce more powerful codes, but
the complexity of the Viterbi algorithm increases exponentially with
constraint length, limiting these more powerful codes to deep space
missions where the extra performance is easily worth the increased
decoder complexity.

     Mars Pathfinder, Mars Exploration Rover and the Cassini probe to
Saturn use a k of 15 and a rate of 1/6; this code performs about 2 dB
better than the simpler k=7 code at a cost of 256× in decoding
complexity (compared to Voyager mission codes).
Punctured convolutional codes
Puncturing is a technique used to make an m/n rate code from a "basic" rate
1/2 code. It is achieved by deleting some bits in the encoder output. Bits
are deleted according to a puncturing matrix. The following puncturing
matrices are the most frequently used:

Code rate        Puncturing matrix     Free distance (for NASA standard K=7 convolutional code)

1/2 (No perf.)   1                     10
                 1

2/3              1 0                   6
                 1 1

3/4              1 0 1                 5
                 1 1 0

5/6              1 0 1 0 1             4
                 1 1 0 1 0

7/8              1 0 0 0 1 0 1         3
                 1 1 1 1 0 1 0

For example, if we want to make a code with rate 2/3 using the appropriate
matrix from the above table, we should take a basic encoder output and
transmit every second bit from the first branch and every bit from the
second one. The specific order of transmission is defined by the respective
communication standard.

Punctured convolutional codes are widely used in satellite
communications, for example in INTELSAT systems and Digital Video
Broadcasting.

Punctured convolutional codes are also called "perforated".

Turbo     codes: replacing convolutional codes
Simple Viterbi-decoded convolutional codes are now giving way to turbo
codes, a new class of iterated short convolutional codes that closely
approach the theoretical limits imposed by Shannon's theorem with much
less decoding complexity than the Viterbi algorithm on the long
convolutional    codes    that   would     be   required    for    the   same
performance. Concatenation with an outer algebraic code (e.g., Reed-
Solomon) addresses the issue of error floors inherent to turbo code
designs.

Viterbi algorithm

Curator: Dr. Andrew J. Viterbi, The Viterbi Group LLC, San Diego, CA

Figure 1: Finite-State Machine and Channel

The Viterbi Algorithm produces the maximum likelihood estimates of the successive states of a finite-
state machine (FSM) from the sequence of its outputs which have been corrupted by successively
independent interference terms.

General Problem and Solution

Figure 2: 4-State Markov Graph

Fig. 1 illustrates a generic FSM consisting of an $m$-stage shift register driven by a $q$-ary input
sequence $u_1, u_2, \ldots$. Consequently, after each shift the FSM dwells in one of $q^m$ states. Corresponding to
each state the signal generator produces an output $x_n$, which is generally a real vector. The corrupting
effect, called a channel in communication applications, transforms $x_n$ into $y_n$, the observables, which by
the nature of their formation constitute a random Markov sequence. Note that the terms of $x_n$ constitute a
Markov sequence as well whenever the input sequence $u_n$ to the FSM is random, thus inducing a
probability distribution on the state transitions.

Fig. 2 is the state diagram for an FSM with parameters $q = 2$ (binary) and $m = 2$. The states then
correspond to their register contents: 00, 01, 10, and 11. Clearly only a subset
of transitions is permissible. From the sequence of observables $y_1, y_2, \ldots$, we seek the most likely path of
transitions through the states of the diagram. In Fig. 2, we also designate the FSM outputs on each
branch, so that each branch is labeled with the output $x$ it produces. It is also convenient for the
description of the algorithm to view the multi-step evolution of the path through the graph by means of a
multi-stage replication of the state diagram, which is known as a trellis diagram. Fig. 3 is the trellis
diagram corresponding to the example of Fig. 2. At the top of Fig. 3 are shown the successive branch
observables $y_1, y_2, y_3, \ldots$. Henceforth we use the notation $y_k$ to denote the
observable(s) for the $k$th successive branch. Similarly $S_k$ will denote any state at the $k$th
successive node level (and we shall dispense with subscripts until necessary).

The goal then is to find the most probable path through the trellis diagram. Provided the successive
terms $u_k$ of the input sequence are independent, the state transition
probabilities $p(S_k \mid S_{k-1})$ are mutually independent for all $k$, as are the conditional output
probabilities $p(y_k \mid S_{k-1}, S_k)$. For any given path from the origin ($k = 0$) to an arbitrary
node $n$, the relative path probability (likelihood function) is given by

$$L = \prod_{k=1}^{n} p(S_k \mid S_{k-1})\, p(y_k \mid S_{k-1}, S_k).$$

For computational purposes it is more convenient to consider its logarithm, which is given by the
sum

$$\ln L = \sum_{k=1}^{n} \lambda(S_{k-1}, S_k),$$

where

$$\lambda(S_{k-1}, S_k) = \ln p(S_k \mid S_{k-1}) + \ln p(y_k \mid S_{k-1}, S_k),$$

which is denoted the Branch Metric between any two states at the $(k-1)$th and $k$th
node levels. (Note that the unallowable transitions have zero probability and hence their
logarithms will be negative infinity, taking them out of competition.) We next define
the State Metric, $M(S_n)$, of the state $S_n$ to be the maximum over all paths
leading from the origin to the $n$th state at the $n$th node level. Thus, again inserting
subscripts where necessary,

$$M(S_n) = \max_{S_1, \ldots, S_{n-1}} \sum_{k=1}^{n} \lambda(S_{k-1}, S_k).$$

It then follows that to maximize the above sum over                 terms, it suffices to
maximize the sum over the first                  terms for each state                      at
the             th node and then maximize the sum of this and the          th term over all
states               . Thus,

This recursion is known as the Viterbi Algorithm. It is most easily described
in connection with the trellis diagram. If we label each branch (allowable
transition between states) by its Branch Metric lambda and each state at each
node level by its State Metric Gamma, the State Metrics at node level k are
obtained from the State Metrics at the level k - 1 by adding to each State
Metric at level k - 1 the Branch Metrics which connect it to states at the
kth level, and for each state at level k preserving only the largest sum which
arrives to it. If additionally at each level we delete all branches other than the
one which produces this maximum, there will remain only one path through the
trellis leading from the origin to each state at the kth level, which is the most
probable path reaching it from the origin. In typical (though not all)
applications, both the initial state (origin) and the final state are fixed to be
the all-zeros state, and thus the algorithm produces the most probable path
through the trellis both initiating and ending at this state.
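The recursion just described translates almost line-for-line into code. The following minimal Python sketch (an illustration, not part of the original text) propagates State Metrics level by level, preserves only the largest sum arriving at each state, and then traces the surviving branches back from the best final state:

```python
def viterbi(num_states, num_steps, branch_metric, origin=0):
    """Most probable path through a trellis.

    branch_metric(k, i, j) is the log-domain metric lambda of the branch
    from state i at node level k to state j at level k + 1; disallowed
    transitions should return -inf, taking them out of competition.
    """
    NEG_INF = float("-inf")
    gamma = [NEG_INF] * num_states      # State Metrics at level 0
    gamma[origin] = 0.0                 # only the origin is reachable
    survivors = []                      # survivors[k][j] = best predecessor of j

    for k in range(num_steps):
        new_gamma = [NEG_INF] * num_states
        pred = [0] * num_states
        for j in range(num_states):
            for i in range(num_states):
                m = gamma[i] + branch_metric(k, i, j)
                if m > new_gamma[j]:    # preserve only the largest sum
                    new_gamma[j] = m
                    pred[j] = i
        survivors.append(pred)
        gamma = new_gamma

    best = max(range(num_states), key=lambda s: gamma[s])
    path = [best]                       # trace surviving branches backward
    for pred in reversed(survivors):
        path.append(pred[path[-1]])
    path.reverse()
    return path, gamma[best]
```

With both the origin and the final state fixed, as in the typical applications noted above, one would instead start the traceback from that fixed state rather than from the best final state.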

Figure 3: Trellis Diagram for Markov Graph of Fig. 2

Direct Applications
Numerous applications of this algorithm have appeared over the past several
decades. We begin with three applications for which the basic FSM structure
of Fig. 1 is well defined.

Convolutional Codes
The earliest application, for which the algorithm was originally proposed in
1967, was for the maximum likelihood decoding of convolutionally coded
digital sequences transmitted over a noisy channel. Currently the algorithm
forms an integral part of the majority of wireless telecommunication systems,
involving both satellite and terrestrial mobile transmission. For the
convolutional encoder, the signal generator of Fig. 1 is a linear matrix of
modulo-2 adders, each of which adds together the contents of some subset of
the K shift register stages, thus forming s binary symbols. This addition
operation may be performed after each shift of the register, thus producing a
rate 1/s code, or only once every r shifts to produce a rate r/s code.
These may be serially transmitted, for example, as binary amplitude
modulation (+1 or -1) of a carrier signal. At the receiver, the
demodulator generates an output y, which is either a real number or the result
of quantizing the latter to one of a finite set of values. The conditional
densities p(y | x) of the channel outputs are assumed to be mutually
independent, corresponding to a memoryless channel. The most commonly
treated example is the additive white gaussian noise (AWGN) channel, for
which each output y is the sum of the encoded symbol (+1 or -1) and a gaussian
random noise variable, with all noise variables mutually independent. This channel
model is closely approximated by satellite and space communication
applications and, with appropriate caution, it can also be applied to terrestrial
communication design.
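To make the encoder structure concrete, here is a small Python sketch of a rate-1/2, K = 3 convolutional encoder. The particular generator taps (7 and 5 in octal) are an assumption for illustration; the text does not fix which register stages feed each modulo-2 adder.

```python
def conv_encode(bits, taps=(0b111, 0b101), K=3):
    """Shift each input bit into a K-stage register; for every shift, emit
    one modulo-2 sum per generator tap mask (s = len(taps) symbols)."""
    state = 0
    out = []
    for b in bits + [0] * (K - 1):          # K - 1 memory-flushing zeros
        state = ((state << 1) | b) & ((1 << K) - 1)
        for tap in taps:
            out.append(bin(state & tap).count("1") % 2)
    return out
```

For a single 1 followed by the flushing zeros, this produces the impulse response 11 10 11 of the (7, 5) code.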

Thus the 4-state (K = 3) state diagram of Fig. 2 applies to a convolutional
code of rate 1/s. Since only one input bit changes each time, each state has
only two branches both exiting and entering it, each from and to two other
states. For the exiting branches, one corresponds to a zero entering the
register and the other to a one. It is generally assumed that all input bits are
equally likely to be a zero or a one, so the state transition
probabilities p(x_{k+1} | x_k) = 1/2 for each branch. Hence the first term of the
branch metric lambda can be omitted since it is the same for each branch. As
for the second term of lambda, the conditional probability
density p(y | x) = prod_{i=1}^{s} p(y_i | x_i), where x is an s-dimensional binary vector
generated by the s modulo-2 adders for each new input bit, which
corresponds to a state transition, while y is the random vector corresponding
to the s noise-corrupted channel outputs corresponding to x. For the AWGN,
ln p(y | x) is proportional to the inner product of the two vectors y and x.

To generalize to any rational rate r/s, r input bits enter each time and
the register shifts in blocks of r. The Markov graph changes only in having
each state connected to 2^r other states. Note that if 2^r equals the number
of states, any state can be reached from any other state and the Markov graph
becomes fully connected. Another generalization is to map each binary vector x,
not into a vector of s binary values, +1 or -1, but into a constellation of
points in 2 or more dimensions. An often employed case is quadrature amplitude
modulation (QAM). For example, for s = 4, sixteen points may be mapped into a
two-dimensional grid, and the value in each dimension modulates the amplitude of
one of the two quadrature components of the sinusoidal carrier. Here x is the
2-dimensional vector representing one of the sixteen modulating values
and y is the corresponding demodulated channel output. Multiple

generalizations of this approach abound in the literature and in real
applications. In most cases this multidimensional approach is used to
conserve bandwidth at the cost of a higher channel signal-to-noise
requirement.

An interesting footnote on this first application is that the Viterbi Algorithm was
proposed not so much to develop an efficient maximum likelihood decoder for
a convolutional code, but primarily to establish bounds on its error correcting
performance.

MLSE Demodulators for Intersymbol Interference and Multipath Fading
In the previous application, the convolution operation is employed in order to
introduce redundancy for the purpose of increasing transmission reliability. But
convolution also occurs naturally in physical channels whose bandwidth
constraints linearly distort the transmitted digital signal. Treating the channel
as a linear filter, it is well known that the output signal is the convolution of the
input signal and the filter’s impulse response. A discrete model of the
combination of signal waveform, channel effects and receiver filtering is shown
in Fig. 4. This combination produces, after sampling at the symbol rate, the
discrete convolution z_k = sum_{i=0}^{L} h_i u_{k-i}, where the h_i are real
numbers and the random variables u_k are the input bits, generally taken to
be binary (+A or -A). The h_i terms are called the "Intersymbol Interference"
coefficients, since except for h_0, all the other terms of the sum represent
interference by preceding symbols on the given symbol. To the output of the
discrete filter must be added noise variables n_k to account for the channel
noise. While generally these noise variables are not mutually independent,
they can be made so by employing at the receiver a so-called "whitened
matched filter" prior to sampling.
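The discrete model of Fig. 4 can be sketched as follows; the tap values and inputs in the test below are illustrative assumptions, not taken from the text.

```python
import random

def isi_channel(u, h, noise_std=0.0, seed=0):
    """z_k = sum_i h_i * u_{k-i} plus gaussian noise n_k (symbols before
    time 0 are taken to be absent)."""
    rng = random.Random(seed)
    return [sum(h[i] * u[k - i] for i in range(len(h)) if k - i >= 0)
            + rng.gauss(0.0, noise_std)
            for k in range(len(u))]
```

With taps h = (1.0, 0.5) and +1/-1 inputs, each output mixes the current symbol with interference from the preceding one.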

A related application with a very similar model is that of multipath-fading
channels. Here the taps represent multiple delays in a multipath channel. Their
relative spacing may be less than one symbol period, in which case each
symbol of the input sequence must be repeated several times. Most important,
because of random variations in propagation, the multipath coefficients h_i are
now random variables, so the conditional densities of the observables y_k depend
on the statistics of both the additive noise and the multiplicative random
coefficients.

Figure 4: Intersymbol Interference/Multipath Fading Model

Comparing this application with the previous one of convolutional codes, we
note that the principal difference is that modulo-2 addition is replaced by real
addition and there is one output rather than s for each branch. Otherwise, the
same Markov graph applies, with two branches emanating from each state
when the input sequence is binary (+A or -A). Generation of the branch
metrics, ln p(y_k | x_k -> x_{k+1}), for multipath fading is slightly more complex
because besides depending on the noise, the y_k variables depend on a linear
combination of the h_i random variables, with their signs determined by the
register contents involved in the branch transition.

Partial Response Maximum Likelihood Decoders for
Recorded Data
A filter model for the magnetic recording medium and reading process is very
similar to the Intersymbol Interference model. The simplest version, known as
the Partial Response channel, results, when sampled at the symbol rate, in an
output depending on just the difference between two input
symbols, y_k = u_k - u_{k-2}, with the additive noise samples also being
mutually independent. Note that since the inputs u_k are +1 or -1, the
outputs y_k (prior to adding noise) are ternary, +2, 0, or -2. This then
reduces to the model of Fig. 4 with just two non-zero taps, h_0 = 1
and h_2 = -1. Often this is described by the polynomial whose coefficients
are the tap values; in this case h(D) = 1 - D^2, which can be modeled by a
2-stage shift register

which gives rise to a 4-state Markov graph, as in Fig. 2. But actually, a simpler
model can be used based on the fact that all outputs y_k for which k is odd
depend only on odd indexed inputs, and similarly for even. Thus a 2-state
Markov graph suffices for each of the odd and even subsets. When the
recording density is increased, the simple Partial Response channel model is
replaced by a longer shift register. A generally accepted model has tap
coefficients represented by the polynomial h(D) = (1 - D)(1 + D)^n,
where n >= 1. For example, for the case of n = 2, known as "Extended
Partial Response", an eight-state Markov graph applies.
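The simplest Partial Response channel described above is a one-liner in code; this sketch shows the noiseless ternary outputs, with inputs before time 0 assumed absent for illustration:

```python
def partial_response(u):
    """y_k = u_k - u_{k-2} for +1/-1 inputs; outputs are +2, 0 or -2
    once the 2-stage register is full."""
    return [u[k] - (u[k - 2] if k >= 2 else 0) for k in range(len(u))]
```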

Other Applications: Hidden Markov Models
The Markov nature of the preceding three applications is obvious from the
system model. Many more applications of the algorithm appear in the literature
of numerous fields ranging from signal processing to genetics. In such cases
the term Hidden Markov Model (HMM) is used. Such models are often
empirically derived based on experience in the given discipline. The best
known and most effective results were obtained for Speech Recognition
and for DNA Sequence Alignment. Since background in each
field is a prerequisite for full understanding, we only comment briefly on the
common requirements for establishing the Markov models by a procedure
known as Baum-Welch estimation. Initially a Markov model is postulated along
with estimated state transition probabilities and the probabilities of observables
conditioned on state transitions. Then given a sequence of observations, the
likelihood of the observation sequence is computed for the postulated model.
At the same time, new estimates of the state transition probabilities and the
observable conditional probabilities are developed, and from these a new
likelihood function is derived. If this exceeds the previous one, the procedure
is repeated as long as the likelihood continues to increase. When it ceases to
increase, the model is accepted and the most likely state sequence is obtained
by using the Viterbi Algorithm.

Viterbi decoder

A Viterbi decoder uses the Viterbi algorithm for decoding a bitstream that has been encoded using forward
error correction based on a convolutional code.

There are other algorithms for decoding a convolutionally encoded stream (for example, the Fano algorithm).
The Viterbi algorithm is the most resource-consuming of them, but it performs maximum likelihood decoding. It is
most often used for decoding convolutional codes with constraint lengths K <= 10, but values up to K = 15 are
used in practice.

Viterbi decoding was developed by Andrew J. Viterbi and published in the paper "Error Bounds for
Convolutional Codes and an Asymptotically Optimum Decoding Algorithm", IEEE Transactions on Information
Theory, Volume IT-13, pages 260-269, in April, 1967.

There are both hardware (in modems) and software implementations of a Viterbi decoder.

Hardware implementation

A common way to implement a hardware Viterbi decoder

A hardware Viterbi decoder for a basic (not perforated) code usually
consists of the following major blocks:

   Branch metric unit (BMU)
   Path metric unit (PMU)
   Traceback unit (TBU)

Branch metric unit (BMU)

A sample implementation of a branch metric unit

A branch metric unit's function is to calculate branch metrics, which are
normed distances between every possible symbol in the code alphabet and
the received symbol. There are hard decision and soft decision Viterbi
decoders. A hard decision Viterbi decoder receives a simple bitstream on
its input, and a Hamming distance is used as a metric. A soft decision
Viterbi decoder receives a bitstream containing information about the
reliability of each received symbol. For instance, in a 3-bit encoding,
this reliability information is encoded as follows:
value  meaning
000    strongest 0
001    relatively strong 0
010    relatively weak 0
011    weakest 0
100    weakest 1
101    relatively weak 1
110    relatively strong 1
111    strongest 1

Of course, this is not the only way to encode reliability data.
The squared Euclidean distance is used as a metric for soft decision
decoders.
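The two decoder styles compute their branch metrics differently; the sketch below contrasts them, assuming the 3-bit convention of the table above (level 0 = strongest 0, level 7 = strongest 1). The function names and the bit-to-level mapping are illustrative.

```python
def hamming_metric(received_bits, expected_bits):
    """Hard decision: count disagreeing bit positions."""
    return sum(r != e for r, e in zip(received_bits, expected_bits))

def soft_metric(received_levels, expected_bits):
    """Soft decision: squared Euclidean distance to the ideal levels,
    mapping an expected 0 to level 0 and an expected 1 to level 7."""
    return sum((r - 7 * e) ** 2 for r, e in zip(received_levels, expected_bits))
```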
Path metric unit (PMU)

A sample implementation of a path metric unit for a specific K=4 decoder

A path metric unit summarizes branch metrics to get metrics for 2^(K-1)
paths, one of which can eventually be chosen as optimal. Every clock cycle it
makes 2^(K-1) decisions, discarding paths that are known to be non-optimal.
The results of these decisions are written to the memory of a traceback unit.
The core elements of a PMU are ACS (Add-Compare-Select) units. The
way in which they are interconnected is defined by the specific code's trellis
diagram.
Since branch metrics are always non-negative, there must be an additional
circuit preventing the metric counters from overflowing (it is not shown in
the image). An alternative method that eliminates the need to monitor the
path metric growth is to allow the path metrics to "roll over"; to use this
method it is necessary to make sure the path metric accumulators contain
enough bits to prevent the "best" and "worst" values from coming within
2^(n-1) of each other. The compare circuit is essentially unchanged.
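The roll-over technique can be sketched as follows, assuming (for illustration) n = 8-bit accumulators: path metrics are accumulated modulo 2^n, and the comparison inspects the signed n-bit difference, which stays correct as long as the best and worst metrics remain within 2^(n-1) of each other.

```python
NBITS = 8                       # accumulator width (illustrative)
MASK = (1 << NBITS) - 1

def modular_less(a, b):
    """True if a precedes b under roll-over (modular) arithmetic:
    the n-bit difference, read as a signed value, is negative."""
    return ((a - b) & MASK) >= (1 << (NBITS - 1))

def acs(metric0, branch0, metric1, branch1):
    """Add-Compare-Select with wrapping accumulators; returns the
    surviving metric and the decision bit for the traceback memory."""
    cand0 = (metric0 + branch0) & MASK
    cand1 = (metric1 + branch1) & MASK
    return (cand0, 0) if modular_less(cand0, cand1) else (cand1, 1)
```

Note that a metric of 254 + 4 wraps around to 2 yet still correctly wins against a competing metric of 5.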

A sample implementation of an ACS unit

It is possible to monitor the noise level on the incoming bit stream by
monitoring the rate of growth of the "best" path metric. A simpler way to
do this is to monitor a single location or "state" and watch it pass
"upward" through, say, four discrete levels within the range of the
accumulator. As it passes upward through each of these thresholds, a
counter is incremented that reflects the "noise" present on the incoming
signal.
Traceback unit (TBU)

A sample implementation of a traceback unit

The traceback unit restores an (almost) maximum-likelihood path from the
decisions made by the PMU. Since it does this in the reverse direction, a
Viterbi decoder comprises a FILO (first-in, last-out) buffer to reconstruct
the correct order.
Note that the implementation shown in the image requires a doubled clock
frequency. There are some tricks that eliminate this requirement.
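In software, the traceback step looks like the following sketch. The conventions here are assumptions for illustration: the state is the last K - 1 input bits (newest in the LSB), and the PMU stores, per state and step, the register bit that was shifted out on the surviving transition.

```python
def traceback(decisions, final_state, K):
    """Walk the stored decisions backward from final_state and return the
    decoded input bits in correct (forward) order -- the reversal is the
    job of the FILO buffer mentioned above."""
    m = K - 1                                # register length in bits
    state = final_state
    bits = []
    for row in reversed(decisions):
        bits.append(state & 1)               # newest input bit is the LSB
        state = (state >> 1) | (row[state] << (m - 1))  # undo the shift
    bits.reverse()
    return bits
```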
Implementation issues
Quantization for soft decision decoding
In order to fully exploit the benefits of soft decision decoding, one needs to
quantize the input signal properly. The optimal quantization zone width
depends on the noise power spectral density N0 and on k, the number of bits
used for the soft decision.
Euclidean metric computation
The squared norm (L2) distance between the received and the actual
symbols in the code alphabet may be further simplified into a linear
sum/difference form, which makes it less computationally intensive.
Consider a rate 1/2 convolutional coder, which generates 2 bits
(00, 01, 10 or 11) for every input bit (1 or 0). These Return-to-
Zero signals are translated into the Non-Return-to-Zero form shown
in the following table.
code alphabet  vector mapping
00             +1, +1
01             +1, -1
10             -1, +1
11             -1, -1

Each received symbol may be represented in vector form as vr = {r0, r1},
where r0 and r1 are soft decision values, whose magnitudes signify
the joint reliability of the received vector, vr.
Every symbol in the code alphabet may, likewise, be represented by the
vector vi = {±1, ±1}.
The actual computation of the squared Euclidean distance metric is:

    d^2 = (r0 - v0)^2 + (r1 - v1)^2
        = (r0^2 + r1^2) - 2(r0*v0 + r1*v1) + (v0^2 + v1^2)

Each square term is a normed distance, depicting the energy of the
symbol. For example, the energy of the symbol vi = {±1, ±1} may be computed
as

    ||vi||^2 = (±1)^2 + (±1)^2 = 2

Thus, the energy term of all symbols in the code alphabet is constant (at
the (normalized) value 2).
The Add-Compare-Select (ACS) operation compares the metric distance
between the received symbol vr and the 2 symbols in the code
alphabet whose paths merge at a node in the corresponding
trellis, vi(0) and vi(1). This is equivalent to comparing

    ||vr||^2 - 2 vr.vi(0) + ||vi(0)||^2

and

    ||vr||^2 - 2 vr.vi(1) + ||vi(1)||^2

But, from above we know that the energy of vi is constant (equal to the
(normalized) value 2), and the energy of vr is the same in both cases.
This reduces the comparison to a minimum over the 2 (middle) dot product
terms,

    min(-2 vr.vi(0), -2 vr.vi(1)) = -2 max(vr.vi(0), vr.vi(1))

since a min operation on negative numbers may be interpreted as an
equivalent max operation on positive quantities. Each dot product term
may be expanded as

    vr.vi = ±r0 ± r1
where the signs of each term depend on the symbols, vi(0) and vi(1), being
compared. Thus, the squared Euclidean metric distance calculation needed
for the branch metric may be performed with a simple add/subtract
operation.
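The add/subtract form derived above can be sketched as follows: with constant symbol energy, picking the smallest squared distance is the same as picking the largest dot product vr.vi, and for vi = {±1, ±1} that dot product is just ±r0 ± r1.

```python
def branch_metric(r0, r1, symbol):
    """Dot product of the received pair with a code-alphabet vector;
    since the components are +/-1 this is an add/subtract of r0 and r1."""
    v0, v1 = symbol
    return v0 * r0 + v1 * r1

def best_symbol(r0, r1, alphabet=((1, 1), (1, -1), (-1, 1), (-1, -1))):
    """Maximizing the dot product replaces the full squared-distance compare."""
    return max(alphabet, key=lambda s: branch_metric(r0, r1, s))
```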
Traceback

The general approach to traceback is to accumulate path metrics for up
to five times the constraint length (5 * (K - 1)), find the node with the
largest accumulated cost, and begin traceback from this node.
However, computing the node which has accumulated the largest cost
(either the largest or smallest integral path metric) involves finding
the maxima or minima of several (usually 2^(K-1)) numbers, which may be
time consuming when implemented on embedded hardware systems.
Most communication systems employ Viterbi decoding on data
packets of fixed size, with a fixed bit/byte pattern at the beginning
and/or at the end of the data packet. By using the known bit/byte pattern
as a reference, the start node may be set to a fixed value, thereby
obtaining a perfect Maximum Likelihood Path during traceback.
Limitations

A physical implementation of a Viterbi decoder will not yield
an exact maximum-likelihood stream due to quantization of the input
signal, branch and path metrics, and finite traceback length. Practical
implementations do approach within 1 dB of the ideal.
Perforated codes
A hardware Viterbi decoder for perforated (punctured) codes is commonly
implemented in the following way:

   A deperforator, which transforms the input stream into a stream
that looks like the original (unperforated) stream, with ERASE marks
at the places where bits were erased.
   A basic Viterbi decoder understanding these ERASE marks (that is,
not using them for branch metric calculation).
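The deperforator can be sketched as below, with an assumed cyclic perforation pattern (1 = transmitted, 0 = erased); erased positions are refilled with an ERASE mark (here None) that the branch metric calculation would then skip.

```python
ERASE = None    # marker the branch metric unit ignores

def deperforate(stream, pattern):
    """Re-expand a perforated stream: consume one received value for each
    1 in the cyclic pattern and insert ERASE for each 0."""
    out = []
    it = iter(stream)
    i = 0
    while True:
        if pattern[i % len(pattern)]:
            try:
                out.append(next(it))
            except StopIteration:
                break               # received stream exhausted
        else:
            out.append(ERASE)
        i += 1
    return out
```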

Software implementation
See Viterbi algorithm.
One of the most time-consuming operations is the ACS butterfly, which is
usually implemented in assembly language with appropriate
instruction set extensions (such as SSE2) to speed up the decoding
time.
Applications

The Viterbi decoding algorithm is widely used in the following areas:

   Decoding trellis-coded modulation (TCM), the technique used in
telephone-line modems to squeeze high spectral efficiency out of
3 kHz-bandwidth analog telephone lines. TCM is also used in
the PSK31 digital mode for amateur radio and sometimes in radio
relay and satellite communications.
   Automatic speech recognition
   Decoding convolutional codes in satellite communications.
   Computer storage devices such as hard disk drives.

Viterbi decoding has the advantage that it has a fixed decoding time. It is well suited
to hardware decoder implementation. But its computational requirements grow
exponentially as a function of the constraint length, so it is usually limited in practice
to constraint lengths of K = 9 or less. Stanford Telecom produces a K =
9 Viterbi decoder that operates at rates up to 96 kbps, and a K = 7 Viterbi decoder
that operates at up to 45 Mbps. Advanced Wireless Technologies offers a K =
9 Viterbi decoder that operates at rates up to 2 Mbps. NTT has announced
a Viterbi decoder that operates at 60 Mbps, but I don't know its commercial
availability. Moore's Law applies to Viterbi decoders as well as to microprocessors, so
consider the rates mentioned above as a snapshot of the state-of-the-art taken in
early 1999.


```