Low Power, High-Rate Viterbi Decoder Employing the SST
(Scarce State Transition) Scheme and Radix-4 Trellis
Sang-Cheon Kim, Je-Hyuk Ryu, and Jun-Dong Cho
Department of Electrical and Computer Engineering, Sungkyunkwan University
300, Cheoncheon-Dong, Jangan-Ku, Sowon-City, Kyungki, 440-746, Korea
E-mail : sckim52@nature.skku.ac.kr, Tel : 0331-290-7200, Fax : 0331-290-5819

Abstract
   This paper presents a new Viterbi decoder that achieves low power
consumption by employing the scarce state transition (SST) scheme and high
throughput by exploiting a radix-4 trellis. The SST scheme makes it possible
to omit the maximum likelihood decision (MLD) circuit[1, 6]. The architecture
of the add-compare-select (ACS) array is based on a restructuring of the
conventional radix-2 trellis into a radix-4 trellis. Radix-4 units, consisting
of four 4-way ACS units, process two stages of the constituent radix-2 trellis
per iteration[2]. Using the SYNOPSYS™ power estimation tool, DesignPower™,
our experiments show an average 33% reduction in power and a radix-4
iteration rate of 50 MHz, equivalent to a decode rate of 100 Mb/s under
typical operating conditions (VDD = 5.0 V, TA = 27°C). The proposed SST-based
Viterbi decoder using 4-way ACS can be applied to multimedia mobile
communications, where low power consumption is the target.

1. Introduction
   The convolutional encoding and Viterbi decoding scheme[3] is used by many
satellite communication and broadcast systems because it provides powerful
forward error correction (FEC) performance and because progress in VLSI
process technology makes it possible to realize single-chip high-speed
encoder/decoders[4, 5]. Conventional Viterbi decoders for digital cellular or
personal communication systems employ the digital signal processing (DSP)
scheme. Unfortunately, such devices have inherently high power consumption
during decoding because of their very large logic circuits and high clock
speeds. On the other hand, recent developments in VLSI process technology
have focused not only on speed and the number of gates, but also on power
consumption, by reducing the supply voltage to under 3 V. For further
reduction in power consumption, the scarce state transition (SST) scheme has
been studied for radix-2 units[6] and is expected to be applicable to higher
speed ACS circuits[1]. However, the scheme has not yet been applied in the
literature to higher speed ACS circuits such as radix-4 units.
   In recent years there has been interest in implementations of the Viterbi
algorithm at rates on the order of 100 Mb/s. Driving applications include
convolutional decoders for error correction, trellis code demodulation for
communication channels, and digital sequence detection for magnetic storage
devices. An important problem found in such applications is the decoding of a
binary shift register (BSR) trellis. The classical high throughput
implementation for such decoders is the radix-2 fully parallel approach,
where add-compare-select (ACS) units are assigned to each state and organized
in pairs to iterate one stage of a two-state trellis. The decode rate of this
approach is fundamentally limited by the recursive ACS iteration. In this
paper we show that a new SST-based Viterbi decoder using 4-way ACS achieves a
power reduction compared with the recent radix-4 Viterbi decoder[2].
   The remainder of this paper is organized as follows. We first briefly
review the Viterbi algorithm and the radix-4 ACS update in Sections 2 and 3,
respectively. Section 4 discusses the architecture of the radix-4 Viterbi
decoder. Section 5 is devoted to the proposed SST Viterbi decoder using 4-way
ACS. A concluding summary of this work can be found in Section 6.

2. Viterbi Algorithm
   We can view the Viterbi algorithm as a dynamic programming algorithm for
finding the shortest path through a trellis, and the algorithm can be broken
down
into the following three steps.
      1. Weigh the trellis; that is, calculate the branch metrics.
      2. Recursively compute the shortest paths to time n in terms of the
           shortest paths to time n-1. In this step, decisions are used to
           recursively update the survivor path of the signal. This is known
           as the add-compare-select (ACS) recursion.
      3. Recursively find the shortest path leading to each trellis state
           using the decisions from Step 2. The shortest path is called the
           survivor path for that state, and the process is referred to as
           survivor path decode. Finally, if all survivor paths are traced
           back in time, they merge into a unique path, which is the most
           likely signal path that we are trying to find.
   Associated with each trellis state s at time n is a state metric
$\Gamma_{s,n}$, which is the accumulated metric along the shortest path
leading to that state. The state metrics at time n can be recursively
calculated in terms of the state metrics of the previous iteration as
follows:
      $\Gamma_{j,n} = \min_i \{ \Gamma_{i,n-1} + \lambda_{ij,n-1} \}$      (1)
where i is a predecessor state of j and $\lambda_{ij,n-1}$ is the branch
metric on the transition from state i to state j. The qualitative
interpretation of this expression is as follows. By definition, the shortest
path into state j must pass through a predecessor state. If the shortest path
into j passes through i, then the state metric for the path must be given by
the state metric for i plus the branch metric for the state transition from i
to j. The final state metric for j is given by the minimum over all possible
paths. The recursive update described by (1) is the ACS operation and is
implemented as shown in Fig. 1 for the four-state trellis example.

[Fig. 1: (a) Predecessor states of state 00; (b) State metric update for
state 00, implemented using 2-way ACS]

   The update unit is referred to as a two-way ACS unit, because there are
two input branches for each state. In general, a state with m input branches
requires an m-way ACS unit[2, 7]. As well as calculating the updated state
metric, the ACS unit outputs a decision $d_{s,n}$, which identifies the
entering path with the minimum metric.
   In order that the input sequence can be decoded, the survivor path
(shortest path) through the trellis must be traced and decoded. The two
classical algorithms for survivor path storage and decoding are the
register-exchange method and the trace-back method. Both algorithms require a
recursive update, which fundamentally limits the throughput.
Register-exchange is suitable for low-complexity trellises and is high in
throughput. Trace-back is preferable for higher complexity trellises due to
reduced area and power dissipation. In the Viterbi decoder described in this
paper, the register-exchange method is used for survivor path storage and
decoding.

3. Radix-4 ACS Update
   In Fig. 1(a), for each state of the trellis, there are two input branches.
This is called a radix-two trellis. We will use this trellis as an example to
demonstrate the transformation of a radix-two trellis into a radix-four
trellis. The four-state radix-two trellis of Fig. 1(a) has been redrawn in
Fig. 2(a) for two iterations.

[Fig. 2: (a) 4-state radix-2 trellis over two iterations; (b) Equivalent
radix-4 trellis]

   Using M-step ACS recursion theory, we can see that, whatever the state
metric values at time n-1, state 0 at time n has four predecessor states at
time n-2: state 0, state 1, state 2 and state 3. The transition branch from
state 0 (at time n-2) to state 0 (at time n) consists of two branch segments,
one from state 0 at n-2 to state 0 at n-1, and the other from state 0 at n-1
to state 0 at n. The branch metric from state 0 at n-2 to state 0 at n is the
sum of the branch metrics of the two branch segments, that is
      $\lambda_{00,n} = \lambda_{00,n-2} + \lambda_{00,n-1}$
where $\lambda_{00,n}$ is the branch metric for the newly combined transition
branch from state 0 at n-2 to state 0 at n. Similarly, the branch metric from
state 1 at time n-2 to state 0 at time n can be calculated by
      $\lambda_{10,n} = \lambda_{12,n-2} + \lambda_{20,n-1}$
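The ACS recursion of Eq. (1) and the two-step branch metric combination can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a four-state shift-register trellis whose transition rule, next = ((i << 1) | bit) & 3, is chosen here so that states 0 and 2 are the predecessors of state 0 as in Fig. 1. All function names and the random metric values are hypothetical.

```python
import random

NUM_STATES = 4  # four-state trellis, as in the Fig. 1 example


def predecessors(j):
    """States i with a branch i -> j under the assumed rule
    next = ((i << 1) | bit) & 3."""
    return [i for i in range(NUM_STATES)
            if j in (((i << 1) | 0) & 3, ((i << 1) | 1) & 3)]


def acs_radix2(gamma, lam):
    """One 2-way ACS step, Eq. (1): add, compare, select per state."""
    new_gamma, decisions = [], []
    for j in range(NUM_STATES):
        candidates = [(gamma[i] + lam[i][j], i) for i in predecessors(j)]
        best, chosen = min(candidates)
        new_gamma.append(best)
        decisions.append(chosen)  # decision: which predecessor survived
    return new_gamma, decisions


def combine(lam_a, lam_b):
    """Radix-4 branch metrics: sum of the two radix-2 segments
    i -> l -> j, where l is the unique intermediate state."""
    lam2 = [[None] * NUM_STATES for _ in range(NUM_STATES)]
    for i in range(NUM_STATES):
        for bit_a in (0, 1):
            l = ((i << 1) | bit_a) & 3
            for bit_b in (0, 1):
                j = ((l << 1) | bit_b) & 3
                lam2[i][j] = lam_a[i][l] + lam_b[l][j]
    return lam2


def acs_radix4(gamma, lam2):
    """One 4-way ACS step: every state has four predecessors."""
    return [min(gamma[i] + lam2[i][j] for i in range(NUM_STATES))
            for j in range(NUM_STATES)]


def random_branch_metrics(rng):
    """Arbitrary metrics; entries for non-existent branches are unused."""
    return [[rng.randint(0, 14) for _ in range(NUM_STATES)]
            for _ in range(NUM_STATES)]


rng = random.Random(0)
gamma0 = [0, 3, 1, 2]
lam_a = random_branch_metrics(rng)   # branch metrics for step n-2 -> n-1
lam_b = random_branch_metrics(rng)   # branch metrics for step n-1 -> n

two_steps = acs_radix2(acs_radix2(gamma0, lam_a)[0], lam_b)[0]
one_step = acs_radix4(gamma0, combine(lam_a, lam_b))
print(two_steps == one_step)
```

Both computations minimize over the same set of two-segment paths, so one radix-4 iteration yields the same state metrics as two radix-2 iterations. This equivalence is what lets the radix-4 decoder process two symbols per iteration.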
In general, the branch metric from state i at time n-2 to state j at time n
is the sum of the two branch metrics
      $\lambda_{ij,n} = \lambda_{il,n-2} + \lambda_{lj,n-1}$
where l is the intermediate state through which the combined branch passes at
time n-1. Equivalently, in the notation of [7]:
      $\lambda^{2}_{n} = \lambda_{n+1} \otimes \lambda_{n}$
So if we calculate the branch metrics $\lambda_{n}$ and $\lambda_{n+1}$ for
two successively received symbols, we can find the new branch metric
$\lambda^{2}_{n}$ by addition. The two iterations of the trellis can be
combined into one iteration. After an initial time delay, two symbols will be
decoded at each clock cycle instead of one. The combination of the trellis in
Fig. 2(a) is shown in Fig. 2(b). We call the new trellis a radix-four
trellis, because for each state at time n there are four input branches.
After a two-step combination, a radix-two trellis has been changed into a
radix-four trellis.
   The state metric labeling for the radix-four trellis of Fig. 2(b) is shown
in Fig. 3(a). Since each output state has four predecessor states, state
metrics are updated using a four-way ACS unit as shown in Fig. 3(b). The
whole radix-four trellis is updated by four four-way ACS units, one for each
state of the trellis.

[Fig. 3: (a) 4-state radix-4 trellis with state and branch labels; (b) 4-way
ACS unit]

4. Architecture of Radix-4 Viterbi Decoder
   A simple functional block diagram of the Viterbi decoder architecture is
shown in Fig. 4. The architecture of the decoder can be partitioned into the
following five primary functional units: the branch metric calculator, the
ACS array, the path-update unit, the smallest-metric unit, and the
output-select unit.

[Fig. 4: Block diagram for architecture of radix-4 Viterbi decoder]

   The specifications for the Viterbi decoder are
      - Eight-state, R=1/2 or (2, 1, 3) convolutional decoder/encoder
      - Generator polynomials g(1) = (1011), g(2) = (1111)
      - Eight-level soft decision inputs, uniform metrics
      - Survivor path length of 20 (radix-two trellis iterations)
   The VLSI implementation of the radix-four ACS unit is shown in Fig. 5. In
order to achieve the potential two-fold increase in throughput offered by the
radix-four architecture, the radix-four ACS delay must be almost equal to
that of the radix-two ACS. Instead of using two stages of comparators to find
the smallest metric among four inputs, we use the structure shown in Fig. 5.
The four-input comparison is evaluated by generating the six possible
pair-wise comparisons, and the results are combined in two levels of logic to
form the minimum metric selection. All comparators in Fig. 5 are constructed
using the modulo arithmetic approach proposed in [8].

[Fig. 5: Block diagram of radix-4 ACS unit (block labels: adders,
Comparator, Combinational logic, 4:1 Mux, Buffer, New metric)]

5. Our Architecture: SST Viterbi Decoder Using 4-way ACS
   A block diagram of our SST Viterbi decoder with R=1/2 and K=4 is shown in
Fig. 6. In the SST Viterbi decoder, the most significant bits of the
soft-decision data are sent
to a simple decoder (pre-decoder). The decoded data are then encoded by a
re-encoder using the same polynomial expression as used by the transmitter.
The re-encoded data and the delayed received data are added in a modulo-2
operation. The summed data, which consist mainly of errors that occurred
during transmission, are fed into the branch metric calculator and decoded in
the ACS and path memory circuits. Finally, the error-corrected data are
obtained by modulo-2 addition of the output of the path memory circuits and
the pre-decoder signal.
   In the SST Viterbi decoder, the input signals to the branch metric
calculator and the path memory circuits are all '0' unless channel errors
occur. That is, the maximum likelihood state of the decoding signal is mainly
distributed around the all-'0' state. The switching rate of the path memory
circuits is therefore drastically reduced, because the data passing through
the circuits are almost always zero except for received data affected by
channel errors. As a result, the power consumption of the path memory
circuits is reduced compared with that of the conventional Viterbi decoder.

[Fig. 6: SST Viterbi decoder using radix-4 trellis (block labels:
soft-decided data, MSB, pre-decoder, delay-1, delay-2, Branch metric
calculator, ACS, Radix-4 Viterbi decoder, decoding)]

6. Conclusion
    In this paper, we proposed a new Viterbi decoder that achieves low power
consumption by employing the scarce state transition (SST) scheme and high
throughput by exploiting a radix-4 trellis. As shown in Table 1, our
experimental results show a 33% reduction in power at the cost of a 7%
increase in area, and a radix-4 iteration rate of 50 MHz, equivalent to a
decode rate of 100 Mb/s under typical operating conditions (VDD = 5.0 V,
TA = 27°C). We used the Design Compiler of SYNOPSYS™ and measured power
consumption using DesignPower™ of SYNOPSYS™. The constraints used for our
experiment were set as follows: the target library is LSI10K and the
operating condition is WCCOM. The proposed SST-based Viterbi decoder using
4-way ACS can be applied to multimedia mobile communications, where low power
consumption is the target.

                    Table 1 Experimental Result
              Area (gates)                    Power (W)
      Radix-4     SST       %        Radix-4     SST        %
       18445     19775     -7%       2589.36    1734.84    33%
Radix-4: Viterbi decoder using radix-4 trellis
SST: SST-based Viterbi decoder using radix-4 trellis

References
[1] S. Kubota, S. Kato, and T. Ishitani, "Novel Viterbi Decoder VLSI
    Implementation and Its Performance", IEEE Trans. Commun., vol. 41, no. 8,
    pp. 1170-1178, Aug. 1993.
[2] P. J. Black and T. H. Meng, "A 140-Mb/s, 32-State, Radix-4 Viterbi
    Decoder", IEEE J. Solid-State Circuits, vol. 27, no. 12, pp. 1877-1885,
    Dec. 1992.
[3] A. J. Viterbi, "Convolutional Codes and Their Performance in
    Communication Systems", IEEE Trans. Commun., vol. COM-19, pp. 751-772,
    Oct. 1971.
[4] H. A. Bustamante, I. Kang, C. Nguyen, and R. E. Peile, "Stanford Telecom
    VLSI of A Convolutional Decoder", in Proc. IEEE MILCOM, vol. 1,
    pp. 171-178, Oct.
[5] R. Kerr, H. Dehesh, A. B. David, and D. Werner, "A 25 MHz Viterbi FEC
    Codec", in Proc. IEEE CICC, pp. 16.6.1-16.6.5, May 1990.
[6] S. Kubota, K. Ohtani, and S. Kato, "High-speed and High-coding-gain
    Viterbi Decoder with Low Power Consumption Employing SST (scarce state
    transition) Scheme", Electron. Lett., vol. 22, Apr. 1986.
[7] G. Fettweis and H. Meyr, "High-Speed Parallel Viterbi Decoding Algorithm
    and VLSI-Architecture", IEEE Commun. Mag., vol. 29, no. 5, pp. 46-55,
    May 1991.
[8] A. P. Hekstra, "An Alternative to Metric Rescaling in Viterbi Decoders",
    IEEE Trans. Commun., vol. 37, no. 11, pp. 1220-1222, Nov. 1989.
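As an illustration of the SST data path described in Section 5, the following minimal sketch shows why the signal fed to the Viterbi stage is all zero when no channel errors occur. It rests on assumptions not stated in the paper: the generator taps (1011) and (1111) are taken to act on x[n], x[n-1], x[n-2], x[n-3] in that order; the pre-decoder is taken to be the modulo-2 sum of the two received bits (under this tap order that sum equals x[n-1]); and only hard decisions (the MSBs) are modeled. The Viterbi stage itself is omitted, and all function names are hypothetical.

```python
G1 = (1, 0, 1, 1)  # assumed taps on x[n], x[n-1], x[n-2], x[n-3]
G2 = (1, 1, 1, 1)


def encode(bits):
    """Rate-1/2, K=4 convolutional encoder; returns a list of (y1, y2)."""
    reg = [0, 0, 0, 0]
    out = []
    for b in bits:
        reg = [b] + reg[:3]                      # shift the new bit in
        y1 = sum(g & r for g, r in zip(G1, reg)) % 2
        y2 = sum(g & r for g, r in zip(G2, reg)) % 2
        out.append((y1, y2))
    return out


def pre_decode(hard):
    """Assumed pre-decoder: G1 xor G2 leaves only the x[n-1] tap, so
    y1 ^ y2 is the data bit delayed by one symbol (if error-free)."""
    return [y1 ^ y2 for y1, y2 in hard]


def sst_residual(hard):
    """Re-encode the pre-decoded data and add it modulo 2 to the received
    symbols, delayed by one step to align with the pre-decoder output."""
    estimate = pre_decode(hard)
    reencoded = encode(estimate)
    delayed = [(0, 0)] + hard[:-1]
    return [(a ^ c, b ^ d) for (a, b), (c, d) in zip(reencoded, delayed)]


data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
clean = encode(data)
# Without channel errors every residual symbol is (0, 0).
print(any(a or b for a, b in sst_residual(clean)))

noisy = list(clean)
noisy[4] = (noisy[4][0] ^ 1, noisy[4][1])        # flip one received bit
# Positions where the residual is nonzero after a single bit error.
print([n for n, (a, b) in enumerate(sst_residual(noisy)) if a or b])
```

In this sketch a single channel error disturbs only a short burst of residual symbols, and everything else stays at zero. This is the behavior that keeps switching activity low in the path memory circuits.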
