Low Power, High-Rate Viterbi Decoder Employing the SST (Scarce State Transition) Scheme and Radix-4 Trellis

Sang-Cheon Kim, Je-Hyuk Ryu, and Jun-Dong Cho
Department of Electrical and Computer Engineering, Sungkyunkwan University
300, Cheoncheon-Dong, Jangan-Ku, Sowon-City, Kyungki, 440-746, Korea
E-mail: sckim52@nature.skku.ac.kr, Tel: 0331-290-7200, Fax: 0331-290-5819

Abstract

This paper presents a new Viterbi decoder that achieves low power consumption by employing the scarce state transition (SST) scheme and high throughput by exploiting a radix-4 trellis. The SST scheme makes it possible to omit the maximum likelihood decision (MLD) circuit [1, 6]. The architecture of the add-compare-select (ACS) array is based on a restructuring of the conventional radix-2 trellis into a radix-4 trellis. Radix-4 units, consisting of four 4-way ACS units, process two stages of the constituent radix-2 trellis per iteration [2]. Using the SYNOPSYS™ power estimation tool DesignPower™, our experimental results show an average 33% reduction in power and a radix-4 iteration rate of 50 MHz, equivalent to a decode rate of 100 Mb/s under typical operating conditions (VDD = 5.0 V, TA = 27°C). The proposed SST-based Viterbi decoder using 4-way ACS can be applied to multimedia mobile communications that target low power consumption.

1. Introduction

The convolutional encoding and Viterbi decoding scheme [3] is used by many satellite communication and broadcast systems because it offers powerful forward error correction (FEC) performance, and because progress in VLSI process technologies makes it possible to realize single-chip high-speed encoders/decoders [4, 5]. Conventional Viterbi decoders for digital cellular or personal communication systems employ the digital signal processing (DSP) scheme. Unfortunately, such devices have inherently high power consumption during decoding because of their very large logic circuits and high clock speeds. On the other hand, recent developments in VLSI process technology have focused not only on speed and the number of gates but also on power consumption, by reducing the supply voltage to under 3 V. For further reduction in power consumption, the SST scheme has been studied for radix-2 units [6] and is expected to be applicable to higher-speed ACS circuits [1]. However, the scheme has not yet been applied in the literature to higher-speed ACS circuits such as radix-4 units.

In recent years there has been interest in implementations of the Viterbi algorithm at rates on the order of 100 Mb/s. Driving applications include convolutional decoders for error correction, trellis code demodulation for communication channels, and digital sequence detection for magnetic storage devices. An important problem found in such applications is the decoding of a binary shift register (BSR) trellis. The classical high-throughput implementation for such decoders is the radix-2 fully parallel approach, where ACS units are assigned to each state and organized in pairs to iterate one stage of a two-state trellis. The decode rate of this approach is fundamentally limited by the recursive ACS iteration. In this paper we show that a new SST-based Viterbi decoder using 4-way ACS achieves a power reduction relative to the recent radix-4 Viterbi decoder [2].

The remainder of this paper is organized as follows. Sections 2 and 3 briefly review the Viterbi algorithm and the radix-4 ACS update, respectively. Section 4 discusses the architecture of the radix-4 Viterbi decoder. Section 5 is devoted to the proposed SST Viterbi decoder using 4-way ACS. A concluding summary can be found in Section 6.

2. Viterbi Algorithm

We can view the Viterbi algorithm as a dynamic programming algorithm for finding the shortest path through a trellis. The algorithm can be broken down into the following three steps.
1. Weigh the trellis; that is, calculate the branch metrics.
2. Recursively compute the shortest paths to time n in terms of the shortest paths to time n-1. In this step, decisions are used to recursively update the survivor path of the signal. This is known as add-compare-select (ACS) recursion.
3. Recursively find the shortest path leading to each trellis state using the decisions from Step 2. The shortest path is called the survivor path for that state, and the process is referred to as survivor path decode. Finally, if all survivor paths are traced back in time, they merge into a unique path, which is the most likely signal path that we are trying to find.

Associated with each trellis state s at time n is a state metric $\Gamma_{s,n}$, which is the accumulated metric along the shortest path leading to that state. The state metrics at time n can be recursively calculated in terms of the state metrics of the previous iteration as follows:

$\Gamma_{j,n} = \min_i \{ \Gamma_{i,n-1} + \lambda_{ij,n-1} \}$    (1)

where i is a predecessor state of j and $\lambda_{ij,n-1}$ is the branch metric on the transition from state i to state j. The qualitative interpretation of this expression is as follows. By definition, the shortest path into state j must pass through a predecessor state. If the shortest path into j passes through i, then the state metric for the path must be given by the state metric for i plus the branch metric for the state transition from i to j. The final state metric for j is given by the minimum over all possible paths. The recursive update described by (1) is the ACS operation and is implemented as shown in Fig. 1 for the four-state trellis example.

Fig. 1 (a) Predecessor states of state 00 (b) State metric update for state 00, implemented using 2-way ACS

The update unit is referred to as a two-way ACS unit because there are two input branches for each state. In general, a state with m input branches requires an m-way ACS unit [2, 7]. As well as calculating the updated state metric, the ACS unit outputs a decision $d_{s,n}$, which identifies the entering path of the minimum metric.

In order that the input sequence can be decoded, the survivor path (shortest path) through the trellis must be traced and decoded. The two classical algorithms for survivor path storage and decoding are the register-exchange method and the trace-back method. Both algorithms require a recursive update, which fundamentally limits the throughput. Register-exchange is suitable for low-complexity trellises and offers high throughput. Trace-back is preferable for higher-complexity trellises because of its reduced area and power dissipation. In our Viterbi decoder, the register-exchange method is used for survivor path storage and decoding.
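As a rough illustration of the ACS recursion in (1), the following Python sketch updates the state metrics of a four-state radix-2 trellis and records the decision for each state. The function name `acs_update`, the dictionary layout of the branch metrics, and all numeric values are assumptions made for this example. The predecessor table assumes a shift-register state numbering in which states 0 and 2 lead into state 0, as in Fig. 1(a).

```python
def acs_update(prev_metrics, branch_metrics, predecessors):
    """One add-compare-select step, following (1).

    prev_metrics[i]        : state metric of state i at time n-1
    branch_metrics[(i, j)] : branch metric of the transition i -> j
    predecessors[j]        : tuple of predecessor states of state j
    Returns (new_metrics, decisions); decisions[j] is the predecessor
    selected for state j.
    """
    new_metrics, decisions = [], []
    for j, preds in enumerate(predecessors):
        # Add: extend every path that enters state j.
        candidates = [prev_metrics[i] + branch_metrics[(i, j)] for i in preds]
        # Compare and select: keep the smallest candidate (first one on ties).
        best = min(range(len(preds)), key=lambda k: candidates[k])
        new_metrics.append(candidates[best])
        decisions.append(preds[best])
    return new_metrics, decisions


# Assumed 4-state shift-register trellis:
# state j is entered from states j >> 1 and (j >> 1) | 2.
PREDECESSORS = [(j >> 1, (j >> 1) | 2) for j in range(4)]

# Made-up metrics, for illustration only.
prev = [0, 3, 1, 4]
bm = {(0, 0): 2, (2, 0): 0, (0, 1): 0, (2, 1): 2,
      (1, 2): 1, (3, 2): 1, (1, 3): 1, (3, 3): 1}

metrics, decisions = acs_update(prev, bm, PREDECESSORS)
print(metrics, decisions)  # [1, 0, 4, 4] [2, 0, 1, 1]
```

The recorded decisions are what a register-exchange or trace-back unit would consume; that part is not modeled here.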
3. Radix-4 ACS Update

In Fig. 1(a), each state of the trellis has two input branches. This is called a radix-2 trellis. We will use this trellis as an example to demonstrate the transformation of a radix-2 trellis into a radix-4 trellis. The four-state radix-2 trellis of Fig. 1(a) has been redrawn in Fig. 2(a) for two iterations. Using M-step ACS recursion theory, we can see that, whatever the state metric values at time n-1, state 0 at time n has four predecessor states at time n-2: state 0, state 1, state 2, and state 3. The transition branch from state 0 (at time n-2) to state 0 (at time n) consists of two branch segments: one from state 0 at n-2 to state 0 at n-1, and the other from state 0 at n-1 to state 0 at n. The branch metric from state 0 at n-2 to state 0 at n is the sum of the branch metrics of the two branch segments, that is,

$\lambda_{00,n} = \lambda_{00,n-2} + \lambda_{00,n-1}$

where $\lambda_{00,n}$ is the branch metric for the newly combined transition branch from state 0 at n-2 to state 0 at n. Similarly, the branch metric from state 1 at time n-2 to state 0 at time n can be calculated by

$\lambda_{10,n} = \lambda_{12,n-2} + \lambda_{20,n-1}$

In general, the branch metric from state i at time n-2 to state j at time n is the sum of the two branch metrics

$\lambda_{ij,n} = \lambda_{il,n-2} + \lambda_{lj,n-1}$

where l is the intermediate state through which the combined branch passes at time n-1. Or, using the notation of [7],

${}^{2}\lambda_{n} = \lambda_{n+1} + \lambda_{n}$

So if we calculate the branch metrics $\lambda_{n}$ and $\lambda_{n+1}$ for two successively received symbols, we can find the new branch metric ${}^{2}\lambda_{n}$ by addition. The two iterations of the trellis can thus be combined into one iteration. After an initial time delay, two symbols are decoded in each clock cycle instead of one. The combination of the trellis in Fig. 2(a) is shown in Fig. 2(b). We call the new trellis a radix-4 trellis because each state at time n has four input branches. After this two-step combination, the radix-2 trellis has been changed into a radix-4 trellis.

Fig. 2 (a) 4-state radix-2 trellis over two iterations (b) Equivalent radix-4 trellis

The state metric labeling for the radix-4 trellis of Fig. 2(b) is shown in Fig. 3(a). Since each output state has four predecessor states, state metrics are updated using a four-way ACS unit as shown in Fig. 3(b). The whole radix-4 trellis is updated by four four-way ACS units, one for each state of the trellis.

Fig. 3 (a) 4-state radix-4 trellis with state and branch labels (b) 4-way ACS unit

4. Architecture of Radix-4 Viterbi Decoder

A simple functional block diagram of the Viterbi decoder architecture is shown in Fig. 4. The architecture of the decoder can be partitioned into the following five primary functional units: the branch metric calculator, the ACS array, the path-update unit, the smallest-metric unit, and the output-select unit.

Fig. 4 Block diagram for architecture of radix-4 Viterbi decoder

The specifications for the Viterbi decoder are as follows:
- Eight-state, R=1/2 or (2, 1, 3) convolutional decoder/encoder
- Generator polynomials g(1) = (1011), g(2) = (1111)
- Eight-level soft decision inputs, uniform metrics
- Survivor path length of 20 (radix-2 trellis iterations)

The VLSI implementation of the radix-4 ACS unit is shown in Fig. 5. In order to achieve the potential two-fold increase in throughput offered by the radix-4 architecture, the radix-4 ACS delay must be almost equal to that of the radix-2 ACS. Instead of using two stages of comparators to find the smallest metric among four inputs, we use the structure shown in Fig. 5. The four-input comparison is evaluated by generating the six possible pair-wise comparisons, and the results are combined in two levels of logic to form the minimum metric selection. All comparators in Fig. 5 are constructed using the modulo arithmetic approach proposed in [8].

Fig. 5 Block diagram of radix-4 ACS unit
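The radix-4 update and the six-comparison selection can be sketched together in Python. This is a behavioral illustration only, under several assumptions: the function names are hypothetical, the branch metrics are made-up numbers, the trellis is a four-state shift-register trellis, and plain integer comparisons stand in for the modulo-arithmetic comparators of [8]. The sketch checks that one radix-4 step with combined branch metrics gives the same state metrics as two radix-2 steps.

```python
from itertools import combinations


def four_way_select(candidates):
    """Index of the smallest of four values using six pairwise comparisons."""
    # The six comparisons are independent, so hardware can evaluate them in parallel.
    le = {(a, b): candidates[a] <= candidates[b]
          for a, b in combinations(range(4), 2)}

    def beats(a, b):
        # True if input a wins against input b; ties go to the lower index.
        return le[(a, b)] if a < b else not le[(b, a)]

    # Selection logic: the winner beats each of the other three inputs.
    for k in range(4):
        if all(beats(k, other) for other in range(4) if other != k):
            return k
    raise AssertionError("unreachable: exactly one input wins all comparisons")


def radix2_step(prev, bm):
    """Radix-2 ACS step; bm[(i, j)] exists only for valid transitions i -> j."""
    return [min(prev[i] + m for (i, jj), m in bm.items() if jj == j)
            for j in range(len(prev))]


def combine(bm_first, bm_second):
    """Radix-4 branch metrics: sum of the two segments through state l."""
    return {(i, j): m1 + m2
            for (i, l), m1 in bm_first.items()
            for (l2, j), m2 in bm_second.items() if l == l2}


def radix4_step(prev, bm4):
    """Radix-4 ACS step built from one 4-way ACS unit per state."""
    new = []
    for j in range(4):
        candidates = [prev[i] + bm4[(i, j)] for i in range(4)]
        new.append(candidates[four_way_select(candidates)])
    return new


# Assumed 4-state shift-register trellis: state i moves to ((i << 1) & 3) | u.
EDGES = [(i, ((i << 1) & 3) | u) for i in range(4) for u in (0, 1)]
# Made-up branch metrics for two successive radix-2 stages.
bm_a = {(i, j): (3 * i + 5 * j) % 7 for i, j in EDGES}
bm_b = {(i, j): (2 * i + j + 1) % 5 for i, j in EDGES}

start = [0, 2, 1, 3]
two_radix2 = radix2_step(radix2_step(start, bm_a), bm_b)
one_radix4 = radix4_step(start, combine(bm_a, bm_b))
print(two_radix2 == one_radix4)
```

Because every pair of states two steps apart is connected through exactly one intermediate state in this trellis, `combine` produces one metric for each of the sixteen (i, j) pairs.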
5. Our Architecture: SST Viterbi Decoder Using 4-way ACS

A block diagram of our SST Viterbi decoder with R=1/2 and K=4 is shown in Fig. 6. In the SST Viterbi decoder, the most significant bits of the soft-decision data are sent to a simple decoder (pre-decoder). The decoded data are then encoded by a re-encoder using the same polynomial expression as used by the transmitter. The re-encoded data and the delayed received data are added in a modulo-2 operation. The summed data, which consist mainly of errors that occurred during transmission, are fed into the branch metric calculator and decoded in the ACS and path memory circuits. Finally, the error-corrected data are obtained by modulo-2 addition of the output of the path memory circuits and the pre-decoder signal.

In the SST Viterbi decoder, the input signals to the branch metric calculator and the path memory circuits are all '0' unless channel errors occur. That is, the maximum likelihood state of the decoding signal is mainly distributed around the all-'0' state. The switching rate of the path memory circuits is therefore drastically reduced, because the data passing through the circuits are almost always zero except for received data affected by channel errors. Consequently, the power consumption of the path memory circuits is reduced compared with that of the conventional Viterbi decoder.

Fig. 6 SST Viterbi decoder using radix-4 trellis
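The following Python sketch illustrates why the data entering the branch metric calculator are mostly zero. It models only the hard-decision (MSB) path of the SST front end: pre-decoder, re-encoder, delay, and modulo-2 addition. Several details are assumptions made for this example and are not taken from the paper: the tap ordering of the generator polynomials (1011) and (1111), the all-zero initial encoder state, and the pre-decoder itself, which here simply XORs the two code bits (with the assumed tap ordering this recovers the previous input bit). The function names are hypothetical.

```python
G1 = (1, 0, 1, 1)  # assumed tap order: u[n], u[n-1], u[n-2], u[n-3]
G2 = (1, 1, 1, 1)


def encode(bits):
    """Rate-1/2, K = 4 convolutional encoder starting from the all-zero state."""
    state = [0, 0, 0]
    out = []
    for u in bits:
        window = [u] + state
        c1 = sum(g & b for g, b in zip(G1, window)) % 2
        c2 = sum(g & b for g, b in zip(G2, window)) % 2
        out.append((c1, c2))
        state = window[:3]  # shift the register by one position
    return out


def pre_decode(hard_pairs):
    """Assumed pre-decoder: with the taps above, c1 XOR c2 equals u[n-1]."""
    return [c1 ^ c2 for c1, c2 in hard_pairs]


def sst_residual(hard_pairs):
    """Modulo-2 sum of the re-encoded data and the delayed received data."""
    reencoded = encode(pre_decode(hard_pairs))
    # A one-symbol delay lines the received data up with the pre-decoder output.
    delayed = [(0, 0)] + hard_pairs[:-1]
    return [(r1 ^ d1, r2 ^ d2)
            for (r1, r2), (d1, d2) in zip(reencoded, delayed)]


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
clean = encode(message)

noisy = list(clean)
c1, c2 = noisy[4]
noisy[4] = (c1 ^ 1, c2)  # a single channel error on the hard decisions

print(sst_residual(clean))  # all (0, 0) when there are no channel errors
print(sst_residual(noisy))  # nonzero only in a short burst after the error
```

In this sketch the residual depends only on the error pattern and not on the transmitted message, which is the property that keeps the path memory inputs near the all-zero state.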
6. Conclusion

In this paper, we proposed a new Viterbi decoder that achieves low power consumption by employing the scarce state transition (SST) scheme and high throughput by exploiting a radix-4 trellis. As shown in Table 1, our experimental results show a 33% reduction in power at a cost of a 7% increase in area, and a radix-4 iteration rate of 50 MHz, equivalent to a decode rate of 100 Mb/s under typical operating conditions (VDD = 5.0 V, TA = 27°C). We used the Design Compiler of SYNOPSYS™ and measured power consumption using DesignPower™ of SYNOPSYS™. The constraints used for our experiment were as follows: the target library is LSI10K and the operating condition is WCCOM. The proposed SST-based Viterbi decoder using 4-way ACS can be applied to multimedia mobile communications that target low power consumption.

Table 1 Experimental Result

          Area (gates)                 Power (W)
  Radix-4    SST      %        Radix-4    SST       %
  18445      19775    -7%      2589.36    1734.84   33%

Radix-4: Viterbi decoder using radix-4 trellis
SST: SST-based Viterbi decoder using radix-4 trellis

References

[1] S. Kubota, S. Kato, and T. Ishitani, "Novel Viterbi Decoder VLSI Implementation and Its Performance", IEEE Trans. Commun., vol. 41, no. 8, pp. 1170-1178, Aug. 1993.
[2] P. J. Black and T. H. Meng, "A 140-Mb/s, 32-State, Radix-4 Viterbi Decoder", IEEE J. Solid-State Circuits, vol. 27, no. 12, pp. 1877-1885, Dec. 1992.
[3] A. J. Viterbi, "Convolutional Codes and Their Performance in Communication Systems", IEEE Trans. Commun., vol. COM-19, pp. 751-772, Oct. 1971.
[4] H. A. Bustamante, I. Kang, C. Nguyen, and R. E. Peile, "Stanford Telecom VLSI of A Convolutional Decoder", in Proc. IEEE MILCOM, vol. 1, pp. 171-178, Oct. 1989.
[5] R. Kerr, H. Dehesh, A. B. David, and D. Werner, "A 25 MHz Viterbi FEC Codec", in Proc. IEEE CICC, pp. 16.6.1-16.6.5, May 1990.
[6] S. Kubota, K. Ohtani, and S. Kato, "High-speed and High-coding-gain Viterbi Decoder with Low Power Consumption Employing SST (scarce state transition) Scheme", Electron. Lett., vol. 22, Apr. 1986.
[7] G. Fettweis and H. Meyr, "High-Speed Parallel Viterbi Decoding Algorithm and VLSI-Architecture", IEEE Commun. Mag., vol. 29, no. 5, pp. 46-55, May 1991.
[8] A. P. Hekstra, "An Alternative to Metric Rescaling in Viterbi Decoders", IEEE Trans. Commun., vol. 37, no. 11, pp. 1220-1222, Nov. 1989.