Trellis Vector Residual Quantization

Giovanni Motta, Volen Center for Complex Systems, Brandeis University, Waltham, MA 02254, USA (gim@cs.brandeis.edu)
Bruno Carpentieri, Dip. di Informatica ed Applicazioni, Universita' di Salerno, 84081 Baronissi (SA), Italy (bc@udsab.dia.unisa.it)

ABSTRACT

Vector Quantizers (or VQs) have the property of encoding sources while asymptotically achieving the best theoretical performance. Unfortunately, optimal VQs require exponentially growing computational and memory resources. Nevertheless, the quality achieved by VQs is frequently desirable in applications where only a limited amount of resources is available.
In this paper we present a sub-optimal vector quantizer that combines, in an innovative way, trellis coding and residual quantization. Our Trellis Coded Vector Residual Quantizer (or TCVRQ) is a general-purpose sub-optimal VQ with low computational costs and small memory requirements. Despite its good performance, the TCVRQ permits considerable memory savings when compared to traditional quantizers. We propose new methods for computing the quantization levels and experimentally analyze the performance of our TCVRQ in the case of Linear Prediction speech coding and still image coding. Our experiments confirm that our TCVRQ is a good compromise between memory/speed requirements and quality, and that it is not sensitive to codebook design errors.

INTRODUCTION

Vector Quantization (or, in short, VQ) is a source coding technique that, as Shannon proved in his "Source Coding Theorem", asymptotically achieves the best theoretical performance. Although Shannon's theorem guarantees the existence of vector quantizers that give nearly optimal performance, the theory provides no method for determining such quantizers and, as was recently demonstrated by Lin [10], the design of an optimal VQ is an NP-complete problem.
In 1980, Linde, Buzo and Gray [2] introduced a codebook design method (named LBG after its authors) that produces locally optimal codebooks with no natural order or structure. As a consequence of this lack of structure, the memory needed for the codebook grows exponentially, and encoding every source vector requires an exhaustive search to locate the code word that minimizes a given distortion measure. In the following we refer to this kind of quantizer as an Exhaustive Search Vector Quantizer.
Imposing a structure on the VQ has been suggested as a way to hold down both memory and computation costs [6]. A very interesting structure is the so-called Cascade or Residual Quantization; combining this structure with a trellis graph preserves the memory savings and allows a "virtual increase" of the number of quantization levels.

DEFINITIONS

In this section we give a formal definition of the proposed system and present an extension of the LBG algorithm to design the quantization levels.

Definition 1: Let x be a random vector in the n-dimensional Euclidean space R^n; an N-level Exhaustive Search Vector Quantizer (or ESVQ) of R^n is a triple Q = (A, Q, P) where:
1. A = {y_1, y_2, ..., y_N} is a finite indexed subset of R^n, called the codebook, of code vectors y_i;
2. P = {S_1, S_2, ..., S_N} is a partition of R^n whose equivalence classes (or cells) S_j satisfy: S_1 ∪ S_2 ∪ ... ∪ S_N = R^n and S_j ∩ S_k = ∅ for j ≠ k;
3. Q : R^n → A is a mapping that defines the relationship between the codebook and the partition: Q(x) = y_j if and only if x ∈ S_j.

Definition 2: A Residual Quantizer consists of a finite sequence of ESVQs Q_1, Q_2, ..., Q_K such that Q_1 quantizes the input x = x_1 and each Q_i, 1 < i ≤ K, encodes the error (or residual) x_i = x_{i-1} - Q_{i-1}(x_{i-1}) of the previous quantizer Q_{i-1}. The output is obtained as the sum of all the code words:

    y = Σ_{i=1}^{K} Q_i(x_i)

Definition 3: A multistage (or layered) graph is a pair G = (V, E) with the following properties:
1. V = {v_1, v_2, ..., v_n} is a finite set of vertices such that V = V_1 ∪ V_2 ∪ ... ∪ V_K, with V_k ⊆ V for 1 ≤ k ≤ K and V_i ∩ V_j = ∅ for i ≠ j, 1 ≤ i, j ≤ K;
2. E = {(v_i, v_j) : v_i ∈ V_k, v_j ∈ V_{k+1}, 1 ≤ k < K} is a finite set of edges.
The number K is the number of stages.

If a residual vector quantizer is associated to each node of a multistage graph and each layer of this graph is not "fully connected" to its successor (as in an Ungerboeck trellis [4]), a bit saving can clearly be achieved. Each vector is fully specified by the path on the graph and by the indexes of the code words in the quantizers associated to the nodes along the path. Using, for example, the trellis shown in Figure 1, if each Q_i^(j) is an N-level VQ, an output vector is specified by giving K + 1 bits for the path and K·log2(N) bits for the code words. In the "equivalent" residual quantizer, each stage has 4N levels and K·log2(4N) bits are necessary, so the trellis configuration allows a "virtual doubling" of the available levels.

Figure 1: A K-stage Trellis Coded Vector Residual Quantizer; to each node of the trellis is associated an ESVQ that encodes the quantization errors of the previous stage.

A formal definition of the TCVRQ is the following:

Definition 4: A Trellis Coded Vector Residual Quantizer is a pair T = (G, Q) where:
1. G = (V, E) is a trellis [4] multistage graph with |V| = n and K stages;
2. Q = {Q_1, Q_2, ..., Q_n} is a finite set of ESVQs with |V| = |Q|, where each Q_i ∈ Q is associated to the vertex v_i ∈ V;
3. The ESVQ Q_i encodes the residual of Q_j if and only if (v_i, v_j) ∈ E.

Figure 2: SQNR for TCVRQ and ESVQ compared at different bit-rates, for images in the training and in the test sets.

The design of the quantization levels for the VQs associated to each node of the trellis is performed in sequence, from stage 1 to stage K, using the LBG algorithm on the residuals generated by the connected nodes.
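To make Definitions 1 and 2 and the stage-by-stage LBG design concrete, the following is a minimal sketch in Python: a k-means-style LBG loop, exhaustive-search encoding, and a plain residual cascade with no trellis. Squared-error distortion is assumed, and all function names here are ours, not the authors'.

```python
import numpy as np

def lbg_codebook(train, n_levels, iters=20, seed=0):
    """Design an ESVQ codebook with a k-means-style LBG loop."""
    rng = np.random.default_rng(seed)
    # Initialize code vectors from randomly chosen training vectors.
    codebook = train[rng.choice(len(train), n_levels, replace=False)]
    for _ in range(iters):
        # Exhaustive-search partition: nearest code vector per training vector.
        d = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        cells = d.argmin(axis=1)
        # Centroid update; keep the old code vector if a cell is empty.
        for j in range(n_levels):
            members = train[cells == j]
            if len(members) > 0:
                codebook[j] = members.mean(axis=0)
    return codebook

def esvq_encode(x, codebook):
    """Definition 1: map x to the index of the nearest code vector."""
    return int(((codebook - x) ** 2).sum(axis=1).argmin())

def design_residual_quantizer(train, n_levels, n_stages):
    """Definition 2: design Q_1..Q_K sequentially, each stage trained
    on the residuals left by the previous stage."""
    codebooks, residuals = [], train.astype(float).copy()
    for _ in range(n_stages):
        cb = lbg_codebook(residuals, n_levels)
        codebooks.append(cb)
        idx = np.array([esvq_encode(r, cb) for r in residuals])
        residuals = residuals - cb[idx]   # x_i = x_{i-1} - Q_{i-1}(x_{i-1})
    return codebooks

def rq_encode_decode(x, codebooks):
    """Encode x through all stages; the output is the sum of code words."""
    y, r, path = np.zeros_like(x, dtype=float), x.astype(float), []
    for cb in codebooks:
        j = esvq_encode(r, cb)
        path.append(j)
        y += cb[j]
        r = r - cb[j]
    return path, y
```

Each stage is trained only on the residuals produced by the previous one, mirroring the sequential (and therefore only locally optimal) design described above.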
This design is not optimal; nevertheless it respects the structure of the quantizer and, for a limited number of stages, achieves interesting performance. The optimality conditions for residual quantizers stated by Barnes and Frost in [6] and in [8] can be adapted to our TCVRQ; we do not use them here because of the excessive complexity of the algorithm.
The best sequence of residual quantizers is determined using the Viterbi [1] algorithm. In this particular framework the Viterbi search is not optimal; in fact, it behaves like a greedy algorithm. Nevertheless, performance is not degraded, because of the decreasing error introduced by the residual structure.

IMAGE CODING

Several experiments were made to assess the performance of our TCVRQ on a natural source. A comparison was made between an ESVQ and a TCVRQ in quantizing 28 gray-level images commonly used as references. These images can be found on the Internet at the ftp address "ftp://links.uwaterloo.ca" in the directory "/pub/BragZone". The training and the test sets were composed respectively of 12 and 16 images, 512x512 pixels, 256 gray levels, divided into blocks (vectors) of 3x3 and 4x4 pixels. The measure used for the quantization error was the Signal to Quantization Noise Ratio, or SQNR.
The results are shown in Figure 2. For low bit rates the error of the TCVRQ is very close to the optimum; increasing the bit rate (and the number of stages), the performance of the TCVRQ decreases due to the sequentially optimum design.
Using the same training and test sets, we compared the performance of our TCVRQ to Jill Goldschneider's VQ package, freely available on the Internet at the ftp address "ftp://isdl.ee.washington.edu/pub/VQ/code". This package consists of two different kinds of tree quantizers (fixed and variable rate) and an ESVQ that uses the same codebook as the tree quantizers. As is clear from Figure 3, for low bit rates our quantizer outperforms the package in terms of SQNR. Due to their structure, the tree quantizers use twice the memory of the ESVQ, which grows exponentially; our TCVRQ uses only an amount of memory that grows linearly with the vector dimension.

Figure 3: Performance of the TCVRQ compared to J. Goldschneider's VQ package.

LOW BIT RATE SPEECH CODING

The TCVRQ is a general-purpose VQ with low computational costs and small memory requirements. This makes it very appealing for low bit-rate speech coding applications. A high quality low bit-rate codec is needed when the speech signal must be encoded on a narrow-band channel while preserving voice intelligibility and speaker identification. We have evaluated the performance of our TCVRQ in a low bit-rate Linear Prediction based speech codec.
The implemented low bit-rate speech codec (see Figure 4) follows a well-known scheme due to Atal and Remde [5]. It is a hybrid single-pulse codebook-excited codec in which the voice signal, sampled at 8 kHz with 16 bits per sample, is analyzed by using Linear Prediction. Every voice frame (80 samples) is classified as "Voiced" or "Unvoiced" by thresholding the peak of the autocorrelation function and, for the voiced frames, the main pitch period is estimated. Every frame is synthesized at 2400 bits per second using a single-pulse or a stochastic excitation vector. The best excitation signal is chosen depending on the Voiced/Unvoiced classification and by evaluating the Squared Error between the original and the reconstructed frame.

Figure 4: The low bit-rate speech codec.

Our TCVRQ quantizes the LP parameters, represented in terms of Line Spectrum Frequencies (or LSFs), which change every three frames (i.e. 240 samples).
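The encoding step applied to each such vector, the Viterbi search over the trellis of node ESVQs described earlier, can be sketched as follows. This is a simplified illustration under our own assumptions (trellis edges given as an explicit adjacency list, squared-error path metric, one survivor kept per node), with hypothetical names; the greedy character noted in the design section shows up in the pruning, which keeps only the lowest-cost survivor entering each node.

```python
import numpy as np

def trellis_viterbi_encode(x, node_codebooks, next_nodes):
    """Viterbi search over a K-stage trellis of residual ESVQs.

    node_codebooks[k][m]: codebook (2-D array) of node m at stage k.
    next_nodes[m]: nodes of the next stage reachable from node m.
    Survivors hold (cost, residual, node path, code word indexes).
    """
    K = len(node_codebooks)
    # Stage 0: every start node quantizes the input vector itself.
    survivors = {}
    for m, cb in enumerate(node_codebooks[0]):
        j = int(((cb - x) ** 2).sum(axis=1).argmin())
        r = x - cb[j]
        survivors[m] = ((r ** 2).sum(), r, [m], [j])
    for k in range(1, K):
        new = {}
        for m, (cost, r, path, idxs) in survivors.items():
            for m2 in next_nodes[m]:
                cb = node_codebooks[k][m2]
                j = int(((cb - r) ** 2).sum(axis=1).argmin())
                r2 = r - cb[j]
                c2 = (r2 ** 2).sum()
                # Greedy pruning: keep the best survivor entering node m2.
                if m2 not in new or c2 < new[m2][0]:
                    new[m2] = (c2, r2, path + [m2], idxs + [j])
        survivors = new
    # Best complete path; cost equals the final residual energy.
    return min(survivors.values(), key=lambda s: s[0])

def trellis_decode(path, idxs, node_codebooks):
    """Reconstruction: sum of the code words selected along the path."""
    return sum(node_codebooks[k][m][j]
               for k, (m, j) in enumerate(zip(path, idxs)))
```

The decoder only needs the node path and the per-stage code word indexes, which is where the bit counting in the definitions section comes from.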
With this kind of codec, the quality of the synthesized signal strongly depends on the quantization of the LP parameters. Quantization was performed using a 10-stage trellis quantizer to encode the LSFs at a bit-rate of 1.9 - 2.4 bits per LP parameter.
The training set was composed of 76 sentences pronounced by different speakers of both sexes in the most common European languages: English, French, Dutch, German, Greek, Spanish and Italian. The test set was composed of 12 English sentences spoken by one male and one female speaker. The sentences used are phonetically rich and well known to be hard to encode:

"Why were you away a year Roy?" (Voiced)
"Nanny may know my meaning" (Nasals)
"Which tea-party did Baker go to?" (Plosives and Unvoiced Stops)
"The little blanket lies around on the floor" (Plosives)
"His vicious father has seizures" (Fricatives)
"The problem with swimming is that you can drown" (Voiced Fricatives)

Figure 5 shows the distribution of the Cepstral Distance (or CD) between the original and the quantized parameters; CD is a perceptually motivated distortion measure widely used in speech coding. The results were obtained with our TCVRQ on the speech files in the test set; the histogram shows that the average value is approximately 1 dB and that the percentage of frames with a CD greater than 2 dB is quite small. As also confirmed by informal listening tests, the conditions for transparent quantization stated by Paliwal in [9] are satisfied, and a transparent quantization can be performed with our TCVRQ using only 1.9 - 2.4 bits per LP parameter.

Figure 5: Cepstral Distance obtained by quantizing the LP coefficients of the test set at a rate of 2.4 bits per parameter.

CONCLUSIONS

In this paper we extended the results stated in [15], giving a formal definition of the Trellis Coded Vector Residual Quantizer and showing the results of some experiments in low bit-rate speech and still image coding. Used for the direct quantization of gray-level still images, our TCVRQ performs very close to a (locally) optimal ESVQ and outperforms a popular VQ package; for low coding rates, ranging from 0.3 to 1 bits per pixel, we obtained better results in terms of SQNR, speed and memory required. The results obtained using the TCVRQ in an LP based speech codec confirm that its performance is good even in a transform codec and that nearly transparent quantization can be performed with our system at a rate of 1.9 - 2.4 bits per LP parameter. More experiments are in progress to explore optimality and the effect of different search algorithms.

ACKNOWLEDGMENTS

We wish to thank Prof. Martin Cohn for the fruitful discussions and the patient review of the manuscript.

REFERENCES

[1] A.J. Viterbi and J.K. Omura, "Trellis Encoding of Memoryless Discrete-Time Sources with a Fidelity Criterion", IEEE Trans. on Inform. Theory, pp. 325-332, May 1974.
[2] Y. Linde, A. Buzo and R.M. Gray, "An Algorithm for Vector Quantizer Design", IEEE Trans. on Comm., Vol. 28, pp. 84-95, Jan. 1980.
[3] L. Colm Stewart, "Trellis Data Compression", Xerox Palo Alto Research Center, 1981.
[4] G. Ungerboeck, "Channel Coding with Multilevel/Phase Signals", IEEE Trans. on Inform. Theory, Vol. 28, N. 1, Jan. 1982.
[5] B.S. Atal and J.R. Remde, "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates", Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP82), 1982.
[6] C.F. Barnes, "Residual Vector Quantizers", Ph.D. Dissertation, Brigham Young University, 1989.
[7] T.R. Fischer, M.W. Marcellin, and M. Wang, "Trellis Coded Vector Quantization", IEEE Trans. on Inform. Theory, Vol. 37, N. 3, pp. 1551-1566, 1991.
[8] R.L. Frost, C.F. Barnes, and F. Xu, "Design and Performance of Residual Quantizers", Proc. of the IEEE Data Compression Conference (DCC91), 1991.
[9] K.K. Paliwal and B.S. Atal, "Efficient Vector Quantization of LPC Parameters at 24 bits/frame", Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP91), 1991.
[10] J. Lin, "Vector Quantization for Image Compression: Algorithms and Performance", Ph.D. Dissertation, Brandeis University, 1992.
[11] H.S. Wang and N. Moayeri, "Trellis Coded Vector Quantization", IEEE Trans. on Comm., Vol. 40, N. 8, 1992.
[12] T.R. Fischer and M. Wang, "Entropy-Constrained Trellis-Coded Quantization", IEEE Trans. on Inform. Theory, Vol. 38, pp. 415-425, 1992.
[13] R. Laroia and N. Farvardin, "Trellis-based scalar-vector quantizers for memoryless sources", IEEE Trans. on Inform. Theory, Vol. 40, No. 3, 1994.
[14] B. Belzer and J.D. Villasenor, "Symmetric Trellis Coded Vector Quantization", Proc. of the IEEE Data Compression Conference (DCC96), 1996.
[15] G. Motta and B. Carpentieri, "A New Trellis Vector Residual Quantizer: Applications to Image Coding", Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP97), Apr. 1997.