Trellis Vector Residual Quantization



              Giovanni Motta                             Bruno Carpentieri
      Volen Center for Complex Systems             Dip. di Informatica ed Applicazioni
             Brandeis University                          Universita' di Salerno
         Waltham, MA 02254, USA                        84081 Baronissi (SA), Italy              

ABSTRACT

Vector Quantizers (or VQs) have the property of encoding sources while
asymptotically achieving the best theoretical performance. Unfortunately,
conventional optimal VQs require exponentially growing computational and
memory resources. Nevertheless, the quality achieved by VQs is frequently
desirable in applications where only a limited amount of resources is
available.
 In this paper, we present a sub-optimal vector quantizer that combines, in
an innovative way, trellis coding and residual quantization. Our Trellis
Coded Vector Residual Quantizer (or TCVRQ) is a general-purpose sub-optimal
VQ with low computational costs and small memory requirements. Despite its
good performance, TCVRQ permits considerable memory savings when compared to
traditional quantizers.
 We propose new methods for computing quantization levels, and experimentally
analyze the performance of our TCVRQ in the case of Linear Prediction speech
coding and still image coding.
 Our experiments confirm that our TCVRQ is a good compromise between
memory/speed requirements and quality, and that it is not sensitive to
codebook design.

INTRODUCTION

Vector Quantization (or, in short, VQ) is a source coding technique that, as
Shannon proved in his "Source Coding Theorem", has the property of
asymptotically achieving the best theoretical performance. Although Shannon's
theorem guarantees the existence of vector quantizers that give nearly
optimal performance, the theory provides no methods for determining such
quantizers and, as was recently demonstrated by Lin[10], the design of an
optimal VQ is an NP-complete problem.
 In 1980 a vector quantizer code book design method (named LBG after its
authors) was introduced by Linde, Buzo and Gray[2]. This method designs
locally optimal code books that have no natural order or structure. As a
consequence of the lack of structure, the memory needed for the code book
grows exponentially (in the vector dimension, for a fixed rate per sample)
and the encoding of every source vector requires an exhaustive search to
locate a code word that minimizes a given distortion measure. In the
following, we will refer to this kind of quantizer as an Exhaustive Search
Vector Quantizer.
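The exhaustive search mentioned above can be illustrated with a minimal sketch. It assumes squared Euclidean error as the distortion measure; the function names and the toy code book are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_error(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance, assumed here as the distortion measure."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def esvq_encode(x: Vector, codebook: List[Vector]) -> Tuple[int, float]:
    """Scan the whole code book and return (index, distortion) of the best code word."""
    best_index, best_dist = 0, squared_error(x, codebook[0])
    for index in range(1, len(codebook)):
        dist = squared_error(x, codebook[index])
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


# Toy 2-dimensional code book with N = 4 code words (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
index, dist = esvq_encode((0.9, 0.2), codebook)
print(index, dist)
```

Every code word is visited once per input vector, so the encoding cost is proportional to the code book size N, which is what makes an unstructured code book expensive at high rates.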
 Imposing a structure on the VQ has been suggested to hold down both the
memory and computation costs[6]. A very interesting structure is the
so-called Cascade or Residual Quantization; combining this structure with a
trellis graph preserves the memory saving and allows a "virtual increase" of
the quantization levels.

DEFINITIONS

In this section we give a formal definition of the proposed system and we
present an extension of the LBG algorithm to design the quantization levels.

Definition 1: Let $x$ be a random vector in the $n$-dimensional Euclidean
space $\mathbb{R}^n$; an $N$-level Exhaustive Search Vector Quantizer (or
ESVQ) of $\mathbb{R}^n$ is a triple $\mathcal{Q} = (A, Q, P)$ where:
1. $A = \{y_1, y_2, \ldots, y_N\}$ is a finite indexed subset of
   $\mathbb{R}^n$, called the code book, of code vectors $y_i$;
2. $P = \{S_1, S_2, \ldots, S_N\}$ is a partition of $\mathbb{R}^n$ where the
   equivalence classes (or cells) $S_j$ of $P$ satisfy:
   $\bigcup_{j=1}^{N} S_j = \mathbb{R}^n$,
   $S_j \cap S_k = \emptyset$ for $j \neq k$;
3. $Q : \mathbb{R}^n \to A$ is a mapping that defines the relationship
   between the code book and the partition as:
   $Q(x) = y_j$ if and only if $x \in S_j$.

Definition 2: A Residual Quantizer consists of a finite sequence of ESVQs
$Q_1, Q_2, \ldots, Q_K$ such that $Q_1$ quantizes the input $x = x_1$ and
each $Q_i$, $1 < i \leq K$, encodes the error (or residual)
$x_i = x_{i-1} - Q_{i-1}(x_{i-1})$ of the previous quantizer $Q_{i-1}$.
The output is obtained as the sum of all the code words:
   $y = \sum_{i=1}^{K} Q_i(x_i)$

Definition 3: A multistage (or layered) graph is a pair $G = (V, E)$ with the
following properties:
1. $V = \{v_1, v_2, \ldots, v_n\}$ is a finite set of vertices such that
   $V = \bigcup_{k=1}^{K} V_k$, $V_k \subset V$ for $1 \leq k \leq K$,
   and $V_i \cap V_j = \emptyset$ for $i \neq j$, $1 \leq i, j \leq K$;
2. $E \subseteq \{(v_i, v_j) : v_i \in V_k, v_j \in V_{k+1}, 1 \leq k < K\}$
   is a finite set of edges.
The number $K$ is the number of stages.

If a residual vector quantizer is associated to each node of a multistage
graph and each layer of this graph is not "fully connected" to its successor
(as in an Ungerboeck trellis[4]), clearly a bit saving can be achieved. Each
vector is fully specified by the path on the graph and by the indexes of the
code words in the quantizers associated to the nodes along the path. Using,
for example, the trellis shown in Figure 1, if each $Q_i(j)$ is an $N$-level
VQ, an output vector is specified giving $K + 1$ bits for the path and
$K \log_2(N)$ bits for the code words. In the "equivalent" residual
quantizer, each stage has $4N$ levels and $K \log_2(4N)$ bits are necessary,
so the trellis configuration allows a "virtual doubling" of the available
levels.

Figure 1: A K-stage Trellis Coded Vector Residual Quantizer; each node of the
trellis is associated with an ESVQ that encodes the quantization errors of
the previous stage.
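The bit count above can be checked with a short worked example. This is only a sketch under stated assumptions: the trellis is taken to have four nodes per stage with two successors per node, which is consistent with the 4N levels quoted for the equivalent residual quantizer; the path then costs 2 bits for the starting node plus 1 bit per transition. The function names and the values of K and N are hypothetical.

```python
from math import log2


def trellis_bits(K: int, N: int, nodes_per_stage: int = 4, successors: int = 2) -> float:
    """Bits per vector for the trellis: path bits plus one code word index per stage.

    Assumes every stage has `nodes_per_stage` nodes and every node has
    `successors` outgoing edges (an assumption, not stated in the text).
    """
    path_bits = log2(nodes_per_stage) + (K - 1) * log2(successors)
    return path_bits + K * log2(N)


def residual_bits(K: int, N: int, nodes_per_stage: int = 4) -> float:
    """Bits per vector for a plain residual quantizer with 4N levels per stage."""
    return K * log2(nodes_per_stage * N)


K, N = 3, 16  # illustrative values
print(trellis_bits(K, N))   # (K + 1) + K * log2(N)
print(residual_bits(K, N))  # K * log2(4N)
```

Under these assumptions the trellis spends K log2(N) + K + 1 bits, against K log2(N) + 2K bits for the unconstrained residual quantizer, so K - 1 bits are saved per vector while each stage still offers 4N candidate code words.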
A formal definition of the TCVRQ is the following:

Definition 4: A Trellis Coded Vector Residual Quantizer is a pair
$T = (G, Q)$ where:
1. $G = (V, E)$ is a Trellis[4] multistage graph with $|V| = n$ and $K$
   stages;
2. $Q = (Q_1, Q_2, \ldots, Q_n)$ is a finite set of ESVQs, $|V| = |Q|$, and
   each $Q_i \in Q$ is associated to the vertex $v_i \in V$;
3. The ESVQ $Q_i$ encodes the residual of $Q_j$ if and only if
   $(v_i, v_j) \in E$.

The design of the quantization levels for the VQs associated to each node of
the trellis is performed in sequence, from stage 1 to stage $K$, using the
LBG algorithm on the residuals generated by the connected nodes.
 This design is not optimal; nevertheless it respects the structure of the
quantizer and, for a limited number of stages, achieves interesting
performance.
 The optimality conditions for the residual quantizers stated by Barnes and
Frost in [6] and in [8] can be adapted to our TCVRQ; we did not use them here
due to the complexity of the algorithm.
 The best sequence of residual quantizers is determined using the Viterbi[1]
algorithm. In this particular framework, Viterbi search is not optimal; in
fact, it behaves like a greedy algorithm. Nevertheless, the performance is
not degraded because of the decreasing error introduced by the residual
structure.

Figure 2: SQNR for TCVRQ and ESVQ compared for different bit-rates and for
images in the training and in the test sets.

IMAGE CODING

Several experiments were made to assess the performance of our TCVRQ on a
natural source. A comparison was made between an ESVQ and a TCVRQ in
quantizing 28 gray-level images commonly used as a reference. These images
can be found on the Internet at the ftp address: "ftp//" in the directory
"/pub/BragZone".
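Before the experimental details, the trellis search described earlier (a Viterbi-style selection of the best sequence of residual quantizers) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes squared-error distortion, one survivor per node, and the same connectivity between every pair of consecutive stages. The names `trellis_encode`, `stages` and `successors`, and the toy code books, are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
Codebook = List[Vector]
Survivor = Tuple[List[int], List[int], Vector]  # (path, code word indexes, residual)


def energy(v: Vector) -> float:
    return sum(c * c for c in v)


def quantize(x: Vector, codebook: Codebook) -> Tuple[int, Vector]:
    """Exhaustive search in one node's code book; returns (index, residual)."""
    def residual(i: int) -> Vector:
        return tuple(a - b for a, b in zip(x, codebook[i]))
    index = min(range(len(codebook)), key=lambda i: energy(residual(i)))
    return index, residual(index)


def trellis_encode(x: Vector, stages: List[List[Codebook]],
                   successors: List[List[int]]) -> Survivor:
    """stages[k][v] is the code book of node v at stage k;
    successors[u] lists the nodes of the next stage reachable from node u."""
    survivors: Dict[int, Survivor] = {}
    for v, codebook in enumerate(stages[0]):
        i, r = quantize(x, codebook)
        survivors[v] = ([v], [i], r)
    for k in range(1, len(stages)):
        updated: Dict[int, Survivor] = {}
        for u, (path, indexes, res) in survivors.items():
            for v in successors[u]:
                i, r = quantize(res, stages[k][v])
                # Keep, for each node, only the incoming path with the smallest residual.
                if v not in updated or energy(r) < energy(updated[v][2]):
                    updated[v] = (path + [v], indexes + [i], r)
        survivors = updated
    return min(survivors.values(), key=lambda s: energy(s[2]))


# Toy example: 2 stages, 2 nodes per stage, 1-dimensional vectors.
stages = [
    [[(0.0,), (4.0,)], [(2.0,), (6.0,)]],
    [[(-0.5,), (0.5,)], [(-1.0,), (1.0,)]],
]
successors = [[0], [0, 1]]
path, indexes, residual = trellis_encode((2.6,), stages, successors)
print(path, indexes, residual)
```

Because each survivor is chosen only by the energy of its current residual, a path discarded at an early stage is never reconsidered, which matches the remark that the search behaves like a greedy algorithm in this framework.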
 The training and the test sets were composed respectively of 12 and 16
images of 512x512 pixels, 256 gray levels, divided into blocks (vectors) of
3x3 and 4x4 pixels. The measure used for the quantization error was the
Signal to Quantization Noise Ratio (SQNR).
 The results are shown in Figure 2. For low bit rates the error of the TCVRQ
is very close to the optimum; increasing the bit rate (and the number of
stages), the performance of the TCVRQ decreases due to the sequentially
optimum design.
 Using the same training and test sets, we compared the performance of our
TCVRQ to Jill Goldschneider's VQ package, freely available on the Internet at
the ftp address: "...e". This package consists of two different kinds of tree
quantizers (fixed and variable rate) and an ESVQ that uses the same code book
as the tree quantizers.

Figure 3: Performance of the TCVRQ compared to J. Goldschneider's VQ package.

 As is clear from Figure 3, for low bit rates, our quantizer outperforms the
package in terms of SQNR. Due to their structure, the tree quantizers use
twice the memory of the ESVQ, which itself grows exponentially.
 Our TCVRQ uses only an amount of memory that grows linearly with the vector

Figure 4: The low bit rate speech codec.

LOW BIT RATE SPEECH CODING

TCVRQ is a general-purpose VQ, with low computational costs and small memory
requirements. This makes it very appealing for low bit-rate speech coding
applications. A high quality low bit-rate codec is needed when the speech
signal must be encoded on a narrow-band channel preserving voice
intelligibility and speaker identification. We have evaluated the performance
of our TCVRQ in a low bit-rate Linear Prediction based speech codec.
 The implemented low bit-rate speech codec (see Figure 4) follows a
well-known scheme due to Atal and Remde[5].
 It is a hybrid single-pulse codebook excited codec where the voice signal,
sampled at 8 kHz with 16 bits per sample, is analyzed by using Linear
Prediction. Every voice frame (80 samples) is classified as "Voiced" or
"Unvoiced" by thresholding the peak of the autocorrelation function and, for
the voiced frames, the main pitch period is estimated. Every frame is
synthesized at 2400 bits per second using a single-pulse or a stochastic
excitation vector. The best excitation signal is chosen depending on the
Voiced/Unvoiced classification and
evaluating the Squared Error between the original and the reconstructed
frame.
 Our TCVRQ quantizes the LP parameters, represented in terms of Line Spectrum
Frequencies (or LSFs), which change every three frames (i.e. 240 samples).
With this kind of codec, the quality of the synthesized signal strongly
depends on the quantization of the LP parameters.
 Quantization was performed using a 10-stage trellis quantizer to encode the
LSFs with a bit-rate of 1.9 - 2.4 bits per LP parameter.

Figure 5: Cepstral Distance obtained quantizing the LP coefficients of the
test set with a rate of 2.4 bits per parameter.

 The training set was composed of 76 sentences pronounced by different
speakers of both sexes in the most common European languages: English,
French, Dutch, German, Greek, Spanish, Italian.
 The test set was composed of 12 English sentences spoken by one male and one
female speaker. The sentences used are phonetically rich and well known as
hard to encode:
 "Why were you away a year Roy ?" (Voiced)
 "Nanny may know my meaning" (Nasals)
 "Which tea-party did Baker go to ?" (Plosives and Unvoiced Stops)
 "The little blanket lies around on the floor" (Plosives)
 "His vicious father has seizures" (Fricatives)
 "The problem with swimming is that you can drown" (Voiced Fricatives)

 Figure 5 shows the distribution of the Cepstral Distance (or CD) between the
original and the quantized parameters; CD is a perceptually motivated
distortion measure widely used in speech coding. The results were obtained by
our TCVRQ when experimenting with the speech files in the test set; the
histogram shows that the average value is approximately 1 dB and the
percentage of frames with a CD greater than 2 dB is quite small.
 As confirmed by informal listening tests, the conditions for a transparent
quantization expressed by Paliwal in [9] are satisfied and a transparent
quantization can be performed with our TCVRQ using only 1.9 - 2.4 bits per LP
parameter.

CONCLUSIONS

In this paper, we extended the results stated in [15], giving a formal
definition of the Trellis Coded Vector Residual Quantizer and showing the
results of some experiments in low bit rate speech and still image coding.
 Used for the direct quantization of gray-level still images, our TCVRQ
performs very close to a (locally) optimal ESVQ and outperforms a popular VQ
package; for low coding rates ranging from 0.3 to 1 bits per pixel we
obtained better results in terms of SQNR, speed and memory required.
 The results obtained using the TCVRQ in an LP based speech codec confirm
that the performance is good even in a transform codec and that nearly
transparent quantization can be performed with our system at a rate of
1.9 - 2.4 bits per LP parameter.
 More experiments are in progress to explore the optimality and the effect of
different search algorithms.

ACKNOWLEDGMENTS

We wish to thank Prof. Martin Cohn for the fruitful discussions and the
patient review of the manuscript.

REFERENCES

[1]. A.J. Viterbi and J.K. Omura, "Trellis Encoding of Memoryless
     Discrete-Time Sources with a Fidelity Criterion", IEEE Trans. on Inform.
     Theory, pp.325-332, May 1974.
[2]. Y. Linde, A. Buzo and R.M. Gray, "An Algorithm for Vector Quantization
     Design", IEEE Trans. on Comm., Vol.28, pp.84-95, Jan. 1980.
[3]. L. Colm Stewart, "Trellis Data Compression", Xerox, Palo Alto Research
     Center, 1981.
[4]. G. Ungerboeck, "Channel Coding with Multilevel/Phase Signals", IEEE
     Trans. on Inform. Theory, Vol.28, N.1, Jan. 1982.
[5]. B.S. Atal and J.R. Remde, "A New Model of LPC Excitation for Producing
     Natural-Sounding Speech at Low Bit Rates", Proc. of the IEEE Int. Conf.
     on Acoustics, Speech, and Signal Processing (ICASSP82), 1982.
[6]. C.F. Barnes, "Residual Vector Quantizers", Ph.D. Dissertation, Brigham
     Young University, 1989.
[7]. T.R. Fischer, M.W. Marcellin, and M. Wang, "Trellis Coded Vector
     Quantization", IEEE Trans. on Inform. Theory, Vol.37, N.3,
     pp.1551-1566, 1991.
[8]. R.L. Frost, C.F. Barnes, and F. Xu, "Design and Performance of Residual
     Quantizers", Proc. of the IEEE Data Compression Conference (DCC91),
     1991.
[9]. K.K. Paliwal and B.S. Atal, "Efficient Vector Quantization of LPC
     Parameters at 24 bits/frame", Proc. of the IEEE Int. Conf. on Acoustics,
     Speech, and Signal Processing (ICASSP91), 1991.
[10]. J. Lin, "Vector Quantization for Image Compression: Algorithms and
      Performance", Ph.D. Dissertation, Brandeis University, 1992.
[11]. H.S. Wang and N. Moayeri, "Trellis Coded Vector Quantization", IEEE
      Trans. on Comm., Vol.28, N.8, 1992.
[12]. T.R. Fischer and M. Wang, "Entropy Constrained Trellis-Coded
      Quantization", IEEE Trans. on Inform. Theory, Vol.38, pp.415-425, 1992.
[13]. R. Laroia and N. Farvardin, "Trellis-based scalar-vector quantizers for
      memoryless sources", IEEE Trans. on Inform. Theory, Vol.40, No.3, 1994.
[14]. B. Belzer and J.D. Villasenor, "Symmetric Trellis Coded Vector
      Quantization", Proc. of the IEEE Data Compression Conference (DCC96),
      1996.
[15]. G. Motta and B. Carpentieri, "A New Trellis Vector Residual Quantizer:
      Applications to Image Coding", Proc. of the IEEE Int. Conf. on
      Acoustics, Speech, and Signal Processing (ICASSP97), Apr. 1997.
