
Vocal Tract Normalization Equals Linear Transformation in Cepstral Space

Michael Pitz, Sirko Molau, Ralf Schlüter, Hermann Ney

Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen – University of Technology, 52056 Aachen, Germany
{pitz,molau,schlueter,ney}@informatik.rwth-aachen.de

Abstract

We show that vocal tract normalization (VTN) frequency warping results in a linear transformation in the cepstral domain. For the special case of a piece-wise linear warping function, the transformation matrix is calculated analytically. This approach enables us to compute the Jacobian determinant of the transformation matrix, which allows the normalization of the probability distributions used in speaker normalization for automatic speech recognition.

1. Introduction

Vocal tract normalization (VTN) tries to compensate for the effect of speaker-dependent vocal tract lengths by warping the frequency axis of the power spectrum [2, 5, 3, 9, 10]:

$$ \tilde\omega = g_\alpha(\omega), \qquad g_\alpha : [0, \pi] \to [0, \pi] \qquad (1) $$

The warping function $g_\alpha$ is assumed to be invertible, i.e. strictly monotonic and continuous (see Figure 1).

Figure 1: Example of VTN warping functions $g_\alpha$ for different values of $\alpha$ ($\alpha < 1$, $\alpha = 1$, $\alpha > 1$; horizontal axis $\omega$, vertical axis $\tilde\omega$, both from 0 to $\pi$).

The relationship between VTN frequency warping and linear transformations in the cepstral domain has been studied before [1, p.199], [4]. However, these investigations were based on special assumptions:
• The VTN frequency warping is restricted to a bilinear transformation [1, p.119], [4].
• The cepstral representation is based on an all-pass or LPC model [4].

In contrast, we will show that there is a general equivalence between VTN frequency warping and a linear transformation of the cepstral vector, independent of these assumptions. A related result has been reported in [6] in the context of spectral distortion measures.

The remainder of the paper is organized as follows: In Section 2 we show that VTN amounts to a linear transformation of the acoustic vector. The transformation matrices for the cases of linear and piece-wise linear warping are derived analytically in Section 3, followed in Section 4 by some examples obtained by warping a given spectrum with our approach. Section 5 discusses the implications for the normalization of probability distributions when the random variables are transformed. The paper is summarized in Section 6.

2. Cepstral Representation of VTN Frequency Warping

We consider cepstral coefficients defined by:

$$
c_k = \begin{cases}
\dfrac{1}{2\pi} \displaystyle\int_0^{\pi} \ln\left|X(\omega)\right|^2 d\omega, & k = 0 \\[1ex]
\dfrac{1}{\pi} \displaystyle\int_0^{\pi} \ln\left|X(\omega)\right|^2 \cos(\omega k)\, d\omega, & 0 < k < K
\end{cases}
\qquad (2)
$$

where $\omega$ may denote either the true physical or the Mel frequency scale. Note that the conventional definition of $c_0$ differs by a factor of 2.

The $n$-th cepstral coefficient of the warped spectrum is

$$ \tilde c_n(\alpha) = \frac{1}{\pi}\int_0^{\pi} \ln\left|X\left(g_\alpha^{-1}(\tilde\omega)\right)\right|^2 \cos(\tilde\omega n)\, d\tilde\omega, \qquad 0 \le n < N . $$

In order to obtain the value of the warped power spectrum at a given frequency, we access the unwarped spectrum at the frequency determined by the inverse warping function. This is necessary because in practice only the discrete unwarped spectrum is given. Explicit spectral interpolation for warping is avoided this way.

Now we expand the spectrum $\ln|X(g_\alpha^{-1}(\tilde\omega))|^2$ in a Fourier series:

$$ \ln\left|X(\omega)\right|^2 = \sum_{k=0}^{K-1} 2 c_k \cos(\omega k), $$

where $c_k$ denotes the $k$-th cepstral coefficient of the unwarped spectrum. Interchanging integration and summation yields:

$$
\begin{aligned}
\tilde c_n(\alpha) &= \frac{1}{\pi}\int_0^{\pi} \sum_{k=0}^{K-1} 2 c_k \cos\left(k\, g_\alpha^{-1}(\tilde\omega)\right) \cos(\tilde\omega n)\, d\tilde\omega \\
&= \sum_{k=0}^{K-1} c_k \, \frac{2}{\pi}\int_0^{\pi} \cos(\tilde\omega n) \cos\left(k\, g_\alpha^{-1}(\tilde\omega)\right) d\tilde\omega \\
&= \sum_{k=0}^{K-1} A_{nk}(\alpha)\, c_k
\end{aligned}
\qquad (3)
$$

with

$$ A_{nk}(\alpha) = \frac{2}{\pi}\int_0^{\pi} \cos(\tilde\omega n) \cos\left(k\, g_\alpha^{-1}(\tilde\omega)\right) d\tilde\omega , $$

which depends solely on $\alpha$ (through $g_\alpha$) and not on the spectrum. Hence, $\tilde c(\alpha) = A(\alpha)\, c$.

Thus, the vector of warped cepstral coefficients is a linear transformation of the original cepstral coefficients with a transformation matrix $A(\alpha)$ of dimension $N \times K$. In the case of continuous spectra there may be no upper limit for $N$ and $K$. In practice, however, we work with discrete spectra. Hence, $N$ and $K$ will be finite, but do not necessarily have the same value.
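Equation (3) says that any invertible warp acts on the cepstrum through a fixed matrix. The Python sketch below makes that concrete under stated assumptions: the integrals are approximated with a midpoint rule (m = 4096 nodes), the log spectrum is rebuilt from a short synthetic cepstral vector, and the power-law warp $g(\omega) = \pi(\omega/\pi)^{\gamma}$ with $\gamma = 0.9$ is a hypothetical example chosen only because it is invertible on $[0, \pi]$. None of these choices come from the paper. The sketch computes the warped cepstrum twice, once by warping the log spectrum and integrating, once as $A(\alpha)\,c$, and compares the results.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def midpoint_nodes(m: int) -> Vector:
    """Midpoints of m equal sub-intervals of [0, pi]."""
    h = math.pi / m
    return [(j + 0.5) * h for j in range(m)]


def warp_matrix(g_inv: Callable[[float], float], n_out: int, n_in: int,
                m: int = 4096) -> Matrix:
    """A_nk = (2/pi) * integral_0^pi cos(n w) cos(k g_inv(w)) dw, cf. eq. (3)."""
    nodes = midpoint_nodes(m)
    inv = [g_inv(w) for w in nodes]
    scale = 2.0 / m  # (2/pi) times the step width pi/m
    return [[scale * sum(math.cos(n * w) * math.cos(k * v)
                         for w, v in zip(nodes, inv))
             for k in range(n_in)]
            for n in range(n_out)]


def warped_cepstrum_direct(c: Vector, g_inv: Callable[[float], float],
                           n_out: int, m: int = 4096) -> Vector:
    """Warp the log spectrum first, then apply (1/pi) * integral ... cos(n w) dw."""
    nodes = midpoint_nodes(m)
    # ln|X(g_inv(w))|^2 rebuilt from the Fourier series sum_k 2 c_k cos(k omega)
    log_spec = [sum(2.0 * ck * math.cos(k * g_inv(w)) for k, ck in enumerate(c))
                for w in nodes]
    scale = 1.0 / m  # (1/pi) times the step width pi/m
    return [scale * sum(s * math.cos(n * w) for s, w in zip(log_spec, nodes))
            for n in range(n_out)]


GAMMA = 0.9  # hypothetical power-law warp g(w) = pi * (w / pi) ** GAMMA


def power_law_g_inv(w: float) -> float:
    """Inverse of the hypothetical warp g(w) = pi * (w / pi) ** GAMMA."""
    return math.pi * (w / math.pi) ** (1.0 / GAMMA)


c = [0.5, -0.3, 0.2, 0.1, -0.05, 0.02]  # synthetic cepstral vector
A = warp_matrix(power_law_g_inv, 6, len(c))
via_matrix = [sum(a * ck for a, ck in zip(row, c)) for row in A]
direct = warped_cepstrum_direct(c, power_law_g_inv, 6)
print(max(abs(x - y) for x, y in zip(via_matrix, direct)))  # rounding-level only
```

On a fixed quadrature grid the interchange of summation and integration is just a reordering of finite sums, so the two routes agree up to floating-point rounding. With the identity warp (`lambda w: w`) the matrix comes out as diag(2, 1, 1, ...), the $\alpha = 1$ case treated in Section 3.1.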
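Choosing a smaller value of $N$, i.e. keeping fewer warped coefficients, results in a smoothing of the power spectrum and eliminates the pitch.

3. Analytic Calculation of the Transformation Matrix

3.1. Linear Warping Function

In order to apply a piece-wise linear warping, we first compute the solution for a strictly linear warping function:

$$ g_\alpha(\omega) = \alpha\omega $$

The entries $A_{nk}(\alpha)$ of the transformation matrix can be computed by elementary integration. For $\alpha \neq 1$ we obtain:

$$
\begin{aligned}
A_{nk}(\alpha) &= \frac{2}{\pi}\int_0^{\pi} \cos(\tilde\omega n) \cos\left(\alpha^{-1}\tilde\omega k\right) d\tilde\omega \\
&= \frac{1}{\pi}\int_0^{\pi} \left[\cos\left((n + \alpha^{-1}k)\tilde\omega\right) + \cos\left((n - \alpha^{-1}k)\tilde\omega\right)\right] d\tilde\omega \\
&= \frac{1}{\pi}\left[\frac{\sin\left((n + \alpha^{-1}k)\pi\right)}{n + \alpha^{-1}k} + \frac{\sin\left((n - \alpha^{-1}k)\pi\right)}{n - \alpha^{-1}k}\right],
\end{aligned}
$$

where a fraction whose denominator vanishes is replaced by its limit $\pi$. For $\alpha = 1$ this simplifies to

$$ A_{nk}(1) = \begin{cases} 2, & n = k = 0 \\ \delta_{nk}, & \text{otherwise} \end{cases} $$

because of the orthogonality of the cosine functions on $[0, \pi]$. Note that the value for $n = k = 0$ results from our special definition of the zeroth cepstral coefficient $c_0$.

3.2. Piece-wise Linear Warping Function

To meet the requirement of invertibility, we now consider a piece-wise linear warping function [10, 11] with two parameters $(\alpha, \omega_0)$, as shown in Figure 2:

$$ g_\alpha(\omega) = \begin{cases} \alpha\omega, & 0 \le \omega \le \omega_0 \\ \alpha\omega_0 + \dfrac{\pi - \alpha\omega_0}{\pi - \omega_0}(\omega - \omega_0), & \omega_0 < \omega \le \pi \end{cases} $$

We choose the breakpoint $\omega_0$, where the slope of the warping function changes, as follows:

$$ \omega_0 = \begin{cases} \bar\omega_0, & \alpha \le 1 \\ \bar\omega_0 / \alpha, & \alpha > 1 \end{cases} $$

with a fixed $\bar\omega_0 < \pi$, so that $\alpha\omega_0 < \pi$ and $g_\alpha$ remains invertible for all $\alpha$.

Figure 2: Piece-wise linear warping functions for different values of $\alpha$ ($\alpha < 1$, $\alpha = 1$, $\alpha > 1$; horizontal axis $\omega$ with breakpoint $\omega_0$, vertical axis $\tilde\omega$, both from 0 to $\pi$).

The transformation matrix $A_{nk}(\alpha)$ is computed similarly to the linear case. With $\tilde\omega_0 = \alpha\omega_0$ and the slope $\beta = (\pi - \alpha\omega_0)/(\pi - \omega_0)$ of the upper segment, the inverse warping function is $g_\alpha^{-1}(\tilde\omega) = \alpha^{-1}\tilde\omega$ for $\tilde\omega \le \tilde\omega_0$ and $g_\alpha^{-1}(\tilde\omega) = \omega_0 + \beta^{-1}(\tilde\omega - \tilde\omega_0)$ for $\tilde\omega > \tilde\omega_0$, so that

$$ A_{nk}(\alpha) = \frac{2}{\pi}\int_0^{\tilde\omega_0} \cos(\tilde\omega n)\cos\left(\alpha^{-1}\tilde\omega k\right) d\tilde\omega + \frac{2}{\pi}\int_{\tilde\omega_0}^{\pi} \cos(\tilde\omega n)\cos\left(k\left[\omega_0 + \beta^{-1}(\tilde\omega - \tilde\omega_0)\right]\right) d\tilde\omega . $$

Noting that the first integral is the linear case with upper limit $\tilde\omega_0$ instead of $\pi$, and using $g_\alpha^{-1}(\tilde\omega_0) = \omega_0$ and $g_\alpha^{-1}(\pi) = \pi$, we obtain for $\alpha \neq 1$:

$$ A_{nk}(\alpha) = \frac{1}{\pi}\left[\sin\left((\alpha n + k)\omega_0\right)\left(\frac{1}{n + \alpha^{-1}k} - \frac{1}{n + \beta^{-1}k}\right) + \sin\left((\alpha n - k)\omega_0\right)\left(\frac{1}{n - \alpha^{-1}k} - \frac{1}{n - \beta^{-1}k}\right)\right] \qquad (4) $$

Equation (4) holds for all index pairs with $n \neq \alpha^{-1}k$ and $n \neq \beta^{-1}k$; the remaining entries, in particular $A_{00}(\alpha) = 2$, follow directly from the integral above. This matrix can now be used for VTN as an alternative to explicitly warping the discrete-frequency power spectrum or to the integrated approach described in [5].

3.3. General Warping Functions

We would like to stress again that VTN can always be written as a linear transformation in the cepstral domain, independent of the functional form of the invertible warping function (see eq. (3)). The analytic calculation of the transformation matrix for a non-linear warping function, however, is not as straightforward as in the piece-wise linear case presented above.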
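A closed form like (4) is easy to get wrong, so a numerical cross-check against the defining integral of $A_{nk}(\alpha)$ is a cheap safeguard. The Python sketch below writes (4) out for a breakpoint $\omega_0$ and upper-segment slope $\beta = (\pi - \alpha\omega_0)/(\pi - \omega_0)$ and compares it with a midpoint-rule evaluation of $\frac{2}{\pi}\int_0^\pi \cos(\tilde\omega n)\cos(k\,g_\alpha^{-1}(\tilde\omega))\,d\tilde\omega$. Assumptions: the reference breakpoint `OMEGA0_BAR = 0.875 * pi` and the helper names are hypothetical illustrations, and the closed form is only evaluated at index pairs where none of its denominators vanish.

```python
import math
from typing import List, Tuple

OMEGA0_BAR = 0.875 * math.pi  # hypothetical reference breakpoint (assumption)


def breakpoint_omega0(alpha: float) -> float:
    """Breakpoint omega_0, chosen so that alpha * omega_0 < pi (invertibility)."""
    return OMEGA0_BAR if alpha <= 1.0 else OMEGA0_BAR / alpha


def piecewise_g_inv(w: float, alpha: float, omega0: float) -> float:
    """Inverse of the piece-wise linear warp (slope alpha below the breakpoint)."""
    w_split = alpha * omega0
    if w <= w_split:
        return w / alpha
    return omega0 + (math.pi - omega0) / (math.pi - w_split) * (w - w_split)


def closed_form(alpha: float, n: int, k: int, omega0: float) -> float:
    """Closed form (4); assumes n != k/alpha and n != k/beta."""
    beta = (math.pi - alpha * omega0) / (math.pi - omega0)
    s_plus = math.sin((alpha * n + k) * omega0)
    s_minus = math.sin((alpha * n - k) * omega0)
    return (s_plus * (1.0 / (n + k / alpha) - 1.0 / (n + k / beta))
            + s_minus * (1.0 / (n - k / alpha) - 1.0 / (n - k / beta))) / math.pi


def quadrature(alpha: float, n: int, k: int, omega0: float, m: int = 8000) -> float:
    """Midpoint rule for (2/pi) * integral_0^pi cos(n w) cos(k g_inv(w)) dw."""
    h = math.pi / m
    total = 0.0
    for j in range(m):
        w = (j + 0.5) * h
        total += math.cos(n * w) * math.cos(k * piecewise_g_inv(w, alpha, omega0))
    return 2.0 / math.pi * h * total


# Index pairs chosen so that no denominator in (4) vanishes for alpha = 0.8, 1.2.
PAIRS: List[Tuple[int, int]] = [(1, 2), (3, 1), (0, 4), (5, 3)]

for alpha in (0.8, 1.2):
    w0 = breakpoint_omega0(alpha)
    for n, k in PAIRS:
        print(f"alpha={alpha} n={n} k={k}: "
              f"closed={closed_form(alpha, n, k, w0):+.6f} "
              f"quad={quadrature(alpha, n, k, w0):+.6f}")
```

The quadrature column also reproduces the special entries that (4) excludes, for example $A_{00}(\alpha) = 2$ and $A_{n0}(\alpha) = 0$ for $n > 0$, since $\cos(0 \cdot g_\alpha^{-1}) = 1$ for every warp.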
4. Examples

In this section we show some examples of spectra obtained by applying the linear transformation to the cepstral vectors. A sample spectrum (Figure 3, $\alpha = 1.0$) with $N = 512$ spectral lines was transformed into $K = 512$ cepstral coefficients by a discrete cosine transform (DCT), i.e. a discretized version of eq. (2) on the frequency samples $\omega_n = \pi(2n+1)/(2N)$:

$$ c_k = \frac{1}{N}\sum_{n=0}^{N-1} \ln\left|X(\omega_n)\right|^2 \cos(\omega_n k), \qquad 0 < k < K, $$

with an additional factor $\tfrac{1}{2}$ for $c_0$, as in eq. (2).

Then the cepstral vector was transformed with (4) into a piece-wise linearly warped cepstral vector of 512 coefficients for the warping factors $\alpha = 0.8$ and $\alpha = 1.2$, respectively. Afterwards, the inverse DCT was applied to the warped cepstral vector in order to obtain a warped spectrum. This last transformation was carried out for demonstration only; in practice the warped cepstral vector is used for further processing. A comparison of the warped cepstral coefficients obtained by the method presented here with those computed from the spectrum as described in [5] reveals no differences.

Figure 3: Example of warped spectra with warping factors $\alpha = 0.8$ and $\alpha = 1.2$ (curves for $\alpha = 1.0$, $0.8$ and $1.2$; horizontal axis: frequency [Hz], 0 to 8000).

As an additional example we show the effect of cepstral smoothing in Figures 4 and 5. Again, the spectrum shown in Figure 3 was transformed into 512 cepstral coefficients and then smoothed by transforming back with only the first 16 cepstral coefficients ($\alpha = 1.0$ in Figures 4 and 5). The warped spectra were obtained by calculating 512 cepstral coefficients, transforming them with (4) into 512 warped cepstral coefficients, and subsequently smoothing by transforming back with only the first 16 warped cepstral coefficients. It should be noted that this time we can exactly reproduce the warping obtained from [5] only if we first compute all 512 cepstral coefficients, warp them using (4), and smooth at this point using only the first 16 of the obtained cepstral coefficients. If we first smooth by calculating only the first 16 cepstral coefficients and warp thereafter using a 16 × 16 matrix, we obtain slightly different results. The difference between both methods is shown in Figure 6.

Figure 4: Example of a smoothed spectrum; the cepstrum was warped with a 512 × 512 matrix ($\alpha = 0.8$) and subsequently reduced to 16 coefficients (curves for $\alpha = 1.0$ and $\alpha = 0.8$; horizontal axis: frequency [Hz], 0 to 8000).

Figure 5: Example of a smoothed spectrum; the cepstrum was warped with a 512 × 512 matrix ($\alpha = 1.2$) and subsequently reduced to 16 coefficients (curves for $\alpha = 1.0$ and $\alpha = 1.2$; horizontal axis: frequency [Hz], 0 to 8000).

Figure 6: Effect of different order of warping and smoothing (curves "first warp, then smooth" and "first smooth, then warp"; horizontal axis: frequency [Hz], 0 to 8000).
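Why the order matters can be read off eq. (3): keeping the first 16 warped coefficients after warping retains $\sum_{k} A_{nk}(\alpha) c_k$ over all $k$, while truncating first drops the terms with $k \ge 16$, which are non-zero because rows of $A(\alpha)$ reach past the diagonal when $\alpha \neq 1$. The Python sketch below reproduces this effect on a small scale. Assumptions made for illustration only: sizes 32 and 8 stand in for 512 and 16, the cepstrum $c_k = 1/(k+1)$ is synthetic, $\alpha = 1.2$, the breakpoint rule uses a hypothetical `OMEGA0_BAR = 0.875 * pi`, and $A(\alpha)$ is evaluated by midpoint-rule quadrature of its defining integral rather than by the closed form.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]

K_FULL = 32      # stands in for the 512 cepstral coefficients of the paper
K_SMOOTH = 8     # stands in for the 16 coefficients kept after smoothing
OMEGA0_BAR = 0.875 * math.pi  # hypothetical reference breakpoint (assumption)


def piecewise_g_inv(alpha: float) -> Callable[[float], float]:
    """Inverse piece-wise linear warp; breakpoint OMEGA0_BAR, or OMEGA0_BAR/alpha if alpha > 1."""
    omega0 = OMEGA0_BAR if alpha <= 1.0 else OMEGA0_BAR / alpha
    w_split = alpha * omega0
    slope = (math.pi - omega0) / (math.pi - w_split)

    def g_inv(w: float) -> float:
        return w / alpha if w <= w_split else omega0 + slope * (w - w_split)

    return g_inv


def warp_matrix(g_inv: Callable[[float], float], n_out: int, n_in: int,
                m: int = 2048) -> Matrix:
    """Midpoint-rule approximation of A_nk = (2/pi) int_0^pi cos(n w) cos(k g_inv(w)) dw."""
    nodes = [(j + 0.5) * math.pi / m for j in range(m)]
    inv = [g_inv(w) for w in nodes]
    return [[2.0 / m * sum(math.cos(n * w) * math.cos(k * v)
                           for w, v in zip(nodes, inv))
             for k in range(n_in)]
            for n in range(n_out)]


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


alpha = 1.2
c = [1.0 / (1 + k) for k in range(K_FULL)]                  # synthetic cepstrum
A = warp_matrix(piecewise_g_inv(alpha), K_SMOOTH, K_FULL)   # first K_SMOOTH rows

# First warp with the full-width matrix, then keep the first K_SMOOTH coefficients.
warp_then_smooth = matvec(A, c)
# First keep K_SMOOTH coefficients, then warp with the K_SMOOTH x K_SMOOTH block.
smooth_then_warp = matvec([row[:K_SMOOTH] for row in A], c[:K_SMOOTH])
# The gap is exactly the contribution of the discarded coefficients k >= K_SMOOTH.
tail = matvec([row[K_SMOOTH:] for row in A], c[K_SMOOTH:])

for n in range(K_SMOOTH):
    print(n, round(warp_then_smooth[n] - smooth_then_warp[n], 4), round(tail[n], 4))
```

For $\alpha > 1$ row $n$ draws mainly on coefficients near $k \approx \alpha n$, so the highest retained rows lose the most when the cepstrum is truncated before warping.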
5. Speaker Normalization

In speaker normalization the acoustic observation vector is modified, whereas speaker adaptation modifies the acoustic model parameters. Modifying the observation vector means that the model density, evaluated at the transformed vector, is no longer a properly normalized distribution of the original vector. To re-normalize the transformed distributions, the Jacobian of the transformation must be taken into account [4, 7].

In VTN, speaker normalization is usually not performed as a transformation of the acoustic vectors but by warping the power spectrum during signal analysis instead. Hence, the Jacobian can hardly be calculated. The warping factor $\alpha$ is usually determined by a maximum likelihood criterion. If the correct normalization is neglected, systematic errors in estimating $\alpha$ may occur.

Expressing VTN as a matrix transformation of the acoustic vector ($x \to Ax$) enables us to take the Jacobian into account:

$$
\begin{aligned}
\mathcal{N}(x \mid \mu, \Sigma) \;\to\; \mathcal{N}(Ax \mid \mu, \Sigma) &= \frac{1}{\sqrt{\det(2\pi\Sigma)}} \exp\left(-\tfrac{1}{2}(Ax - \mu)^T \Sigma^{-1} (Ax - \mu)\right) \\
&= \frac{1}{|\det A|}\, \mathcal{N}\left(x \mid A^{-1}\mu,\, A^{-1}\Sigma A^{-T}\right),
\end{aligned}
$$

where in the last step $A$ is assumed to be square and invertible. Hence $|\det A(\alpha)|\, \mathcal{N}(A(\alpha)x \mid \mu, \Sigma)$ is a properly normalized density in $x$. The practical influence of the Jacobian is a subject of current research. A qualitative plot of the dependency of the Jacobian determinant on the warping factor $\alpha$, computed numerically for piece-wise linear warping, is shown in Figure 7. The dependency of $\det A(\alpha)$ on $\alpha$ can be used for a refined estimation of $\alpha$ in speaker normalization.

Figure 7: Plot of $-\log|\det A(\alpha)|$ for piece-wise linear warping as a function of $\alpha$ ($\alpha$ from 0.8 to 1.2). The scaling of the ordinate is intentionally left out, as it depends on the number of cepstral coefficients.
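With $A(\alpha)$ available as a matrix, the Jacobian term can be added directly to the score used for estimating $\alpha$: for a square, invertible $A(\alpha)$, $\log p(x \mid \alpha) = \log \mathcal{N}(A(\alpha)x \mid \mu, \Sigma) + \log|\det A(\alpha)|$. The Python sketch below builds the piece-wise linear matrix by integrating each linear segment exactly, computes $\log|\det A(\alpha)|$ by Gaussian elimination, and tabulates $-\log|\det A(\alpha)|$ over $\alpha$, the quantity plotted in Figure 7. Assumptions: a diagonal covariance, a matrix size of 16, and a hypothetical breakpoint `OMEGA0_BAR = 0.875 * pi` (divided by $\alpha$ for $\alpha > 1$); the function names are illustrative and the printed values are not taken from the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]
OMEGA0_BAR = 0.875 * math.pi  # hypothetical reference breakpoint (assumption)


def _int_cos(p: float, lo: float, hi: float, phase: float) -> float:
    """Exact integral of cos(p * w + phase) over [lo, hi]."""
    if abs(p) < 1e-12:
        return (hi - lo) * math.cos(phase)
    return (math.sin(p * hi + phase) - math.sin(p * lo + phase)) / p


def piecewise_matrix(alpha: float, size: int) -> Matrix:
    """A(alpha) for piece-wise linear warping, integrating each linear segment exactly."""
    omega0 = OMEGA0_BAR if alpha <= 1.0 else OMEGA0_BAR / alpha
    w_split = alpha * omega0                         # image of the breakpoint
    beta = (math.pi - w_split) / (math.pi - omega0)  # slope of the upper segment
    d = omega0 - w_split / beta                      # g_inv(w) = w / beta + d above w_split
    return [[(_int_cos(n + k / alpha, 0.0, w_split, 0.0)
              + _int_cos(n - k / alpha, 0.0, w_split, 0.0)
              + _int_cos(n + k / beta, w_split, math.pi, k * d)
              + _int_cos(n - k / beta, w_split, math.pi, -k * d)) / math.pi
             for k in range(size)]
            for n in range(size)]


def log_abs_det(A: Matrix) -> float:
    """log|det A| by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in A]
    size = len(M)
    total = 0.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            return float("-inf")  # singular matrix
        M[col], M[pivot] = M[pivot], M[col]
        total += math.log(abs(M[col][col]))
        for r in range(col + 1, size):
            factor = M[r][col] / M[col][col]
            for j in range(col, size):
                M[r][j] -= factor * M[col][j]
    return total


def normalized_log_likelihood(x: Vector, alpha: float, mean: Vector, var: Vector) -> float:
    """log N(A(alpha) x | mean, diag(var)) + log|det A(alpha)| (diagonal covariance assumed)."""
    A = piecewise_matrix(alpha, len(x))
    y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    log_gauss = sum(-0.5 * (math.log(2.0 * math.pi * v) + (yi - mu) ** 2 / v)
                    for yi, mu, v in zip(y, mean, var))
    return log_gauss + log_abs_det(A)


for alpha in (0.8, 0.9, 1.0, 1.1, 1.2):
    print(f"alpha={alpha:.1f}  -log|det A|={-log_abs_det(piecewise_matrix(alpha, 16)):.4f}")
```

At $\alpha = 1$ the matrix is diag(2, 1, ..., 1), so $\log|\det A(1)| = \log 2$ independent of the size; away from $\alpha = 1$ the value depends on the number of coefficients, in line with the remark on the ordinate scaling of Figure 7. Maximizing `normalized_log_likelihood` over a grid of $\alpha$ values is one way the Jacobian could enter the estimate of $\alpha$.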
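6. Conclusion

We have shown that vocal tract normalization can be expressed as a linear transformation of the cepstral vector for arbitrary invertible warping functions. For the case of piece-wise linear warping we derived an analytic solution for the transformation matrix. This allows us to re-normalize the probability distribution with the Jacobian of the transformation.

7. References

[1] A. Acero, "Acoustical and Environmental Robustness in Automatic Speech Recognition", Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, USA, September 1990.
[2] E. Eide, H. Gish, "A Parametric Approach to Vocal Tract Length Normalization", Proc. Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 1, pp. 346-349, Atlanta, GA, May 1996.
[3] L. Lee, R. Rose, "Speaker Normalization Using Efficient Frequency Warping Procedures", Proc. Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 1, pp. 353-356, Atlanta, GA, May 1996.
[4] J. McDonough, "Speaker Normalization With All-Pass Transforms", Technical Report No. 28, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD, USA, Sep. 1998 (http://www.clsp.jhu.edu/people/jmcd/postscript/all-pass.ps).
[5] S. Molau, M. Pitz, R. Schlüter, H. Ney, "Computing Mel-Frequency Cepstral Coefficients on the Power Spectrum", Proc. Int. Conf. on Acoustics, Speech and Signal Processing, Salt Lake City, UT, June 2001, to appear.
[6] F. K. Nocerino, L. R. Rabiner, D. H. Klatt, "Comparative Study of Several Distortion Measures for Speech Recognition", Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pp. 25-28, Atlanta, GA, Apr. 1985.
[7] A. Sankar, C.-H. Lee, "A Maximum-Likelihood Approach to Stochastic Matching for Robust Speech Recognition", IEEE Trans. on Speech and Audio Processing, Vol. 4, No. 3, May 1996.
[8] L. F. Uebel, P. C. Woodland, "An Investigation into Vocal Tract Length Normalisation", Proc. 6th European Conf. on Speech Communication and Technology, Vol. 6, pp. 2527-2530, Budapest, Hungary, Sep. 1999.
[9] H. Wakita, "Normalization of Vowels by Vocal Tract Length and its Application to Vowel Identification", IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-25, No. 2, pp. 183-192, April 1977.
[10] S. Wegmann, D. McAllaster, J. Orloff, B. Peskin, "Speaker Normalization on Conversational Telephone Speech", Proc. Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 1, pp. 339-341, Atlanta, GA, May 1996.
[11] L. Welling, S. Kanthak, H. Ney, "Improved Methods for Vocal Tract Normalization", Proc. Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 2, pp. 761-764, Phoenix, AZ, April 1999.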
