AUDITORY FILTER BANK INVERSION

L. Lin, W.H. Holmes and E. Ambikairajah
School of Electrical Engineering and Telecommunications
The University of New South Wales, UNSW SYDNEY NSW 2052, Australia
ll.lin@ee.unsw.edu.au, H.Holmes@unsw.edu.au, Ambi@ee.unsw.edu.au

0-7803-6685-9/01/$10.00 ©2001 IEEE

ABSTRACT

Models of auditory filtering using the Gammatone filter bank are useful tools in speech processing. A perceptually accurate auditory inversion model has applications in speech and audio coding. This paper proposes a new auditory filter bank inversion method using a least squares optimization technique. The proposed method is computationally efficient and its low delay makes it suitable for frame-by-frame processing. Three other approaches to Gammatone analysis/synthesis filter bank implementation are compared with the proposed method.

1. INTRODUCTION

There has been considerable research devoted to modelling the functional roles of peripheral auditory systems. Although computational auditory models have been shown in some cases to outperform conventional signal processing techniques, especially in noisy environments, adequate modelling of the principal behaviour of the peripheral auditory systems is still a difficult problem.

Recently, quantitative models of the auditory system and auditory processing of speech signals have received a lot of attention [3] [7]. Many models in the past used a cascade filter bank in order to achieve the measured mechanical tuning of the basilar membrane. However, in recent models, auditory filtering is achieved using Gammatone or other parallel filter banks [5] [6]. A fundamental requirement for coding applications is an efficient mechanism for decomposing and resynthesising the speech signal. Kubin and Kleijn [4] used a Gammatone analysis filter bank with a synthesis filter bank whose impulse responses are the time-reversed impulse responses of the analysis filters, which requires a time delay of at least 20 ms to make the system causal. This paper proposes a computationally efficient auditory filter bank inversion method with low delay and other advantages.

Section 2 presents the overall analysis/synthesis scheme for the Gammatone filter bank, and three approaches to Gammatone analysis/synthesis filter bank implementation are presented in Section 3. A new auditory filter bank inversion approach using a least squares optimization technique is presented in Section 4.

2. THE GAMMATONE FILTER BANK

A Gammatone is the product of a rising polynomial, a decaying exponential function, and a cosine wave. Its impulse response is:

$$g(t) = a\,t^{N-1} e^{-2\pi b\,\mathrm{ERB}(f_c)\,t} \cos(2\pi f_c t) \qquad (2.1)$$

where $N$ is the order of the filter, $f_c$ is the centre frequency, $\mathrm{ERB}(f_c)$ is the equivalent rectangular bandwidth of the auditory filter, and $a, b \in \Re$ are constants. At moderate power levels [2], $\mathrm{ERB}(f_c) = 24.7 + 0.108 f_c$. Gammatone filters were first used by Flanagan [1] to model basilar membrane motion, and subsequently investigated by many others.

A possible discrete Gammatone filter (order $N = 4$) is given by

$$G(z) = \frac{(z+1)^4 (b_0 z^4 + b_1 z^3 + b_2 z^2 + b_3 z + b_4)}{(z^2 + a_1 z + a_2)^4} \qquad (2.2)$$

where the numerator and denominator coefficients $b_i$ and $a_i$ are determined from (2.1) using an appropriate transformation (e.g. the bilinear transform, as in this paper). The implementation of $M$ Gammatone filters as an analysis/synthesis filter bank system is shown in Figure 2.1. The analysis filter bank consists of $M = 25$ Gammatone filters at different centre frequencies. The synthesis filter in each channel employs the time-reversed impulse response of the analysis filter [4]. This method is based on the following property of a Gammatone filter bank with the ERB scale:

$$\sum_{m=1}^{M} \left| G_m(e^{j\omega}) \right|^2 = c$$

where $c \in \Re$ is a constant and $M$ is the number of filters. The overall analysis/synthesis system approximates an all-pass filter.

Figure 2.1. Gammatone analysis/synthesis filter bank

3. IMPLEMENTATION OF GAMMATONE FILTER BANK

In this section, three types of Gammatone analysis/synthesis filter bank implementation schemes will be discussed: FIR/FIR, IIR/FIR and IIR/IIR.

3.1 FIR/FIR Filter Bank Implementation

In this case both analysis and synthesis filters are FIR. The lengths of the FIR filters should be sufficient to approximate the Gammatone impulse response [4] [7]. There are two inherent disadvantages of the FIR/FIR inversion.

Due to the infinite length of the Gammatone impulse response, long FIR filters are necessary for accurate reconstruction. For example, the length of a FIR Gammatone filter at a centre frequency of 500 Hz should be at least 160 samples if the signal is sampled at 8 kHz. Hence this method is computationally expensive.

Because the synthesis filters are time-reversed impulse responses of the analysis filters, a large delay of up to 20 ms must be introduced to make the FIR synthesis filters causal. This may be excessive for frame-based processing.

3.2 IIR/FIR Filter Bank Implementation

Alternatively, the Gammatone filter given in (2.2) can be implemented using IIR filters [4] [7]. However, the synthesis filters are still FIR, as obtained from the time-reversed impulse response of the analysis filters. The delay is still inevitable but the computational load is reduced to nearly half that of the FIR/FIR scheme.

3.3 IIR/IIR Filter Bank Implementation

Another method of inversion is to realize both analysis and synthesis filter banks as IIR filters. The synthesis filters are exactly the IIR analysis filters, but their inputs are time reversed. At the output of the synthesis filters, the signal must be time reversed again to achieve linear phase for each channel. This technique has been successfully applied to speech enhancement and noise suppression by Irino [7]. Unfortunately, when applied to continuous speech processing, samples from future frames are needed as the initial inputs of the IIR synthesis filters. This requires a variation of the usual block processing method, as follows.

A study of the impulse responses of Gammatone filters showed that the longest impulse response within the frequency range of interest takes about 20 ms to decay almost to zero. This suggests that we can apply time reversal to blocks that consist of adjacent frames. At least one frame in the block will be used to forget the effect of the unknown initial condition of the filters. A longer block (more frames in a block) deals with the initial condition problem more effectively, but it also introduces a longer delay. The synthesis of one channel is depicted in Figure 3.1.

The output from the $m$th analysis filter is represented as $y_m(n)$. Two adjacent frames of the output data from the $m$th analysis filter form one block. The whole block, which consists of frame $J-1$ and frame $J$, is time reversed. Then the data of this time-reversed block is filtered through the Gammatone IIR synthesis filter. Because the initial condition of the IIR filter is unknown, it is assumed to be zero. Hence the first frame of the output from the IIR synthesis filter will be corrupted and should be discarded. Only the second frame (the shaded frame) will be useful signal and should be kept. The second frame is then time-reversed and patched together with the same frame from other blocks to form the output of the $m$th channel. The delay will be one frame. The performance is the same as with the IIR/FIR method.

Figure 3.1. Synthesis of the jth frame at the mth channel of an IIR/IIR Gammatone filter bank (TR: time reversal)

Although the analysis/synthesis schemes introduced in this section can simplify the computation to some extent, a delay of approximately one frame is unavoidable. In Section 4, we will show that by applying a novel least squares optimization technique a FIR synthesis filter bank can be designed that has low delay, low order and any given accuracy of reconstruction.

4. DESIGN OF SYNTHESIS FILTER BANK BY OPTIMIZATION

Perhaps the simplest, and apparently ideal, choice for the synthesis filters is the inverse of the analysis filters (4.1), where $G_m(z)$ is the z-transform of the $m$th analysis filter $g_m(n)$ and $H_m(z)$ is the z-transform of the $m$th synthesis filter $h_m(n)$.

For non-minimum phase Gammatone filters (2.2), the zeros of the analysis filter that lie outside the unit circle will become unstable poles of the corresponding synthesis filter. Replacing the analysis filters by minimum phase Gammatone filters can solve this problem. Minimum-phase filters may be obtained using a spectral decomposition method. The new analysis filters have the same magnitude frequency responses as the old ones. Then the synthesis filter in (4.1) will be stable and exact reconstruction can be achieved, assuming perfect arithmetic.

However, this method will not perform well in actual coding applications. Quantization noise (Figure 4.1) will be emphasized at frequencies where $|G_m(z)|$ is small. Any signal processing between the analysis and synthesis banks will also lead to errors at the same frequencies. For white noise with power $\sigma^2$ in one channel, the noise power at the output of that channel becomes $\sigma^2 \sum_k h_m^2(k)$, that is, it is scaled by a factor of $\sum_k h_m^2(k)$, which may be quite large.

Figure 4.1. Analysis and synthesis in the mth channel

We improve this method by designing the synthesis filters using spiking filter techniques. Since the final goal is to reconstruct the original signal from the output of the analysis filter bank, the impulse response of the overall analysis/synthesis filter bank system should resemble an impulse. Ideally we should have $\hat{x}(n) = x(n)$ if $\sum_{m=1}^{M} g_m(n) * h_m(n) = \delta(n)$. Two factors are considered in the optimization design of the synthesis filters:

(1) Minimization of the noise power gain $\sum_k h_m^2(k)$;

(2) Minimization of the difference between $\sum_{m=1}^{M} g_m(n) * h_m(n)$ and $\delta(n-L)$. A delay of $L$ samples is introduced to possibly improve the performance.

Synthesis filter design can be carried out on the individual channels. However, design over multiple channels provides improved performance.

To formulate the solution in a compact form, we introduce the following notation:

$$G = [\,G_1 \;\; G_2 \;\; \cdots \;\; G_m \;\; \cdots \;\; G_M\,]$$
$$h = [\,h_1^T \;\; h_2^T \;\; \cdots \;\; h_m^T \;\; \cdots \;\; h_M^T\,]^T$$
$$\Delta = [\,0 \;\; 0 \;\; \cdots \;\; 0 \;\; 1 \;\; 0 \;\; \cdots \;\; 0 \;\; 0\,]^T$$

where $h_m = [\,h_m(0) \;\; h_m(1) \;\; \cdots \;\; h_m(N_s)\,]^T$ is the impulse response of the $m$th FIR synthesis filter to be designed and $N_s + 1$ is the length of the filter. $G_m$ is the convolution matrix of the $m$th channel analysis filter, defined by

$$G_m = \begin{bmatrix}
g_m(0) & 0 & \cdots & 0 \\
g_m(1) & g_m(0) & \cdots & 0 \\
\vdots & g_m(1) & \ddots & \vdots \\
g_m(N_g-1) & \vdots & \ddots & g_m(0) \\
0 & g_m(N_g-1) & & g_m(1) \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & g_m(N_g-1)
\end{bmatrix}$$

where $N_g$ is the length of the impulse response of the $m$th analysis filter. The position of element "1" in the vector $\Delta$ depends on the delay $L$ required. We also define

$$d = \Delta - G_1 h_1 - G_2 h_2 - \cdots - G_m h_m - \cdots - G_M h_M = \Delta - Gh,$$

which represents the difference between the actual impulse response of the analysis/synthesis system and an exact impulse. Then we minimize $\|h\|^2$, the noise of the complete filter bank, subject to the constraint that the impulse response error (or distortion) $\|d\|^2$ does not exceed some nominated value $D$.

This is achieved by using the Lagrange multiplier method to minimize the quantity

$$J = h^T h + \lambda\,(d^T d - D) \qquad (4.2)$$

where $\lambda$ is a Lagrange multiplier and

$$d^T d = (\Delta - Gh)^T (\Delta - Gh). \qquad (4.3)$$

The optimum filter coefficients must satisfy

$$\frac{\partial J}{\partial h} = 2h - 2\lambda\,G^T (\Delta - Gh) = 0,$$

so that the optimum parameters can be obtained as

$$h = \left( G^T G + \frac{I}{\lambda} \right)^{-1} G^T \Delta, \qquad (4.4)$$

where $I$ is a unit matrix of appropriate size.

This optimized design is tested on a 25-channel Gammatone filter bank ($M = 25$). The analysis filter bank consists of the same IIR Gammatone filters ($N = 4$) as in Section 2. The synthesis filter bank is obtained using the above optimization filter design technique. The order of each synthesis filter is $N_s = 20$. The reconstruction delay is chosen as $L = 10$ and the Lagrange multiplier as $\lambda = 200$. These values resulted in a distortion $D$ of 0.11. In Figure 4.2, the dashed lines are the frequency responses of the analysis filters and the dotted-dashed lines are those of the synthesis filters (only channels 6, 12 and 18 are shown). Most of the synthesis filters have low-pass characteristics. The flat solid line is the inversion, or the frequency response of the overall impulse response of the analysis/synthesis filter bank. The convolution of the analysis and synthesis filter of each channel gives a bandpass response. The superposition of these bandpass filters from all channels resembles an allpass filter with a flat magnitude response. Hence the impulse response of the overall analysis/synthesis system is a wavelet that resembles a smoothed impulse at a delayed time instant $L = 10$.

A speech signal is decomposed using the analysis filter bank. Then 8-bit quantization is carried out on the output of the analysis filters prior to the reconstruction by the synthesis filter bank. The reconstructed signal is delayed by only 10 samples and is perceptually indistinguishable from the original speech.

Figure 4.2. Frequency responses of the analysis/synthesis filter bank (only 3 synthesis filters are shown). Panel title: Analysis and Synthesis by Optimization Inversion; horizontal axis: frequency (Hz), 0 to 4000; curves: inversion, analysis, synthesis.

The choice of the Lagrange multiplier $\lambda$, the order of the synthesis filters $N_s$ and the delay $L$ will influence the performance of the filter bank significantly. These variables are discussed in the following sub-sections.

4.1 The Lagrange Multiplier

The choice of $\lambda$ is decided by the trade-off between the smoothing effect of the filters and the accuracy of reconstruction. The value of $D$ in (4.3) reflects the difference of the overall analysis/synthesis system from an exact impulse. The relation between $D$ and $\lambda$ for three different filter orders ($N_s$ = 8, 16 and 24) is shown in Figure 4.3. The delay is chosen as $L = 10$ in each example. A large $\lambda$ gives better reconstruction accuracy but poorer noise suppression. A value of $D$ around 0.1 results in good noise suppression and reconstructed speech that is perceptually indistinguishable from the original. For a filter order of 20, $\lambda$ is usually chosen in the range of 100 to 2000, depending on the strength of the noise.

Figure 4.3. Distortion D as a function of λ (filter orders 8, 16 and 24; delay 10 samples)

4.2 Order of FIR filters N_s

The value of $D$ decreases with increasing filter order $N_s$. This is shown in Figure 4.4 for three different $\lambda$ values ($\lambda$ = 10, 100 and 1000) and a delay of $L = 10$. For filter orders above about 20 the change in $D$ is small, hence large filter orders are not required for the synthesis filters. Reconstruction accuracy can be achieved more effectively with an increase in $\lambda$ and the correct choice of delay $L$.

Figure 4.4. Distortion D as a function of filter order N_s (delay 10 samples; λ = 10, 100 and 1000)

4.3 Delay L

The amount of delay $L$ in the reconstruction, or the position of element "1" in the vector $\Delta$, plays a very important role in the performance of the filter bank. The dependence of $D$ on the delay $L$ for three different filter orders with non-minimum phase analysis filters is plotted in Figure 4.5, where it can be seen that there is a broad optimum choice of $L$ (near $N_s$), but the exact choice is not critical. If the analysis filter bank consists of minimum-phase Gammatone filters, the optimum delay can be zero. The value of $\lambda$ in these examples is 1000 with $N_s$ chosen as 8, 16 and 24.

Figure 4.5. Distortion D as a function of delay L (filter orders 8, 16 and 24; λ = 1000)

5. SUMMARY

Problems of delay, accuracy and computational complexity in frame-by-frame auditory filter bank inversion techniques have been addressed. A novel optimization method was proposed in Section 4 that can produce FIR filters with around 20 coefficients and delays of around 1.5 ms. This is a substantial improvement over existing techniques described in Section 3, which use filters of order 160 or more and delays of 20 ms or more in order to perform accurate inversion. Further, the new optimization technique produces filters whose responses are optimum for a given distortion limit, and which are relatively insensitive to quantization or processing noise. Thus, this technique is highly suitable for the design of auditory filter banks for frame-by-frame speech and audio processing applications such as speech coding. Although we used the Gammatone model in our application, the optimization method also suits other types of auditory models.

6. REFERENCES

[1] Flanagan, J.L., "Models for approximating basilar membrane displacement", Bell Sys. Tech. J., 1960, vol. 39, pp. 1163-1191.
[2] Glasberg, B.R. and Moore, B.C.J., "Derivation of auditory filter shape from notched-noise data", Hear. Res., 1990, vol. 47, pp. 103-138.
[3] Kim, D., Lee, S., and Kil, R., "Auditory processing of speech signals for robust speech recognition in real-world noisy environments", IEEE Trans. on Speech and Audio Processing, 1999, vol. 7, pp. 55-69.
[4] Kubin, G. and Kleijn, W.B., "On speech coding in a perceptual domain", Proc. ICASSP (Phoenix, USA), 1999, pp. 205-208.
[5] Lyon, R.F., "The all-pole models of auditory filtering", in Diversity in Auditory Mechanics, World Scientific Publishing, Singapore, 1997, pp. 205-211.
[6] Slaney, M., "An efficient Implementation of the Patterson-Holdsworth auditory filter bank", Apple Computer Technical Report #35, Apple Computer, Inc., 1993.
[7] Irino, T. and Unoki, M., "A time-varying, analysis/synthesis auditory filter bank using the Gammachirp", Proc. ICASSP (Seattle, USA), 1998, pp. 3653-3656.
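
As an illustration of the design procedure in Section 4, the closed-form solution (4.4) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the analysis filters are sampled, truncated versions of the impulse response described in Section 2 rather than the bilinear-transform IIR filters of (2.2); the bandwidth constant `b = 1.019`, the 100 Hz to 3600 Hz centre-frequency range, the ERB-rate spacing formula and the unit-peak-gain normalization are all assumed values; and the function names are hypothetical. The distortion it reports should therefore not be expected to match the figures quoted in the paper.

```python
import numpy as np

def erb_spaced_centres(f_lo, f_hi, m):
    """m centre frequencies equally spaced on an ERB-rate scale (assumed spacing)."""
    erb_rate = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    e = np.linspace(erb_rate(f_lo), erb_rate(f_hi), m)
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

def gammatone_fir(fc, fs=8000.0, n_taps=160, order=4, b=1.019):
    """Sampled Gammatone impulse response, truncated to n_taps samples.

    b = 1.019 is an assumed bandwidth constant; the result is scaled to
    unit peak magnitude response (also an assumption).
    """
    t = np.arange(n_taps) / fs
    erb = 24.7 + 0.108 * fc
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(np.fft.rfft(g, 1024)))

def convolution_matrix(g, n_cols):
    """Matrix G_m such that G_m @ h equals np.convolve(g, h) for len(h) == n_cols."""
    G = np.zeros((len(g) + n_cols - 1, n_cols))
    for j in range(n_cols):
        G[j:j + len(g), j] = g  # column j is g delayed by j samples
    return G

def design_synthesis_bank(analysis, order_s=20, delay=10, lam=200.0):
    """Joint design over all channels: h = (G^T G + I/lam)^-1 G^T Delta, as in (4.4).

    Returns the synthesis filters (one row per channel) and the distortion d^T d.
    All analysis impulse responses must have the same length.
    """
    n_cols = order_s + 1
    G = np.hstack([convolution_matrix(g, n_cols) for g in analysis])
    delta = np.zeros(G.shape[0])
    delta[delay] = 1.0  # target: unit impulse delayed by L samples
    h = np.linalg.solve(G.T @ G + np.eye(G.shape[1]) / lam, G.T @ delta)
    d = delta - G @ h
    return h.reshape(len(analysis), n_cols), float(d @ d)

if __name__ == "__main__":
    centres = erb_spaced_centres(100.0, 3600.0, 25)
    bank = [gammatone_fir(fc) for fc in centres]
    for lam in (10.0, 200.0, 1000.0):
        h, D = design_synthesis_bank(bank, order_s=20, delay=10, lam=lam)
        print(f"lambda = {lam:6.0f}  distortion D = {D:.4f}")
```

Solving the regularized normal equations with `np.linalg.solve` avoids forming an explicit inverse. Sweeping `lam`, `order_s` and `delay` in this sketch is one way to explore the trade-offs discussed in Sections 4.1 to 4.3.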