AUDITORY FILTER BANK INVERSION
L. Lin, W.H. Holmes and E. Ambikairajah
School of Electrical Engineering and Telecommunications
The University of New South Wales, UNSW SYDNEY NSW 2052, Australia
ll.lin@ee.unsw.edu.au, H.Holmes@unsw.edu.au, Ambi@ee.unsw.edu.au
0-7803-6685-9/01/$10.00 ©2001 IEEE

ABSTRACT

Models of auditory filtering using the Gammatone filter bank are useful tools in speech processing. A perceptually accurate auditory inversion model has applications in speech and audio coding. This paper proposes a new auditory filter bank inversion method using a least squares optimization technique. The proposed method is computationally efficient and its low delay makes it suitable for frame-by-frame processing. Three other approaches to Gammatone analysis/synthesis filter bank implementations are compared with the proposed method.

1. INTRODUCTION

There has been considerable research devoted to modelling the functional roles of peripheral auditory systems. Although computational auditory models have been shown in some cases to outperform conventional signal processing techniques, especially in noisy environments, adequate modelling of the principal behaviour of the peripheral auditory systems is still a difficult problem.

Recently, quantitative models of the auditory system and auditory processing of speech signals have received a lot of attention [3] [7]. Many models in the past used a cascade filter bank in order to achieve the measured mechanical tuning of the basilar membrane. However, in recent models, auditory filtering is achieved using Gammatone or other parallel filter banks [5] [6]. A fundamental requirement for coding applications is an efficient mechanism for decomposing and resynthesising the speech signal. Kubin and Kleijn [4] used a Gammatone analysis filter bank with a synthesis filter bank whose impulse responses are the time-reversed impulse responses of the analysis filters, which requires a time delay of at least 20 ms to make the system causal.

This paper proposes a computationally efficient auditory filter bank inversion method with low delay and other advantages. Section 2 presents the overall analysis/synthesis scheme for the Gammatone filter bank, and three approaches to Gammatone analysis/synthesis filter bank implementation are presented in Section 3. A new auditory filter bank inversion approach using a least squares optimization technique is presented in Section 4.

2. THE GAMMATONE FILTER BANK

A Gammatone is the product of a rising polynomial, a decaying exponential function, and a cosine wave. Its impulse response is:

    $$g(t) = a\,t^{N-1}\,e^{-2\pi b\,\mathrm{ERB}(f_c)\,t}\,\cos(2\pi f_c t), \quad t \ge 0,$$   (2.1)

where $N$ is the order of the filter, $f_c$ is the centre frequency, $\mathrm{ERB}(f_c)$ is the equivalent rectangular bandwidth of the auditory filter, and $a, b \in \mathbb{R}$ are constants. At moderate power levels [2], $\mathrm{ERB}(f_c) = 24.7 + 0.108 f_c$. Gammatone filters were first used by Flanagan [1] to model basilar membrane motion, and subsequently investigated by many others.

A possible discrete Gammatone filter (order $N = 4$) is given by

    $$G(z) = \frac{(z+1)^4\,(b_0 z^4 + b_1 z^3 + b_2 z^2 + b_3 z + b_4)}{(z^2 + a_1 z + a_2)^4},$$   (2.2)

where the numerator and denominator coefficients $b_i$ and $a_i$ are determined from (2.1) using an appropriate transformation (e.g. the bilinear transform, as in this paper). The implementation of $M$ Gammatone filters as an analysis/synthesis filter bank system is shown in Figure 2.1. The analysis filter bank consists of $M = 25$ Gammatone filters at different centre frequencies. The synthesis filter in each channel employs the time-reversed impulse response of the analysis filter [4]. This method is based on the following property of a Gammatone filter bank with the ERB scale:

    $$\sum_{m=1}^{M} |G_m(e^{j\omega})|^2 = c$$

where $c \in \mathbb{R}$ is a constant and $M$ is the number of filters. The overall analysis/synthesis system approximates an all-pass filter.

Figure 2.1. Gammatone analysis/synthesis filter bank

3. IMPLEMENTATION OF GAMMATONE FILTER BANK

In this section, three types of Gammatone analysis/synthesis filter bank implementation schemes will be discussed: FIR/FIR, IIR/FIR and IIR/IIR.

3.1 FIR/FIR Filter Bank Implementation

In this case both analysis and synthesis filters are FIR. The lengths of the FIR filters should be sufficient to approximate the Gammatone impulse response [4] [7]. There are two inherent disadvantages of the FIR/FIR inversion.
(1) Due to the infinite length of the Gammatone impulse response, long FIR filters are necessary for accurate reconstruction. For example, the length of a FIR Gammatone filter at a centre frequency of 500 Hz should be at least 160 samples if the signal is sampled at 8 kHz. Hence this method is computationally expensive.
(2) Because the synthesis filters are time-reversed impulse responses of the analysis filters, a large delay of up to 20 ms must be introduced to make the FIR synthesis filters causal. This may be excessive for frame-based processing.

3.2 IIR/FIR Filter Bank Implementation

Alternatively, the Gammatone filter given in (2.2) can be implemented using IIR filters [4] [7]. However, the synthesis filters are still FIR, obtained from the time-reversed impulse responses of the analysis filters. The delay is still inevitable but the computational load is reduced to nearly half that of the FIR/FIR scheme.

3.3 IIR/IIR Filter Bank Implementation

Another method of inversion is to realize both analysis and synthesis filter banks as IIR filters. The synthesis filters are exactly the IIR analysis filters, but their inputs are time reversed. At the output of the synthesis filters, the signal must be time reversed again to achieve linear phase for each channel. This technique has been successfully applied to speech enhancement and noise suppression by Irino [7]. Unfortunately, when applied to continuous speech processing, samples from future frames are needed as the initial inputs of the IIR synthesis filters. This requires a variation of the usual block processing method, as follows.

A study of the impulse responses of Gammatone filters showed that the longest impulse response within the frequency range of interest takes about 20 ms to decay almost to zero. This suggests that we can apply time reversal to blocks that consist of adjacent frames. At least one frame in the block will be used to forget the effect of the unknown initial condition of the filters. A longer block length (more frames in a block) will be more effective in dealing with the initial condition problem. However, a longer delay will also be introduced. The synthesis of one channel is depicted in Figure 3.1.

The output from the $m$th analysis filter is represented as $y_m(n)$. Two adjacent frames of the output data from the $m$th analysis filter form one block. The whole block, which consists of frame $J-1$ and frame $J$, is time reversed. Then the data of this time-reversed block is filtered through the Gammatone IIR synthesis filter. Because the initial condition of the IIR filter is unknown, it is assumed to be zero. Hence the first frame of the output from the IIR synthesis filter will be corrupted and should be discarded. Only the second frame (the shaded frame) will be useful signal and should be kept. The second frame is then time-reversed and patched together with the same frame from other blocks to form the output of the $m$th channel. The delay will be one frame. The performance is the same as with the IIR/FIR method.

Figure 3.1. Synthesis of the Jth frame at the mth channel of an IIR/IIR Gammatone filter bank (TR: time reversal)

Although the analysis/synthesis schemes introduced in this section can simplify the computation to some extent, a delay of approximately one frame is unavoidable. In Section 4, we will show that by applying a novel least squares optimization technique a FIR synthesis filter bank can be designed that has low delay, low order and any given accuracy of reconstruction.

4. DESIGN OF SYNTHESIS FILTER BANK BY OPTIMIZATION

Perhaps the simplest, and apparently ideal, choice for the synthesis filters is

    $$H_m(z) = \frac{1}{M\,G_m(z)},$$   (4.1)

where $G_m(z)$ is the z-transform of the $m$th analysis filter $g_m(n)$ and $H_m(z)$ is the z-transform of the $m$th synthesis filter $h_m(n)$.

For non-minimum phase Gammatone filters (2.2), the zeros of the analysis filter outside the unit circle will become unstable poles of the corresponding synthesis filter. Replacing the analysis filters by minimum phase Gammatone filters can solve this problem. Minimum-phase filters may be obtained using the spectral decomposition method. The new analysis filters have the same magnitude frequency responses as the old ones. Then the synthesis filter in (4.1) will be stable and exact reconstruction can be achieved, assuming perfect arithmetic.

However, this method will not perform well in actual coding applications. Quantization noise (Figure 4.1) will be emphasized at frequencies where $|G_m(z)|$ is small. Any signal processing between the analysis and synthesis banks will also lead to errors at the same frequencies. For white noise with power $\sigma^2$ in one channel, the noise power at the output of that channel becomes $\sigma^2 \sum_k h_m^2(k)$, i.e. it is scaled by a factor of $\sum_{k=0}^{N} h_m^2(k)$, which may be quite large.

We improve this method by designing the synthesis filters using spiking filter techniques. Since the final goal is to reconstruct the original signal from the output of the analysis filter bank, the impulse response of the overall analysis/synthesis filter bank system should resemble an impulse. Ideally we should have $\hat{x}(n) = x(n)$ if $\sum_{m=1}^{M} g_m(n) * h_m(n) = \delta(n)$. Two factors are considered in the optimization design of the synthesis filters:

(1) Minimization of the noise power gain $\sum_k h_m^2(k)$;
(2) Minimization of the difference between $\sum_{m=1}^{M} g_m(n) * h_m(n)$ and $\delta(n-L)$. A delay of $L$ samples is introduced to possibly improve the performance.

Synthesis filter design can be carried out on the individual channels. However, design over multiple channels provides improved performance.

Figure 4.1. Analysis and synthesis in the mth channel

To formulate the solution in a compact form, we introduce the following notation:

    $$G = [\,G_1 \;\; G_2 \;\; \cdots \;\; G_m \;\; \cdots \;\; G_M\,]$$
    $$h = [\,h_1^T \;\; h_2^T \;\; \cdots \;\; h_m^T \;\; \cdots \;\; h_M^T\,]^T$$
    $$\Delta = [\,0 \;\; 0 \;\; \cdots \;\; 0 \;\; 1 \;\; 0 \;\; \cdots \;\; 0 \;\; 0\,]^T$$

where $h_m = [\,h_m(0) \;\; h_m(1) \;\; \cdots \;\; h_m(N_s)\,]^T$ is the impulse response of the $m$th FIR synthesis filter to be designed and $N_s + 1$ is the length of the filter. $G_m$ is the convolution matrix of the $m$th channel analysis filter defined by

    $$G_m = \begin{bmatrix}
    g_m(0) & 0 & \cdots & 0 \\
    g_m(1) & g_m(0) & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & g_m(N_g - 1)
    \end{bmatrix}$$

where $N_g$ is the length of the impulse response of the $m$th analysis filter, so that $G_m h_m$ is the convolution $g_m(n) * h_m(n)$. The position of element "1" in the vector $\Delta$ depends on the delay $L$ required. We also define

    $$d = \Delta - G_1 h_1 - G_2 h_2 - \cdots - G_m h_m - \cdots - G_M h_M = \Delta - Gh,$$

which represents the difference between the actual impulse response of the analysis/synthesis system and an exact impulse.

Then we minimize $\|h\|^2$, the noise of the complete filter bank, subject to the constraint that the impulse response error (or distortion) $\|d\|^2$ does not exceed some nominated value $D$. This is achieved by using the Lagrange multiplier method to minimize the quantity

    $$J = h^T h + \lambda\,(d^T d - D)$$   (4.2)

where $\lambda$ is a Lagrange multiplier and

    $$d^T d = (\Delta - Gh)^T (\Delta - Gh).$$   (4.3)

The optimum filter coefficients must satisfy

    $$\frac{\partial J}{\partial h} = 2h - 2\lambda\,G^T(\Delta - Gh) = 0,$$

so that the optimum parameters can be obtained as

    $$h = \left(G^T G + \frac{I}{\lambda}\right)^{-1} G^T \Delta,$$   (4.4)

where $I$ is a unit matrix of appropriate size. This optimized design is tested on a 25-channel Gammatone filter bank ($M = 25$). The analysis filter bank consists of the same IIR Gammatone filters ($N = 4$) as in Section 2. The synthesis filter bank is obtained using the above optimization filter design technique. The order of each synthesis filter is $N_s = 20$. The reconstruction delay is chosen as $L = 10$ and the Lagrange multiplier as $\lambda = 200$. These values resulted in a distortion $D$ of 0.11. In Figure 4.2, the dashed lines are the frequency responses of the analysis filters and the dotted-dashed lines are those of the synthesis filters (only channels 6, 12 and 18 are shown). Most of the synthesis filters have low-pass characteristics. The flat solid line is the inversion, or the frequency response of the overall impulse response of the analysis/synthesis filter bank. The convolution of the analysis and synthesis filter of each channel gives a bandpass response. The superposition of these bandpass filters from all channels resembles an allpass filter with a flat magnitude response. Hence the impulse response of the overall analysis/synthesis system is a wavelet that resembles a smoothed impulse at a delayed time instant $L = 10$. A speech signal is decomposed using the analysis filter bank. Then 8-bit quantization is carried out on the output of the analysis filters prior to the reconstruction by the synthesis filter bank. The reconstructed signal is delayed by only 10 samples and is perceptually indistinguishable from the original speech.

Figure 4.2. Frequency responses of the analysis/synthesis filter bank (only 3 synthesis filters are shown). Plot title: "Analysis and Synthesis by Optimization Inversion"; horizontal axis: frequency (Hz), 0 to 4000; legend: inversion, analysis, synthesis.

The choice of the Lagrange multiplier $\lambda$, the order of the synthesis filters $N_s$ and the delay $L$ will influence the performance of the filter bank significantly. These variables are discussed in the following sub-sections.

4.1 The Lagrange Multiplier

The choice of $\lambda$ is decided by the trade-off between the smoothing effect of the filters and the accuracy of reconstruction. The value of $D$ in (4.3) reflects the difference of the overall analysis/synthesis system from an exact impulse. The relation between $D$ and $\lambda$ for three different filter orders ($N_s$ = 8, 16 and 24) is shown in Figure 4.3. The delay is chosen as $L = 10$ in each example. A large $\lambda$ gives better reconstruction accuracy but poorer noise suppression. A value of $D$ around 0.1 results in good noise suppression and reconstructed speech that is perceptually indistinguishable from the original. For a filter order of 20, $\lambda$ is usually chosen in the range of 100 to 2000, depending on the strength of the noise.
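
The closed-form solution (4.4) is straightforward to evaluate numerically. The following is a minimal sketch, not the authors' implementation. It assumes NumPy, and it replaces the bilinear-transform IIR analysis filters by sampled Gammatone impulse responses truncated to 160 samples at 8 kHz. The helper names (`gammatone_ir`, `convolution_matrix`, `design_synthesis_bank`), the bandwidth constant b = 1.019, the unit-energy scaling and the geometrically spaced centre frequencies are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def erb(fc):
    """Equivalent rectangular bandwidth in Hz, ERB(fc) = 24.7 + 0.108 fc."""
    return 24.7 + 0.108 * fc


def gammatone_ir(fc, fs=8000.0, n_taps=160, order=4, b=1.019):
    """Sampled, truncated Gammatone impulse response (assumed b, unit energy)."""
    t = np.arange(n_taps) / fs
    g = (t ** (order - 1)
         * np.exp(-2.0 * np.pi * b * erb(fc) * t)
         * np.cos(2.0 * np.pi * fc * t))
    return g / np.linalg.norm(g)


def convolution_matrix(g, n_cols):
    """Return G_m such that G_m @ h_m equals np.convolve(g, h_m)."""
    G = np.zeros((len(g) + n_cols - 1, n_cols))
    for j in range(n_cols):
        G[j:j + len(g), j] = g  # column j holds g delayed by j samples
    return G


def design_synthesis_bank(analysis_irs, n_s=20, delay=10, lam=200.0):
    """Jointly solve h = (G^T G + I/lam)^(-1) G^T Delta over all channels.

    Returns the synthesis filters (one row per channel), the distortion
    d^T d and the noise gain h^T h.
    """
    G = np.hstack([convolution_matrix(g, n_s + 1) for g in analysis_irs])
    delta = np.zeros(G.shape[0])
    delta[delay] = 1.0  # target: unit impulse delayed by L samples
    h = np.linalg.solve(G.T @ G + np.eye(G.shape[1]) / lam, G.T @ delta)
    d = delta - G @ h
    return h.reshape(len(analysis_irs), n_s + 1), float(d @ d), float(h @ h)


# Example: 25 channels, N_s = 20, L = 10, lambda = 200 (values from Section 4).
centre_freqs = np.geomspace(100.0, 3500.0, 25)  # assumed channel spacing
bank = [gammatone_ir(fc) for fc in centre_freqs]
h, distortion, noise_gain = design_synthesis_bank(bank, n_s=20, delay=10, lam=200.0)
print(h.shape, distortion, noise_gain)
```

Raising `lam` lowers the returned distortion and raises the noise gain, which is the trade-off described in Section 4.1. The numerical values depend on the assumed analysis filters and need not match those reported in the paper.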

4.2 Order of FIR filters N_s

The value of $D$ decreases with increasing filter order $N_s$. This is shown in Figure 4.4 for three different $\lambda$ values ($\lambda$ = 10, 100 and 1000) and a delay of $L = 10$. For filter orders above about 20, the change in $D$ is small, hence large filter orders are not required for the synthesis filters. Reconstruction accuracy can be achieved more effectively with an increase in $\lambda$ and the correct choice of delay $L$.

4.3 Delay L

The amount of delay $L$ in the reconstruction, or the position of element "1" in the vector $\Delta$, plays a very important role in the performance of the filter bank. The dependence of $D$ on the delay $L$ for three different filter orders with non-minimum phase analysis filters is plotted in Figure 4.5, where it can be seen that there is a broad optimum choice of $L$ (near $N_s$), but the exact choice is not critical. If the analysis filter bank consists of minimum-phase Gammatone filters, the optimum delay can be zero. The value of $\lambda$ in these examples is 1000 with $N_s$ chosen as 8, 16 and 24.

Figure 4.5. Distortion D as a function of delay L (horizontal axis: lag, 0 to 50)

5. SUMMARY

Problems of delay, accuracy and computational complexity in frame-by-frame auditory filter bank inversion techniques have been addressed. A novel optimization method was proposed in Section 4 that can produce FIR filters with around 20 coefficients and delays of around 1.5 ms. This is a substantial improvement over the existing techniques described in Section 3, which use filters of order 160 or more and delays of 20 ms or more in order to perform accurate inversion. Further, the new optimization technique produces filters whose responses are optimum for a given distortion limit, and which are relatively insensitive to quantization or processing noise. Thus, this technique is highly suitable for the design of auditory filter banks for frame-by-frame speech and audio processing applications such as speech coding. Although we used the Gammatone model in our application, the optimization method also suits other types of auditory models.
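
The delay figures quoted in the summary can be related to sample counts with a short calculation. The sketch below assumes the 8 kHz sampling rate of the Section 3.1 example and writes $f_s$ for the sampling rate, a symbol the paper itself does not use:

```latex
t_{\text{delay}} = \frac{L}{f_s}, \qquad
\frac{10}{8000\ \text{Hz}} = 1.25\ \text{ms}, \qquad
\frac{12}{8000\ \text{Hz}} = 1.5\ \text{ms}, \qquad
\frac{160}{8000\ \text{Hz}} = 20\ \text{ms}.
```

Under that assumption, the $L = 10$ design of Section 4 corresponds to 1.25 ms and a delay of 12 samples to 1.5 ms, against 20 ms for a 160-sample time-reversed FIR filter.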
Figure 4.3. Distortion D as a function of λ (plot title: "Effect of lambda on the value of D"; filter orders 8, 16 and 24; delay 10 samples; horizontal axis: lambda, 0 to 4000)

Figure 4.4. Distortion D as a function of filter order N_s (plot title: "Effect of filter order on the value of D"; delay 10; lambda 10, 100 and 1000; horizontal axis: filter order)

6. REFERENCES

[1] Flanagan, J.L., "Models for approximating basilar membrane displacement", Bell Sys. Tech. J., 1960, vol. 39, pp. 1163-1191.
[2] Glasberg, B.R. and Moore, B.C.J., "Derivation of auditory filter shape from notched-noise data", Hear. Res., 1990, vol. 47, pp. 103-138.
[3] Kim, D., Lee, S., and Kil, R., "Auditory processing of speech signals for robust speech recognition in real-world noisy environments", IEEE Trans. on Speech and Audio Processing, 1999, vol. 7, pp. 55-69.
[4] Kubin, G. and Kleijn, W.B., "On speech coding in a perceptual domain", Proc. ICASSP (Phoenix, USA), 1999, pp. 205-208.
[5] Lyon, R.F., "The all-pole models of auditory filtering", in Diversity in Auditory Mechanics, World Scientific Publishing, Singapore, 1997, pp. 205-211.
[6] Slaney, M., "An efficient implementation of the Patterson-Holdsworth auditory filter bank", Apple Computer Technical Report #35, Apple Computer, Inc., 1993.
[7] Irino, T. and Unoki, M., "A time-varying, analysis/synthesis auditory filter bank using the Gammachirp", Proc. ICASSP (Seattle, USA), 1998, pp. 3653-3656.