					               SUBJECTIVE COMPARISON OF SPEECH ENHANCEMENT ALGORITHMS

                                                    Yi Hu and Philipos C. Loizou ∗

                                                 Department of Electrical Engineering
                                                    University of Texas at Dallas
                                                   Richardson, Texas 75083-0688
                                                   {yihuyxy, loizou}@utdallas.edu


∗ Research supported in part by NIDCD/NIH.

ABSTRACT

We report on the development of a noisy speech corpus suitable for evaluation of speech enhancement algorithms. This corpus is used for the subjective evaluation of 13 speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model based and Wiener algorithms. The subjective evaluation was performed by Dynastat, Inc. using the ITU-T P.835 methodology designed to evaluate the speech quality along three dimensions: signal distortion, noise distortion and overall quality. This paper reports the results of the subjective tests.

1. INTRODUCTION

Over the past three decades, various speech enhancement algorithms have been proposed to improve the performance of modern communication devices in noisy environments. Yet, it still remains unclear which speech enhancement algorithm performs well in real-world listening situations where the background noise level and characteristics are constantly changing. Reliable and fair comparison between algorithms has been elusive for several reasons, including the lack of a common speech database for evaluation of new algorithms, differences in the types of noise used and differences in the testing methodology. Subjective evaluation of speech enhancement algorithms is further complicated by the fact that the quality of enhanced speech has both signal and noise distortion components, and it is not clear whether listeners base their quality judgments on the signal distortion, noise distortion or both. Without access to a common speech database, it is nearly impossible for researchers to compare, at the very least, the objective performance of their algorithms with that of others.

In this paper, we report on the development of a noisy speech corpus (NOIZEUS) suitable for evaluation of speech enhancement algorithms. This corpus is subsequently used in a comprehensive subjective evaluation of 13 speech enhancement algorithms encompassing four different classes of algorithms: spectral subtractive, subspace, statistical-model based and Wiener algorithms. The enhanced speech files were sent to Dynastat, Inc. (Austin, TX) for subjective evaluation using the recently standardized methodology for evaluating noise suppression algorithms based on ITU-T P.835 [1].

2. NOIZEUS: A NOISY SPEECH CORPUS FOR EVALUATION OF ENHANCEMENT ALGORITHMS

NOIZEUS¹ is a noisy speech corpus recorded in our lab to facilitate comparison of speech enhancement algorithms among research groups. The noisy database contains 30 IEEE sentences [2] produced by three male and three female speakers, and was corrupted by eight different real-world noises at different SNRs. Thirty sentences from the IEEE sentence database were recorded in a sound-proof booth using Tucker Davis Technologies (TDT) recording equipment. The sentences were produced by three male and three female speakers (5 sentences/speaker). The IEEE database was used as it contains phonetically-balanced sentences with relatively low word-context predictability. The thirty sentences were selected from the IEEE database so as to include all phonemes in the American English language. The sentences were originally sampled at 25 kHz and downsampled to 8 kHz. To simulate the receiving frequency characteristics of telephone handsets, the speech and noise signals were filtered by the modified Intermediate Reference System (IRS) filters used in ITU-T P.862 for evaluation of the PESQ measure.

Noise was artificially added to the speech signal as follows. The IRS filter was independently applied to the clean and noise signals. The active speech level of the filtered clean speech signal was first determined using method B of ITU-T P.56. A noise segment of the same length as the speech signal was randomly cut out of the noise recordings, appropriately scaled to reach the desired SNR level and finally added to the filtered clean speech signal. Noise signals were taken from the AURORA database [3] and included the following recordings from different places: babble (crowd of people), car, exhibition hall, restaurant, street, airport, train station, and train. The noise signals were added to the speech signals at SNRs of 0dB, 5dB, 10dB and 15dB.

¹ Available at: http://www.utdallas.edu/~loizou/speech/noizeus/.

3. ALGORITHMS EVALUATED

A total of 13 different speech enhancement methods were evaluated based on our own implementation. Representative algorithms from four different classes of enhancement algorithms were chosen: three spectral subtractive algorithms, two subspace algorithms, three Wiener algorithms² and five statistical-model based algorithms. A subset of those algorithms was evaluated with and without noise-estimation algorithms. The parameters used in the implementation of these algorithms were the same as those published unless stated otherwise³. Table 1 shows the list of algorithms evaluated with the associated parameters and equations given in the references. The decision-directed approach was used for estimating the a priori SNR in the statistical methods and the Wiener_as method [4] with a=0.98.

  Algorithm       Equation/parameters                    Ref
  KLT             Eq. 14, 48                             [8]
  pKLT            Eq. 34, ν=0.08                         [9]
  MMSE-SPU        Eq. 7, 51, q=0.3                       [10]
  logMMSE         Eq. 20                                 [11]
  logMMSE-ne      Eq. 20                                 [11]
  logMMSE-SPU     Eq. 2, 8, 10, 16                       [12]
  pMMSE           Eq. 12                                 [13]
  RDC             Eq. 6, 7, 10, 14, 15                   [14]
  RDC-ne          Eq. 6, 7, 10, 14, 15                   [14]
  MB              Eq. 4-7                                [15]
  WavThr          Eq. 11, 25                             [16]
  Wiener_as       Eq. 3-7                                [4]
  AudSup          Eq. 26, 38, νb(i)=1, 2 iterations      [17]

Table 1. List of 13 speech enhancement algorithms evaluated. SPU=speech presence uncertainty, ne=noise estimation.

The majority of the algorithms utilized a voice activity detector [5] to update the noise spectrum during the speech-absent periods. The subspace methods used a different VAD method [6] with threshold value set to 1.2. To assess the merit of noise-estimation algorithms [7], two speech-enhancement algorithms were implemented with both VAD and noise estimation algorithms. These algorithms are indicated in Table 1 with the suffix ‘-ne’.

² The Wiener-type algorithms were grouped separately since these algorithms estimate the complex spectrum while the statistical-model algorithms estimate the magnitude spectrum in the mean square sense.
³ No adjustments were made for algorithms (e.g., [12]) originally designed for 16 kHz.

4. SUBJECTIVE EVALUATION

To reduce the length and cost of the subjective evaluations, only a subset of the NOIZEUS corpus was processed by the 13 algorithms and submitted to Dynastat, Inc. for formal subjective evaluation. A total of 20 sentences corrupted in four background noise environments (car, street, babble and train) at two levels of SNR (5dB and 10dB) were processed and presented to 32 listeners for evaluation. These sentences were spoken by two male speakers and two female speakers.

The subjective tests were designed according to ITU-T recommendation P.835. The P.835 methodology was designed to reduce the listener’s uncertainty in a subjective test as to which component(s) of a noisy speech signal, i.e., the speech signal, the background noise, or both, should form the basis of their ratings of overall quality. This method instructs the listener to successively attend to and rate the enhanced speech signal on: a) the speech signal alone using a scale of signal distortion (SIG) - [1=very unnatural, 5=very natural], b) the background noise alone using a scale of background conspicuousness/intrusiveness (BAK) - [1=very conspicuous, very intrusive, 5=not noticeable], c) the overall effect using the scale of the Mean Opinion Score (OVRL) - [1=bad, 5=excellent].

The process of rating the signal and background of noisy speech was designed to lead the listener to integrate the effects of both the signal and the background in making their ratings of overall quality. Each trial in a P.835 test involves a triad of speech samples: three samples of the test condition, where each sample is a short segment of speech recorded in background noise, e.g., a single sentence. For each sample within the triad, listeners successively used one of the three five-point rating scales, SIG, BAK, and OVRL, to register their judgments of the quality of the test condition. In addition to the experimental conditions, each experiment included a number of reference conditions designed to independently vary the listener’s SIG, BAK, and OVRL ratings over the entire five-point range of the rating scales. More details about the testing methodology can be found in [18]. The figures show the mean scores for SIG, BAK, and OVRL scales for the 13 methods evaluated. The mean scores for the noisy speech (unprocessed) files are also shown for reference.

5. DISCUSSION AND CONCLUSIONS

Of the two subspace algorithms examined, the generalized subspace approach [8] performed consistently better in OVRL scale across all SNR conditions and four types of noise. The performance of these two methods was distinctively different in +5dB car noise. Lower signal distortion (i.e., higher SIG scores) was observed with the generalized subspace method in most conditions.

Of the five statistical-model based algorithms examined, the log-MMSE and the perceptually motivated MMSE (pMMSE) algorithms performed the best. Performance of the pMMSE algorithm was comparable to that of the MMSE algorithm which incorporated speech-presence uncertainty (the pMMSE algorithm did not). Lower noise distortion (i.e., higher BAK scores) was obtained with the pMMSE method in several conditions (5dB train, 5dB car, 10dB street). It was surprising to see that the noise-estimation algorithm [7] did not provide significant improvements to the performance of the log-MMSE algorithm (small improvements were noted only in street noise). Incorporating speech-presence uncertainty as per [12] did not improve the performance of the log-MMSE algorithm. In fact, it degraded performance.

Of the two spectral-subtractive algorithms tested, the multi-band spectral subtraction algorithm [15] performed consistently the best across all conditions. Incorporating a noise-estimation algorithm did not improve the performance of the reduced-delay spectral subtraction algorithm. One possible explanation is that the speech files were too brief in duration to observe the real benefit of noise-estimation algorithms.

Finally, of the three Wiener-filtering type algorithms, the method proposed in [4], based on a priori SNR, performed the best. This method also consistently produced the lowest signal distortion, comparable to the statistical-model based methods. It did, however, suffer from high noise distortion.

Overall, the statistical-model based methods performed the best across all conditions, followed by the multi-band spectral subtraction method [15].

6. REFERENCES

[1] ITU-T P.835, Subjective test methodology for evaluating speech communication systems that include noise suppression algorithms, ITU-T Recommendation P.835, 2003.
[2] IEEE Subcommittee, “IEEE recommended practice for speech quality measurements,” IEEE Trans. Audio and Electroacoustics, pp. 225–246, 1969.
[3] H. Hirsch and D. Pearce, “The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions,” in ISCA ITRW ASR2000, Sept. 2000, Paris, France.
[4] P. Scalart and J. Filho, “Speech enhancement based on a priori signal to noise estimation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 1996, pp. 629–632.
[5] J. Sohn, N. Kim, and W. Sung, “A statistical model-based voice activity detection,” IEEE Signal Processing Letters, pp. 1–3, Jan. 1999.
[6] U. Mittal and N. Phamdo, “Signal/noise KLT based approach for enhancing speech degraded by colored noise,” IEEE Trans. Speech Audio Proc., vol. 8, pp. 159–167, Mar. 2000.
[7] S. Rangachari and P. C. Loizou, “A noise estimation algorithm for highly non-stationary environments,” Speech Communication, vol. 28, pp. 220–231, Feb. 2006.
[8] Y. Hu and P. C. Loizou, “A generalized subspace approach for enhancing speech corrupted by colored noise,” IEEE Trans. Speech Audio Proc., pp. 334–341, July 2003.
[9] F. Jabloun and B. Champagne, “Incorporating the human hearing properties in the signal subspace approach for speech enhancement,” IEEE Trans. Speech Audio Proc., vol. 11, pp. 700–708, 2003.
[10] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-32, pp. 1109–1121, 1984.
[11] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,” IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-33, pp. 443–445, 1985.
[12] I. Cohen, “Optimal speech enhancement under signal presence uncertainty using log-spectral amplitude estimator,” IEEE Signal Processing Letters, vol. 9, pp. 113–116, Apr. 2002.
[13] P. C. Loizou, “Speech enhancement based on perceptually motivated Bayesian estimators of the magnitude spectrum,” IEEE Trans. Speech Audio Proc., pp. 857–869, Sept. 2005.
[14] H. Gustafsson, S. Nordholm, and I. Claesson, “Spectral subtraction using reduced delay convolution and adaptive averaging,” IEEE Trans. Speech Audio Proc., pp. 799–807, 2001.
[15] S. Kamath and P. C. Loizou, “A multi-band spectral subtraction method for enhancing speech corrupted by colored noise,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 2002.
[16] Y. Hu and P. C. Loizou, “Speech enhancement based on wavelet thresholding the multitaper spectrum,” IEEE Trans. Speech Audio Proc., pp. 59–67, Jan. 2004.
[17] D. E. Tsoukalas, J. N. Mourjopoulos, and G. Kokkinakis, “Speech enhancement based on audible noise suppression,” IEEE Trans. Speech Audio Proc., vol. 5, pp. 479–514, Nov. 1997.
[18] Y. Hu and P. C. Loizou, “Subjective evaluation and comparison of speech enhancement algorithms,” submitted to Speech Communication.
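
The noise-addition step described in Section 2 (cut a random noise segment as long as the speech, scale it to the desired SNR, add it) can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names are hypothetical, signals are plain lists of samples, ordinary mean power stands in for the ITU-T P.56 method B active speech level, and the IRS filtering is assumed to have been applied to both signals beforehand.

```python
import math
import random


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def add_noise_at_snr(speech, noise, snr_db, rng=None):
    """Return speech plus a randomly cut, scaled noise segment at snr_db.

    Assumption: mean power is used as the speech level; the paper instead
    measures the active speech level with ITU-T P.56 method B.
    """
    rng = rng or random.Random()
    if not speech:
        raise ValueError("speech signal is empty")
    if len(noise) < len(speech):
        raise ValueError("noise recording is shorter than the speech signal")

    # Randomly cut a noise segment of the same length as the speech.
    start = rng.randrange(len(noise) - len(speech) + 1)
    segment = noise[start:start + len(speech)]

    noise_power = mean_power(segment)
    if noise_power == 0.0:
        raise ValueError("selected noise segment is silent")

    # Solve snr_db = 10*log10(P_speech / (gain^2 * P_noise)) for gain.
    gain = math.sqrt(mean_power(speech) / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, segment)]
```

Calling `add_noise_at_snr(speech, noise, 5.0)` would produce one noisy file of the kind listed in Section 2 for the 5dB condition, under the simplifications stated above.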
[Figure: mean SIG, BAK and OVRL scores (vertical axis, 1.0 to 5.0) for each method and for the unprocessed noisy speech. Panels recoverable here: 5dB babble noise, 5dB street noise, 10dB babble noise, 10dB street noise. Horizontal axis, grouped by class: subspace algorithms (KLT, pKLT); statistical-model based (MMSE-SPU, logMMSE, logMMSE-ne, logMMSE-SPU, pMMSE); spectral subtractive (RDC, RDC_ne, MB); Wiener-type algorithms (WavThr, Wiener_as, AudSup); noisy.]
                                                                                                                                                                                                                                                       logMMSE-SPU
                                                                                                                                                                                                        pKLT




                                                                                                                                                                                                                                                                        pMMSE




                                                                                                                                                                                                                                                                                       RDC_ne
                                                                                                                                                                                                  KLT




                                                                                                                                                                                                                                                                                 RDC




                                                                                                                                                                                                                                                                                                MB
                                                                                                                                                                                                                 MMSE-SPU
       Subspace                           Statistical-                                           Spectral                     Wiener-type                                                          Subspace                   Statistical-                                       Spectral            Wiener-type
      algorithms                         model based                                            subtractive                   algorithms                                                          algorithms                 model based                                        subtractive          algorithms




                                                                                    5dB car noise                                                                                                                                                                          5 dB Train

5.0                                                                                                                                                                                         5.0
                                                                                                                                                                                     SIG                                                                                                                                                    SIG
4.5                                                                                                                                                                                         4.5
4.0                                                                                                                                                                                  BAK                                                                                                                                                    BAK
                                                                                                                                                                                            4.0
3.5                                                                                                                                                                                  OVRL                                                                                                                                                   OVRL
                                                                                                                                                                                            3.5
3.0                                                                                                                                                                                         3.0
2.5                                                                                                                                                                                         2.5
2.0                                                                                                                                                                                         2.0
1.5                                                                                                                                                                                         1.5
1.0                                                                                                                                                                                         1.0
                                                                                                                          WavThr
                                                 logMMSE-ne




                                                                                                        RDC_ne




                                                                                                                                                           AudSup

                                                                                                                                                                         noisy
                                   logMMSE



                                                                 logMMSE-SPU




                                                                                                                                        Wiener_as
        KLT

              pKLT




                                                                                               RDC
                                                                                    pMMSE




                                                                                                                    MB
                       MMSE-SPU




                                                                                                                                                                                                                                                                                                     WavThr
                                                                                                                                                                                                                                        logMMSE-ne




                                                                                                                                                                                                                                                                                       RDC_ne




                                                                                                                                                                                                                                                                                                                           AudSup

                                                                                                                                                                                                                                                                                                                                    noisy
                                                                                                                                                                                                                             logMMSE



                                                                                                                                                                                                                                                      logMMSE-SPU




                                                                                                                                                                                                                                                                                                               Wiener_as
                                                                                                                                                                                                  KLT

                                                                                                                                                                                                        pKLT




                                                                                                                                                                                                                                                                                 RDC
                                                                                                                                                                                                                                                                        pMMSE




                                                                                                                                                                                                                                                                                                MB
                                                                                                                                                                                                                MMSE-SPU




         Subspace                   Statistical-                                               Spectral                   Wiener-type                                                              Subspace                        Statistical-                                  Spectral            Wiener-type
        algorithms                 model based                                                subtractive                 algorithms                                                              algorithms                      model based                                   subtractive          algorithms


                                                                                    10dB car noise                                                                                                                                                                       10 dB Train

5.0                                                                                                                                                                                         5.0
                                                                                                                                                                                     SIG                                                                                                                                                    SIG
4.5                                                                                                                                                                                         4.5
                                                                                                                                                                                     BAK                                                                                                                                                    BAK
4.0                                                                                                                                                                                         4.0
                                                                                                                                                                                     OVRL                                                                                                                                                   OVRL
3.5                                                                                                                                                                                         3.5
3.0                                                                                                                                                                                         3.0
                                                                                                                                                                                            2.5
2.5
                                                                                                                                                                                            2.0
2.0
                                                                                                                                                                                            1.5
1.5                                                                                                                                                                                         1.0
                                                                                                                                                                                                                                                                                                     WavThr
                                                                                                                                                                                                                                        logMMSE-ne




                                                                                                                                                                                                                                                                                       RDC_ne




                                                                                                                                                                                                                                                                                                                           AudSup

                                                                                                                                                                                                                                                                                                                                    noisy
                                                                                                                                                                                                                             logMMSE



                                                                                                                                                                                                                                                      logMMSE-SPU




                                                                                                                                                                                                                                                                                                               Wiener_as
                                                                                                                                                                                                  KLT

                                                                                                                                                                                                        pKLT




                                                                                                                                                                                                                                                                                 RDC
                                                                                                                                                                                                                                                                        pMMSE




                                                                                                                                                                                                                                                                                                MB
                                                                                                                                                                                                                MMSE-SPU




1.0
                                                                                                                              WavThr
                                                    logMMSE-ne




                                                                                                           RDC_ne




                                                                                                                                                                AudSup

                                                                                                                                                                             noisy
                                     logMMSE




                                                                      logMMSE-SPU




                                                                                                                                               Wiener_as
        KLT

               pKLT




                                                                                                RDC
                                                                                      pMMSE




                                                                                                                     MB
                        MMSE-SPU




         Subspace                             Statistical-                                             Spectral                        Wiener-type                                                 Subspace                   Statistical-                                       Spectral            Wiener-type
        algorithms                           model based                                              subtractive                      algorithms                                                 algorithms                 model based                                        subtractive          algorithms

				