(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 8, November 2010

A New Noise Estimation Technique of Speech Signal
        by Degree of Noise Refinement
Md. Ekramul Hamid, College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia. e-mail: ekram_hamid@yahoo.com
Md. Zasim Uddin, Dept. of Computer Science, University of Rajshahi, Rajshahi, Bangladesh. e-mail: cse.zasim@gmail.com
Md. Humayun Kabir Biswas, College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia. e-mail: mhkbiswas@yahoo.com
Somlal Das, Dept. of Computer Science, University of Rajshahi, Rajshahi, Bangladesh. e-mail: somlal_ru@yahoo.com


Abstract— An improved method for estimating the noise in speech utterances disturbed by additive noise is presented in this paper. We introduce a degree-of-noise refinement of the minima value sequence (MVS) together with some additional techniques for noise estimation. Initially, noise is estimated from the valleys of the spectrum, based on the harmonic properties of noisy speech; this estimate is called the MVS. However, the valleys of the spectrum are not pronounced enough to warrant reliable noise estimates. We therefore use the estimated Degree of Noise (DON) to adjust the MVS level. For every English phoneme, the DON is calculated and averaged over the processing frames for each input SNR. These averaged DONs are taken as standard values for the corresponding input SNR and are aligned with the true DON using the least-squares (LS) method, which yields a function for estimating the degree of noise. Using this technique, the state of the added noise can be estimated more accurately. We use a two-stage refinement of the estimated DON, first to update the MVS and then to estimate a nonlinear weight for noise subtraction. The proposed noise estimation performs well when integrated with a speech enhancement technique.

Keywords: Noise Estimation, Degree of Noise, Speech Enhancement, Nonlinear Weighted Noise Subtraction

I. INTRODUCTION

Noise estimation is one of the most important aspects of single-channel speech enhancement. Most single-channel speech enhancement algorithms require a voice activity detector (VAD), and speech/pause detection plays the major role in the performance of the whole system. Such systems can perform well for voiced speech and high signal-to-noise ratio (SNR), but their performance degrades for unvoiced speech at low SNR.

Traditional noise estimators are based on voice activity detectors (VAD), which are difficult to tune, and their application to low-SNR speech often results in clipped speech. The original MMSE-STSA estimator computes the noise power spectrum from the noisy speech only in the first non-speech period, where pure noise is available [1]; again, such systems perform well only for voiced speech and high SNR. Martin (2001) proposed a method for estimating the noise spectrum based on tracking the minimum of the noisy speech; its main drawback is that it fails to update the noise spectrum when the noise floor increases abruptly [2]. Cohen (2002) [3] proposed minima-controlled recursive averaging (MCRA), which updates the noise estimate by tracking the noise-only regions of the noisy speech spectrum. In the improved MCRA approach (Cohen 2003) [4], a different method was used to track the noise-only regions of the spectrum based on the estimated speech-presence probability. Doblinger [5] updated the noise estimate by continuously tracking the minimum of the noisy speech in each frequency bin; this is computationally more efficient than the method of Martin (2001), but it fails to differentiate between an increase in the noise floor and an increase in speech power. Hirsch and Ehrlicher [6] updated the noise estimate by comparing the noisy speech power spectrum to the past noise estimate; this method fails to update the noise when the noise floor increases abruptly and stays at that level. In our previous study, Hamid (2007) [7] proposed noise estimation using the MVS, where the noise floor is updated with the help of the estimated DON; there, the DON is estimated on the basis of pitch, and the pitch of unvoiced sections is not accurately estimated.

In this paper, we propose a method with good noise tracking and controlling capability. To estimate the noise, we first search for the valleys of the amplitude spectrum on a frame-by-frame basis and estimate the minima values of the spectrum, called the minima value sequence (MVS). To improve the estimation accuracy of the MVS, we use the DON. Since this is a single-channel method, direct estimation of the degree of noise is not possible; instead, a frame-wise averaged DON is estimated from the estimated noise of the observed signal. We consider these DONs as standard values for the corresponding input SNR. Each of these 1st averaged DONs for the corresponding input SNR is then aligned with the true DON using the least-squares (LS) method, which yields the 1st estimated degree of noise (DON1) of that frame. DON1 is applied to update the MVS. Next, the noise level is re-estimated, and from this estimate we compute the 2nd averaged DON and, in the same way, the 2nd estimated DON2, which is used to estimate the weight for the noise subtraction process. Because the noise is estimated from DONs that are aligned with the true DON, the noise amplitudes can be estimated more accurately, with lower speech distortion and better suppression of musical noise in the enhanced speech.



II. PROPOSED NOISE ESTIMATION METHOD

We assume that speech and noise are uncorrelated. Let y(n) = s(n) + d(n), where y(n) is the observed noisy speech signal, s(n) is the clean speech signal and d(n) is the additive noise. We further assume that signal and noise are statistically independent, so that the powers satisfy Py = Ps + Pd.

A. Estimation of the minima value sequence (MVS)

Sections of consecutive samples are used as a single frame of length l (320 samples). Consecutive frames are spaced l' (100 samples) apart, giving an overlap of almost 68.75% between them. The short-term representation of a signal y(n) is obtained by Hamming windowing and an N = 512 point discrete Fourier transform (DFT) at a sampling frequency of 16 kHz. Initially, the noise spectrum is estimated from the valleys of the amplitude spectrum, under the assumption that the peaks correspond to voiced parts and the valleys to noise-only parts. The algorithm for noise estimation is as follows (a code sketch is given after the list):

1. Compute the RMS value Yrms of the amplitude spectrum Y(k). Detect the minima Ymin(kmin) ← min(Y(k)) of Y(k) wherever the condition (Y(k) < Y(k-1) and Y(k) < Y(k+1) and Y(k) < Yrms) is satisfied. The indices kmin give the frequency-bin positions of the minima values.

2. Interpolate between adjoining minima positions (kmin ← k) to obtain the minima value sequence (MVS) Ymin(k).

3. Smooth the sequence by partial averaging to obtain the smoothed minima value sequence (SMVS). This process continuously updates the noise estimate in every analysis frame.

A noise estimate taken directly from the SMVS suffers from overestimation and underestimation of the SNR. To achieve good tracking capability with controlled overestimation, the proposed noise estimation algorithm adopts the concept of DON. The block diagram of the noise estimation process is given in Figure 1.

Figure 1. Block diagram of the estimation of DON1, Z1m.
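The following Python sketch illustrates steps 1-3 for a single analysis frame under the settings stated above (Hamming window, N = 512 DFT). The function name, the moving-average length used for the partial averaging, and the fallback for frames with too few detected minima are illustrative choices of ours, not details taken from the paper.

```python
import numpy as np

def estimate_smvs(frame, n_fft=512, smooth_len=5):
    """Sketch of the MVS/SMVS estimator for one analysis frame."""
    # Short-term amplitude spectrum of the Hamming-windowed frame
    Y = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft))
    Y_rms = np.sqrt(np.mean(Y ** 2))

    # Step 1: local minima of Y(k) that also lie below the RMS level
    k = np.arange(1, len(Y) - 1)
    is_min = (Y[k] < Y[k - 1]) & (Y[k] < Y[k + 1]) & (Y[k] < Y_rms)
    k_min = k[is_min]
    if k_min.size < 2:
        # Assumption: fall back to the raw spectrum in degenerate frames
        return Y

    # Step 2: interpolate between adjoining minima to obtain the MVS
    mvs = np.interp(np.arange(len(Y)), k_min, Y[k_min])

    # Step 3: partial averaging (short moving average) to obtain the SMVS
    kernel = np.ones(smooth_len) / smooth_len
    return np.convolve(mvs, kernel, mode="same")
```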
                                                                             between the true and the estimated values can be minimized by
B. Estimation of the Degree of Noise (DON)

In a single-channel method, only the power of the observed signal is known; therefore, direct estimation of the degree of noise (Pd/Pobs) is not possible. Instead, a frame-wise DON is estimated from the estimated noise of the observed signal for each frame m. For the estimation of DON, we carried out an experiment on 20 vowel phonemes of 3 male and 3 female speakers taken from the TIMIT database. First, white noise at various SNRs is added to these voiced vowel phonemes. Then, for each SNR, the noisy phonemes are processed frame-wise and the DON is estimated in each frame for each phoneme individually. For every phoneme, the DON is averaged over the processing frames for the corresponding input SNR. Each of these 1st averaged DONs of frame m for the corresponding input SNR is denoted \bar{Z}_{1m}. This averaged DON is aligned with the true DON (Ztr) using the least-squares (LS) method, which yields the 1st estimated degree of noise (DON1), Z1m, of that frame. The true DON (Ztr) is given by

    Z_{tr} = \frac{P_d}{P_s + P_d} = \frac{1}{1 + 10^{\mathrm{SNR_{dB}}/10}}        (1)

The 1st averaged estimated DON \bar{Z}_{1m} is

    \bar{Z}_{1m} = \frac{1}{M} \sum_{m=1}^{M} \frac{P_{\eta}(m)}{P_{obs}(m)}        (2)

where M is the number of noise-added frames, and P_{\eta}(m) and P_{obs}(m) are the powers of the noise and of the observed signal, respectively. Only voiced phonemes are considered in this experiment, so the averaged DON value should be limited to the voiced portion of a speech sentence. In practice, however, the unvoiced portion is contaminated with a higher degree of noise; the estimated noise is therefore higher in unvoiced frames than in voiced frames, and consequently a higher DON value is obtained from unvoiced frames, which is consistent with this observation. The degree of noise estimated from the previously prepared function, obtained with the least-squares method, is given by [7]

    Z_{1m} = a \times \bar{Z}_{1m} + b        (3)

where Z1m is the 1st estimated DON1 of frame m. The error between the true and the estimated values is minimized by tuning a and b. In the experiment, 20 phoneme sounds from 3 male and 3 female speakers, degraded by white noise at different SNRs (-10, -5, 0, ..., 30 dB), are considered. The value of Z1m is applied to update the MVS. Next, the noise level is re-estimated with the help of Z1m. Finally, from the re-estimated noise we compute the 2nd averaged DON (\bar{Z}_{2m}) and, in the same way, the 2nd estimated DON2 (Z2m), which is used to estimate the noise weight for the nonlinear weighted noise subtraction.

We conducted an experiment on the noisy (white noise) utterance /water/ of a female speaker at various input SNRs and obtained the 1st estimated DON1, Z1m, and the 2nd estimated DON2, Z2m. Figure 2 illustrates the frame-wise true degree of noise and the estimated degree of noise obtained in every analysis frame for different input SNRs. By smoothing the MVS, the overestimation problem is minimized and the effect of musical noise is reduced. Smoothing is performed to reduce high-frequency fluctuations; since most of the speech energy is concentrated at low frequencies, smoothing reduces the high-frequency components and gives an increased signal-to-noise ratio.
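As an illustration of Eqs. (1)-(3), the sketch below computes the true DON for a known input SNR, the averaged DON from per-frame noise and observed powers, and the first-order least-squares alignment that yields the coefficients a and b. The function names and the use of numpy.polyfit for the LS fit are our own choices; the paper only specifies the linear mapping of Eq. (3).

```python
import numpy as np

def true_don(snr_db):
    """True degree of noise, Eq. (1): Ztr = Pd / (Ps + Pd) = 1 / (1 + 10^(SNR/10))."""
    return 1.0 / (1.0 + 10.0 ** (snr_db / 10.0))

def averaged_don(noise_power, observed_power):
    """1st averaged DON, Eq. (2): mean of the per-frame noise-to-observed power ratios."""
    noise_power = np.asarray(noise_power, dtype=float)
    observed_power = np.asarray(observed_power, dtype=float)
    return np.mean(noise_power / observed_power)

def fit_don_mapping(avg_dons, snrs_db):
    """LS alignment of averaged DONs with the true DON, giving (a, b) of Eq. (3)."""
    a, b = np.polyfit(avg_dons, [true_don(s) for s in snrs_db], 1)
    return a, b

# Illustrative use: map an averaged DON of a frame to the 1st estimated DON1
# a, b = fit_don_mapping(avg_dons_per_snr, [-10, -5, 0, 5, 10, 15, 20, 25, 30])
# don1 = a * z1m_bar + b        # Eq. (3)
```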




Fig. 3 shows that the true and the estimated degree of noise are nearly equal at all SNRs.

Figure 2a. True vs. 1st averaged DON (top) and true vs. 1st estimated DON1 (bottom). Figure 2b. True vs. 2nd averaged DON (top) and true vs. 2nd estimated DON2 (bottom).

Figure 3. Frame-wise plots of the true DON (solid line with points), the 1st estimated DON1 (dotted line with circles) and the 2nd estimated DON2 (solid line with double linewidth) for -5 dB (left) and 5 dB (right) SNR noisy speech.

C. Estimation of Noise Spectrum

The noise spectrum is estimated from the SMVS and the 1st estimated DON according to

    D_m(k) = Y_{min}(k) + Z_{1m} \times Y_{rms}        (4)

Dm(k) is then updated in a few steps: the updated spectrum is smoothed again by a three-point moving average, and finally the main maxima of the spectrum are identified and suppressed [7]. A code sketch of Eq. (4) is given below.
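A minimal sketch of Eq. (4), assuming the SMVS of Section II-A and the DON1 of Section II-B are already available for the frame. The three-point moving average mentioned in the text is included; the final identification and suppression of the main spectral maxima of [7] is only marked by a comment, since its details are not given here.

```python
import numpy as np

def estimate_noise_spectrum(smvs, y_rms, don1):
    """Noise spectrum estimate of Eq. (4): Dm(k) = Ymin(k) + Z1m * Yrms."""
    d = smvs + don1 * y_rms

    # Smooth the updated spectrum by a three-point moving average
    d = np.convolve(d, np.ones(3) / 3.0, mode="same")

    # The main spectral maxima would be identified and suppressed here (see [7]);
    # that refinement is omitted from this sketch.
    return d
```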
III. WEIGHTED NOISE SUBTRACTION (NWNS)

Noise reduction based on traditional spectral subtraction (SS) requires an estimate of the embedded noise; here the subtraction is performed in the time domain and is called noise subtraction (NS). In NS, degradation occurs because the noise is overestimated in the unvoiced regions of noisy speech at higher input SNRs (>10 dB). We observed that the unvoiced regions have flat spectral characteristics and a low local SNR, which gives a larger degree-of-noise value and therefore raises the estimated noise level; the noise extracted in the unvoiced regions is too high and degrades the speech. From Figure 4 it can be seen that, for input SNRs above 10 dB, the unvoiced frames of the noisy speech show flat spectra and low SNR, which yields a larger DON2 (Z2m) value and hence a larger weighting factor. As a result, more noise is subtracted in every unvoiced frame than in the voiced frames (for example at 25 dB input SNR), and speech distortion occurs. For that reason, we introduce a nonlinear weighting factor to control the overestimation and minimize the effect of residual noise. The NWNS is given by

    s_1(n) = y(n) - \alpha \times Z_{tr} \times \hat{d}_{ss}(n)        (5)

where \alpha = 0.3019 + 6.4021 Z_{2m} - 14.109 Z_{2m}^2 + 9.8273 Z_{2m}^3 is the nonlinear weighting factor.

Eq. (5) requires the input SNR, which can be estimated from the variances as

    \mathrm{SNR}_{input} = 10 \log_{10}\left(\frac{\sigma_s^2}{\sigma_{\eta}^2}\right)        (6)

where \sigma_s^2 and \sigma_{\eta}^2 are the variances of the speech and of the noise, respectively. Because noise and speech are independent, the variance of the noisy speech is assumed to equal the sum of the speech variance and the noise variance. By adopting the nonlinear weighting in NS, good noise reduction is obtained, and informal listening tests indicate that NWNS gives good performance with less musical noise. A code sketch of Eqs. (5) and (6) follows.
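The sketch below illustrates Eqs. (5) and (6). It assumes the frame of noisy speech y(n), the estimated noise d_hat(n) and the frame's DON2 value are available. Evaluating Ztr via Eq. (1) at the input SNR estimated from Eq. (6), and taking the speech variance as the difference between the noisy-speech and noise variances, are our reading of the text rather than details it spells out.

```python
import numpy as np

def input_snr_db(speech_var, noise_var):
    """Input SNR from the speech and noise variances, Eq. (6)."""
    return 10.0 * np.log10(speech_var / noise_var)

def nwns(y, d_hat, don2):
    """Nonlinear weighted noise subtraction, Eq. (5): s1(n) = y(n) - alpha * Ztr * d_hat(n)."""
    # Nonlinear weighting factor from DON2 (the cubic of Eq. (22))
    alpha = 0.3019 + 6.4021 * don2 - 14.109 * don2 ** 2 + 9.8273 * don2 ** 3

    # Estimate the input SNR from variances; by the independence assumption the
    # speech variance is the noisy-speech variance minus the noise variance.
    noise_var = np.var(d_hat)
    speech_var = max(np.var(y) - noise_var, 1e-12)
    snr_db = input_snr_db(speech_var, noise_var)

    # True DON evaluated at the estimated input SNR, Eq. (1)
    z_tr = 1.0 / (1.0 + 10.0 ** (snr_db / 10.0))

    return y - alpha * z_tr * d_hat
```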
Figure 4. Spectra of voiced and unvoiced frames degraded by white noise at 5 dB SNR ((a) and (b)), 10 dB SNR ((c) and (d)), 20 dB SNR ((e) and (f)) and 30 dB SNR ((g) and (h)), respectively.

A. Derivation of the nonlinear weight

Subtraction-type algorithms inevitably produce musical noise. Since algorithms with fixed subtraction parameters are unable to adapt well to varying noise levels and characteristics, it becomes necessary to estimate a suitable factor to update the noise level; hence we derive a nonlinear weighting factor \alpha for this purpose. First, a simulation is performed over 7 male and 7 female speakers with different sentences at different SNR levels, randomly selected from the TIMIT database, for different values of \alpha, and the output SNR is recorded.



Table 1 shows the performance of a computer simulation of the algorithm for a given noisy sentence of a female speaker for different values of \alpha.

TABLE 1: Output SNR of a noisy speech of a female speaker for different values of \alpha over a wide range of input SNRs (-10 dB to 30 dB). The speech is degraded by white noise.

TABLE 2: The average weight \alpha for 7 male and 7 female utterances corresponding to a wide range of input SNRs (-10 dB to 30 dB).

Let the set of data points be (x_i, y_i), i = 1, 2, ..., 9, and let the curve given by Y = f(x) be fitted to these data. At x = x_i the experimental value of the ordinate is y_i and the corresponding value on the fitted curve is f(x_i). If e_i is the error of the approximation at x = x_i, then e_i = y_i - f(x_i), and the sum of the squared errors is

    SE = \sum_{i=1}^{9} e_i^2

We take \alpha to be a polynomial of degree 3,

    \alpha = f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3        (7)

fitted to the data points (x_i, y_i), i = 1, 2, ..., 9, where x represents the value of DON2. The sum of squared errors at the points x = x_i is

    SE = \sum_{i=1}^{9} \left[ y_i - \left(a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3\right) \right]^2        (8)

For SE to be a minimum, we require

    \frac{\partial (SE)}{\partial a_0} = -2 \sum_{i=1}^{9} \left[ y_i - \left(a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3\right) \right] = 0        (9)

    \frac{\partial (SE)}{\partial a_1} = -2 \sum_{i=1}^{9} \left[ y_i - \left(a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3\right) \right] x_i = 0        (10)

    \frac{\partial (SE)}{\partial a_2} = -2 \sum_{i=1}^{9} \left[ y_i - \left(a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3\right) \right] x_i^2 = 0        (11)

    \frac{\partial (SE)}{\partial a_3} = -2 \sum_{i=1}^{9} \left[ y_i - \left(a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3\right) \right] x_i^3 = 0        (12)

From Eqs. (9), (10), (11) and (12) we obtain

    \sum_{i=1}^{9} y_i = 9 a_0 + a_1 \sum_{i=1}^{9} x_i + a_2 \sum_{i=1}^{9} x_i^2 + a_3 \sum_{i=1}^{9} x_i^3        (13)

    \sum_{i=1}^{9} x_i y_i = a_0 \sum_{i=1}^{9} x_i + a_1 \sum_{i=1}^{9} x_i^2 + a_2 \sum_{i=1}^{9} x_i^3 + a_3 \sum_{i=1}^{9} x_i^4        (14)

    \sum_{i=1}^{9} x_i^2 y_i = a_0 \sum_{i=1}^{9} x_i^2 + a_1 \sum_{i=1}^{9} x_i^3 + a_2 \sum_{i=1}^{9} x_i^4 + a_3 \sum_{i=1}^{9} x_i^5        (15)

    \sum_{i=1}^{9} x_i^3 y_i = a_0 \sum_{i=1}^{9} x_i^3 + a_1 \sum_{i=1}^{9} x_i^4 + a_2 \sum_{i=1}^{9} x_i^5 + a_3 \sum_{i=1}^{9} x_i^6        (16)

We write these equations in matrix form as

    \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \\ \sum x_i^3 y_i \end{bmatrix}
    =
    \begin{bmatrix}
      9 & \sum x_i & \sum x_i^2 & \sum x_i^3 \\
      \sum x_i & \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \\
      \sum x_i^2 & \sum x_i^3 & \sum x_i^4 & \sum x_i^5 \\
      \sum x_i^3 & \sum x_i^4 & \sum x_i^5 & \sum x_i^6
    \end{bmatrix}
    \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}        (17)
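A compact way to carry out this cubic fit in code is sketched below. numpy.linalg.lstsq is used instead of explicitly forming and inverting the normal-equation matrix of Eq. (17); for this small, well-conditioned problem the two are numerically equivalent. The function name is ours.

```python
import numpy as np

def fit_cubic_weight(don2_values, alpha_values):
    """Least-squares fit of the cubic weight polynomial of Eq. (7),
    alpha = a0 + a1*x + a2*x**2 + a3*x**3, with x = DON2."""
    x = np.asarray(don2_values, dtype=float)
    y = np.asarray(alpha_values, dtype=float)

    # Design matrix with columns [1, x, x^2, x^3] (cf. Eq. (18))
    X = np.vander(x, 4, increasing=True)

    # Solves the same normal equations as Eqs. (13)-(17) / Eq. (21)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs          # [a0, a1, a2, a3]
```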



Eq. (17) is built from a Vandermonde matrix; the least-squares fit can also be written directly as

    \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_9 \end{bmatrix}
    =
    \begin{bmatrix}
      1 & x_1 & x_1^2 & x_1^3 \\
      1 & x_2 & x_2^2 & x_2^3 \\
      \vdots & \vdots & \vdots & \vdots \\
      1 & x_9 & x_9^2 & x_9^3
    \end{bmatrix}
    \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}        (18)

In matrix notation, Eq. (18) can be written as

    Y = X A        (19)

where Y = [y_1, ..., y_9]^T, X is the 9 x 4 matrix with rows [1, x_i, x_i^2, x_i^3], and A = [a_0, a_1, a_2, a_3]^T. Multiplying both sides of Eq. (19) by X^T (the transpose of X) gives

    X^T Y = X^T X A        (20)

This matrix equation can be solved numerically, or inverted directly when it is well conditioned, to yield the solution vector

    A = (X^T X)^{-1} X^T Y        (21)

In our experiment,

    x_i = DON2_i = [0.80707, 0.75235, 0.59967, 0.32374, 0.095033, 0.01379, -0.009902, -0.01741, -0.019616]
    y_i = \alpha_i = [1.4, 1.35, 1.25, 1.1071, 0.86429, 0.56429, 0.26429, 0.11571, 0.035]

So we have

    X = \begin{bmatrix}
      1 & DON2_1 & DON2_1^2 & DON2_1^3 \\
      1 & DON2_2 & DON2_2^2 & DON2_2^3 \\
      \vdots & \vdots & \vdots & \vdots \\
      1 & DON2_9 & DON2_9^2 & DON2_9^3
    \end{bmatrix}
    \quad \text{and} \quad
    Y = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_9 \end{bmatrix}

Putting the values of DON2_1, ..., DON2_9 into X and \alpha_1, ..., \alpha_9 into Y, Eq. (21) gives

    A = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} 0.3019 \\ 6.4021 \\ -14.109 \\ 9.8273 \end{bmatrix}

Substituting the values of a_0, a_1, a_2 and a_3 into Eq. (7),

    \alpha = 0.3019 + 6.4021 \times DON2 - 14.109 \times DON2^2 + 9.8273 \times DON2^3        (22)

Equation (22) is the nonlinear weighting factor \alpha used in Eq. (5).
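As a check, Eq. (21) can be applied directly to the nine (DON2_i, alpha_i) pairs listed above, or equivalently the fit_cubic_weight sketch from Section III-A can be reused; the coefficients reported in Eq. (22) are noted in the comment for comparison.

```python
import numpy as np

# (DON2_i, alpha_i) data points listed in the text
don2  = np.array([0.80707, 0.75235, 0.59967, 0.32374, 0.095033,
                  0.01379, -0.009902, -0.01741, -0.019616])
alpha = np.array([1.4, 1.35, 1.25, 1.1071, 0.86429,
                  0.56429, 0.26429, 0.11571, 0.035])

X = np.vander(don2, 4, increasing=True)        # Eq. (18)
A = np.linalg.inv(X.T @ X) @ X.T @ alpha       # Eq. (21): A = (X^T X)^(-1) X^T Y
print(A)   # the paper reports approximately [0.3019, 6.4021, -14.109, 9.8273], Eq. (22)
```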
where,                                                                                                                           Now substitute the value of                              in Eq.(7)

    ⎡ y1 ⎤     ⎡1                        x1      x12     x13 ⎤               ⎡ ⎤                                                 α = 0.3019 + 6.4021× DON 2 − 14.109 × DON 22 + 9.8273 × DON 23
    ⎢y ⎥       ⎢                                             ⎥               ⎢a ⎥                                                                                                      (22)
    ⎢ 2⎥       ⎢1                        x2       2
                                                 x2        3
                                                         x2 ⎥                ⎢ 0⎥
               ⎢1                                                            ⎢ ⎥
    ⎢ y3 ⎥
               ⎢                         x3       2
                                                 x3      x3 ⎥
                                                           3
                                                             ⎥               ⎢ ⎥
                                                                                                                                 Equation (22) is the derivation of the nonlinear weighting
    ⎢ ⎥                                                                                                                          factor α and is used in Eq. 5.
    ⎢ y4 ⎥     ⎢1                        x4      x   2
                                                         x4 ⎥
                                                           3                 ⎢a1 ⎥
                                                     4
                                                                             ⎢ ⎥
Y = ⎢ y5 ⎥ X = ⎢1
               ⎢
                                                  2        3
                                                         x5 ⎥
                                                             ⎥
                                                                     and A = ⎢ ⎥                                                          IV.   EXPERIMENTAL RESULTS AND DISCUSSION
    ⎢ ⎥                                  x5      x5
    ⎢ y6 ⎥     ⎢                                     2     3⎥                ⎢a 2 ⎥                                                  The proposed noise estimation method is compared with
    ⎢y ⎥       ⎢1                        x6      x   6   x6 ⎥                ⎢ ⎥                                                 the conventional noise estimation algorithm using MVS in
    ⎢ 7⎥       ⎢1                                 2
                                                         x7 ⎥
                                                           3                 ⎢ ⎥
               ⎢
                                         x7      x7
                                                             ⎥               ⎢ a3 ⎥                                              terms of noise estimation accuracy and quality. Figures 5
    ⎢ y8 ⎥                                                                   ⎢ ⎥                                                 illustrate results of noise estimation in frequency domain (FD)
    ⎢ ⎥        ⎢1                        x8       2
                                                 x8      x8 ⎥
                                                           3

    ⎣ y9 ⎦     ⎢                                  2        3
                                                             ⎥               ⎢ ⎥                                                 measure. In the experiment, we consider the vowel phoneme
               ⎢1
               ⎣                         x9      x9      x9 ⎥⎦               ⎢ ⎥
                                                                             ⎣ ⎦                                                 sound /oy/, degraded by the white noise at 0dB SNR. It shows
                                                                                                                                 that, by adopting the proposed DON1 (Z1m), it is possible to
                                                                      T                                                          estimate the state of the added noise more precisely. We
Multiply both sides of Eq.(19) by X (transpose of X)                                                                             achieve sufficient improvements in noise amplitudes using the
                                                                                                                                 MVS+DON1 estimator. Objective measure is also performed
X T Y = X T XA                                                                                                                   to verify the quality of the estimated noise. For that we use the
                                                                                                       (20)
                                                                                                                                 PESQ MOS measure. Figure 6 shows the PESQ MOS value
                                                                                                                                 between the added and the estimated noise at different noise
This matrix equation can be solved numerically, or can be                                                                        levels. It shows that PESQ MOS value gradually decreases at
inverted directly it is well formed, to yield the solution vector                                                                the higher SNR.

A= XTX  (             )−1
                               X TY                                                                    (21)
                                                                                                                                     To study the speech enhancement performance, an
                                                                                                                                 experiment is carried out by taking 56320 samples of the clean
                                                                                                                                 speech /she had your dark suit in greasy wash water all year/
In our experiment,                                                                                                               from TIMIT database. The speech signal is corrupted by white,
                                                                                                                                 pink and HF channel noises at various SNR levels are taken
                  [
x i = DON 2 i = 0.80707 , 0.75235 , 0 .59967 , 0.32374 , 0 .095033 , 0 .01379 , − 0 .009902 , − 0.01741 , − 0 .019616   ]        from NOISEX database. The results of the average output SNR
              [
yi = α i = 1.4, 1.35, 1.25, 1.1071, 0.86429, 0.56429, 0.26429, 0.11571, 0.035                                      ]             obtained from for white noise, pink noise and HF channel noise
                                                                                                                                 at various SNR levels are given in Table 1 for NS and NWNS,
                                                                                                                                 respectively.
To study the speech enhancement performance, an experiment is carried out on 56320 samples of the clean utterance /she had your dark suit in greasy wash water all year/ from the TIMIT database. The speech signal is corrupted by white, pink and HF channel noises taken from the NOISEX database at various SNR levels. The average output SNR obtained for white, pink and HF channel noise at the various input SNR levels is given in Table 3 for the NS and NWNS methods, respectively.
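The output SNR reported in Table 3 can be computed along the following lines; this is a generic definition given for illustration, as the exact averaging behind the table is not spelled out here.

```python
import numpy as np

def output_snr_db(clean, enhanced):
    """Output SNR (dB) of an enhanced signal against the clean reference.

    Generic definition used for illustration; the frame/utterance averaging
    behind Table 3 may differ.
    """
    clean = np.asarray(clean, dtype=np.float64)
    err = clean - np.asarray(enhanced, dtype=np.float64)
    return 10.0 * np.log10(np.sum(clean**2) / (np.sum(err**2) + 1e-12))

# Example with dummy signals: a clean tone and a lightly corrupted copy.
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 220 * t)
enhanced = clean + 0.05 * np.random.randn(fs)
print("output SNR: %.1f dB" % output_snr_db(clean, enhanced))
```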




[Figure 5: two panels comparing the true noise spectrum with the estimated noise spectrum, by MVS (top) and by MVS+DON1 (bottom); amplitude (dB) versus frequency (0-8 kHz).]
Figure 5. Noise spectrums (original and estimated).
[Figure 6: estimated noise quality (PESQ MOS) plotted against input SNR (dB), from -10 to 30 dB.]
Figure 6. Estimated noise quality based on PESQ MOS.
    We observe from Table 3 that the overall output SNR obtained by NS improves up to an input SNR of 10 dB and degrades from 15 dB upward. The degradation is caused by overestimation of the noise within the unvoiced regions of the noisy speech at higher input SNRs (>10 dB): because the unvoiced regions have flat spectral characteristics and a low local SNR, they yield larger DON2 values, which raise the estimated noise level. Consequently, the noise extracted in the unvoiced regions is too high and degrades the speech. Hence it is essential to add a weighting factor to control this overestimation, and with it NWNS performs better throughout the SNR range. It is also observed that the enhanced speech is distorted in weakly voiced parts when the noise is removed by the NS method, whereas NWNS avoids this; on the other hand, only a small amount of noise is removed from the corrupted speech by the NWNS method. Thus the NS method loses some speech intelligibility while NWNS maintains it. Overall, we obtain better results than in our previous study [7] over a wide range of SNRs.
TABLE 3: The results of average output SNR (dB) for various types of noise at different input SNRs by the NS and NWNS methods.

  Input SNR | White noise      | HF channel noise | Pink noise
            | NS       NWNS    | NS       NWNS    | NS       NWNS
  -10 dB    | -2.8     -1.57   | -7.4     -7.5    | -7.1     -7.1
   -5 dB    |  2.0      2.4    | -2.3     -2.7    | -2.2     -2.3
    0 dB    |  6.5      5.3    |  2.6      1.9    |  2.6      2.2
    5 dB    | 10.3      8.7    |  7.3      6.4    |  7.3      6.4
   10 dB    | 13.3     11.7    | 11.5     10.8    | 11.3     10.8
   15 dB    | 15.4     15.8    | 14.5     15.4    | 14.4     15.4
   20 dB    | 16.7     20.4    | 16.4     20.2    | 16.3     20.3
   25 dB    | 17.5     25.2    | 17.3     25.1    | 17.3     25.2
   30 dB    | 17.7     30.1    | 17.7     30.1    | 17.6     30.1
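To make the role of the weighting factor used in NWNS concrete, the sketch below applies a DON2-dependent weight α (for instance the cubic mapping fitted in Eq. (21)) to the estimated noise before time-domain subtraction. The function names, the clipping range and the exact way the weight enters the subtraction are assumptions for illustration; the NWNS formulation used in the paper may differ in detail.

```python
import numpy as np

def nwns_frame(noisy_frame, est_noise_frame, don2, alpha_of_don2):
    """Weighted time-domain noise subtraction for one frame (hypothetical sketch).

    The estimated noise is scaled by a weight alpha derived from the frame's
    DON2 value before it is subtracted, so that frames with a low degree of
    noise (e.g. unvoiced frames at high input SNR) have their noise estimate
    scaled down and overestimation is controlled.
    """
    alpha = float(np.clip(alpha_of_don2(don2), 0.0, 1.5))
    return noisy_frame - alpha * est_noise_frame

# Example usage with dummy data (hypothetical frame length of 256 samples).
rng = np.random.default_rng(0)
noisy = rng.standard_normal(256)
est_noise = 0.5 * rng.standard_normal(256)
enhanced = nwns_frame(noisy, est_noise, don2=0.3,
                      alpha_of_don2=lambda d: 1.0)  # placeholder weight function
```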
                                                                                                                            Abha, KSA. His research interests include speech
                                                                                                                            enhancement, and speech signal processing.
                         CONCLUSIONS
    In this paper, an improved noise estimation technique is presented. Noise is first estimated from the valleys of the amplitude spectrum, and the estimated noise amplitudes are then adjusted by the estimated DON1. The method eliminates the need for a VAD by exploiting the short-time characteristics of speech signals. The results show that the state of the added noise is estimated more accurately with MVS+DON1, and the enhanced speech obtained with time-domain nonlinear weighted noise subtraction exhibits sufficient noise reduction. The main advantage of the algorithm is the effective removal of noise components over a wide range of SNRs: we obtain not only a better SNR but also better speech quality with significantly reduced residual noise. However, a slight noisy effect still remains; this issue will be addressed in our future study.

                         REFERENCES
[1]  Benesty, J., Makino, S., and Chen, J., Speech Enhancement, Springer-Verlag, Berlin Heidelberg, 2005.
[2]  Martin, R., and Lotter, T., "Optimal Recursive Smoothing of Non-Stationary Periodograms", Proc. IWAENC, pp. 167-170, Sept. 2001.
[3]  Cohen, I., and Berdugo, B., "Noise Estimation by Minima Controlled Recursive Averaging for Robust Speech Enhancement", IEEE Signal Processing Letters, vol. 9, no. 1, pp. 12-15, Jan. 2002.
[4]  Cohen, I., "Noise Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging", IEEE Trans. on Speech and Audio Processing, vol. 11, pp. 466-475, Sept. 2003.
[5]  Doblinger, G., "Computationally Efficient Speech Enhancement by Spectral Minima Tracking in Subbands", Proc. EUROSPEECH, pp. 1513-1516, 1995.
[6]  Hirsch, H. G., and Ehrlicher, C., "Noise Estimation Methods for Robust Speech Recognition", Proc. ICASSP, pp. 153-156, 1995.
[7]  Hamid, M. E., Ogawa, K., and Fukabayashi, T., "Noise Estimation for Speech Enhancement by the Estimated Degree of Noise without Voice Activity Detection", Proc. SIP 2006, pp. 420-424, Hawaii, August 2006.
[8]  Martin, R., "Speech Enhancement Using MMSE Short Time Spectral Estimation with Gamma Distributed Speech Priors", Proc. ICASSP, vol. I, pp. 253-256, 2002.
[9]  Martin, R., "Spectral Subtraction Based on Minimum Statistics", Proc. EUSIPCO, pp. 1182-1185, 1994.
[10] Martin, R., "Statistical Methods for the Enhancement of Noisy Speech", Proc. IWAENC 2003, pp. 1-6, 2003.
                      AUTHORS PROFILE

Md. Ekramul Hamid received his B.Sc and M.Sc degrees from the Department of Applied Physics and Electronics, Rajshahi University, Bangladesh. After that he obtained the Master of Computer Science degree from Pune University, India. He received his PhD degree from Shizuoka University, Japan. During 1997-2000, he was a lecturer in the Department of Computer Science and Technology, Rajshahi University. Since 2007, he has been serving as an associate professor in the same department. He is currently working as an assistant professor in the College of Computer Science at King Khalid University, Abha, KSA. His research interests include speech enhancement and speech signal processing.

Md. Zasim Uddin received his B.Sc and M.Sc in Computer Science & Engineering from Rajshahi University, Rajshahi, Bangladesh. He was awarded the National Science and Information & Communication Technology Fellowship (Government of the People's Republic of Bangladesh) in 2009. He is currently a lecturer in the Department of Computer Science & Engineering, Dhaka International University, Dhaka, Bangladesh. His research interests include medical image and signal processing. He is a member of the Bangladesh Computer Society.

Md. Humayun Kabir Biswas is working as an international lecturer in the Department of Computer Science at King Khalid University, Kingdom of Saudi Arabia. Before joining KKU, he worked as a lecturer in the Department of Computer Science and Engineering at IUBAT-International University of Business Agriculture and Technology, Bangladesh. He completed his Master of Science in Information Technology degree at Shinawatra University, Bangkok, Thailand. He is keen to do research on the semantic web, intelligent information retrieval and machine learning techniques. His current research interest is audio and image signal processing.

Somlal Das received B.Sc (Hons) and M.Sc degrees from the Department of Applied Physics and Electronics in the University of Rajshahi, Bangladesh. He joined the Department of Computer Science and Engineering in the University of Rajshahi, Bangladesh, as a lecturer in 1998. He is currently serving as an Assistant Professor and working as a Ph.D. student at the same department. His research interests are in speech signal processing, speech enhancement, speech analysis, and digital signal processing.



