(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 8, November 2010, pp. 37-43
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

A New Noise Estimation Technique of Speech Signal by Degree of Noise Refinement

Md. Ekramul Hamid, College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia. e-mail: ekram_hamid@yahoo.com
Md. Zasim Uddin, Dept. of Computer Science, University of Rajshahi, Rajshahi, Bangladesh. e-mail: cse.zasim@gmail.com
Md. Humayun Kabir Biswas, College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia. e-mail: mhkbiswas@yahoo.com
Somlal Das, Dept. of Computer Science, University of Rajshahi, Rajshahi, Bangladesh. e-mail: somlal_ru@yahoo.com

Abstract: An improved method for noise estimation of speech utterances disturbed by additive noise is presented in this paper. We introduce degree of noise refinement of the minima value sequence (MVS) and some additional techniques for noise estimation. Initially, noise is estimated from the valleys of the spectrum based on the harmonic properties of noisy speech; this estimate is called the MVS. However, the valleys of the spectrum are not pronounced enough to warrant reliable noise estimates. We therefore use the estimated Degree of Noise (DON) to adjust the MVS level. For every English phoneme, the DON is calculated and averaged within the processing frames for each input SNR. We take these averaged DONs as standard values corresponding to the input SNR and align them with the true DON using the least-squares (LS) method, which results in a function to estimate the degree of noise. Using this technique, it is possible to estimate the state of the added noise more accurately. We use a two-stage refinement of the estimated DON to update the MVS as well as to estimate a nonlinear weight for noise subtraction. The proposed noise estimation performs well when it is integrated with the speech enhancement technique.

Keywords: Noise Estimation, Degree of Noise, Speech Enhancement, Nonlinear Weighted Noise Subtraction

I. INTRODUCTION

Noise estimation is one of the most important aspects of single-channel speech enhancement. In single-channel speech enhancement systems, most algorithms require a voice activity detector (VAD), and the speech/pause detection plays the major role in the performance of the whole system. These systems can perform well for voiced speech and high signal-to-noise ratio (SNR), but their performance degrades with unvoiced speech at low SNR.

Traditional noise estimators are based on voice activity detectors (VAD), which are difficult to tune, and their application to low-SNR speech often results in clipped speech. The original MMSE-STSA estimates the noise power spectrum on the basis of the noisy speech only in the first non-speech period, where pure noise is available [1]. However, these systems can perform well only for voiced speech and high SNR. Martin (2001) proposed a method for estimating the noise spectrum based on tracking the minimum of the noisy speech. The main drawback of this method is that it fails to update the noise spectrum when the noise floor increases abruptly [2]. Cohen (2002) [3] proposed a minima controlled recursive algorithm (MCRA) which updates the noise estimate by tracking the noise-only regions of the noisy speech spectrum. In the improved MCRA approach (Cohen 2003) [4], a different method was used to track the noise-only regions of the spectrum based on the estimated speech-presence probability. Doblinger [5] updated the noise estimate by continuously tracking the minimum of the noisy speech in each frequency bin. As such, it is computationally more efficient than the method in Martin 2001. However, it fails to differentiate between an increase in noise floor and an increase in speech power. Hirsch and Ehrlicher [6] updated the noise estimate by comparing the noisy speech power spectrum to the past noise estimate. This method fails to update the noise when the noise floor increases abruptly and stays at that level. In our previous study, Hamid (2007) [7] proposed noise estimation using the MVS, in which the noise floor is updated with the help of the estimated DON. There, the DON is estimated on the basis of pitch, and the pitch of unvoiced sections is not accurately estimated.

In this paper, we propose a method which has good noise tracking and controlling capability. To estimate noise, we first search for the valleys of the amplitude spectrum on a frame-by-frame basis and estimate the minima values of the spectrum, called the minima value sequence (MVS). To improve the estimation accuracy of the MVS, we use the DON. As this is a single-channel method, direct estimation of the degree of noise is not possible. Instead, a frame-wise averaged DON is estimated from the estimated noise of the observed signal. We take these DONs as standard values corresponding to the input SNR. Each of these 1st averaged DONs for the corresponding input SNR is then aligned with the true DON using the least-squares (LS) method, which gives the 1st estimated degree of noise (DON1) of that frame. The 1st estimated DON1 is applied to update the MVS. Next, the noise level is re-estimated, and from the re-estimated noise we again estimate a 2nd averaged DON and similarly obtain the 2nd estimated DON2. We use the 2nd estimated DON2 to estimate the weight for the noise subtraction process. Because the noise is estimated from the estimated DONs, which are fitted to the true DON, it is possible to estimate noise amplitudes more accurately, with lower speech distortion, and to suppress musical noise in the enhanced speech.

II. PROPOSED NOISE ESTIMATION METHOD

We assume that speech and noise are uncorrelated. Let $y(n) = s(n) + d(n)$, where $y(n)$ is the observed noisy speech signal, $s(n)$ is the clean speech signal and $d(n)$ is the additive noise. We further assume that signal and noise are statistically independent. Under these assumptions, we can write the powers as $P_y = P_s + P_d$.

A. Estimation of the minima value sequence (MVS)

Sections of consecutive samples are used as a single frame $l$ (320 samples). Consecutive frames are spaced $l'$ (100 samples) apart, giving an overlap of $(320 - 100)/320 = 68.75\%$ between them. The short-term representation of the signal $y(n)$ is obtained by windowing (Hamming window) and analyzed using an $N = 512$ point discrete Fourier transform (DFT) at a sampling frequency of 16 kHz. Initially, the noise spectrum is estimated from the valleys of the amplitude spectrum; we assume that the peaks correspond to voiced parts and the valleys to noise-only parts. The algorithm for noise estimation is as follows:

1. Compute the RMS value $Y_{rms}$ of the amplitude spectrum $Y(k)$. Detect the minima values of $Y(k)$, $Y_{min}(k_{min}) \leftarrow \min(Y(k))$, when the following condition is satisfied: $Y(k) < Y(k-1)$ and $Y(k) < Y(k+1)$ and $Y(k) < Y_{rms}$. Here $k_{min}$ denotes the frequency bin indices of the minima values.
2. Interpolate between adjoining minima positions ($k_{min} \leftarrow k$) to obtain the minimum value sequence (MVS) $Y_{min}(k)$.
3. Smooth the sequence by taking a partial average to obtain the smoothed minimum value sequence (SMVS). This process continuously updates the noise estimate in every analysis frame.

A noise estimate obtained from the SMVS alone suffers from overestimation and underestimation of the SNR. To achieve good tracking capability with controlled overestimation, the proposed noise estimation algorithm adopts the concept of DON. The block diagram of the noise estimation process is given in Figure 1.

Figure 1. Block diagram of the 2nd estimated DON1, Z1m.

B. Estimation of the Degree of Noise (DON)

In a single-channel method, we only know the power of the observed signal. Therefore, direct estimation of the degree of noise ($P_d / P_{obs}$) is not possible. Instead, a frame-wise DON is estimated from the estimated noise of the observed signal of each frame $m$. For optimal estimation of the DON, we carried out our experiment on 20 vowel phonemes of 3 male and 3 female speakers taken from the TIMIT database. First, white noise at various SNRs is added to these voiced vowel phonemes. Then, for each SNR, the noisy phonemes are processed frame-wise and the DON is estimated in each frame for each phoneme individually. For every phoneme, the DON is averaged within the processing frames for the corresponding input SNR. Each of these 1st averaged DONs of frame $m$ for the corresponding input SNR is expressed as $\bar{Z}_{1m}$. This $\bar{Z}_{1m}$ is aligned with the true DON ($Z_{tr}$) using the least-squares (LS) method, which gives the 1st estimated degree of noise (DON1) $Z_{1m}$ of that frame. The true DON ($Z_{tr}$) is given by

$$Z_{tr} = \frac{P_d}{P_s + P_d} = \frac{1}{1 + 10^{\mathrm{dB}/10}} \tag{1}$$

where dB denotes the input SNR in decibels. The 1st averaged estimated DON $\bar{Z}_{1m}$ is

$$\bar{Z}_{1m} = \frac{1}{M} \sum_{m=1}^{M} \frac{P_\eta(m)}{P_{obs}(m)} \tag{2}$$

where $M$ is the number of noise-added frames, and $P_\eta(m)$ and $P_{obs}(m)$ are the powers of the noise and observed signals, respectively. Note that we consider only voiced phonemes in our experiment, so the averaged DON value should be limited to the voiced portion of a speech sentence. In practice, however, the unvoiced portion is contaminated with a higher degree of noise. Hence the estimated noise is higher for an unvoiced frame than for a voiced frame. Consequently, a higher DON value is obtained from an unvoiced frame than from a voiced frame, which is logically consistent. The degree of noise estimated from a previously prepared function using the least-squares method is given by [7]

$$Z_{1m} = a \times \bar{Z}_{1m} + b \tag{3}$$

where $Z_{1m}$ is the 1st estimated DON1 of frame $m$. The error between the true and the estimated values can be minimized by tuning $a$ and $b$. In the experiment, 20 phoneme sounds of 3 male and 3 female speakers degraded by white noise at different SNRs (-10, -5, 0, ..., 30 dB) are considered. The value of $Z_{1m}$ is applied to update the MVS. Next, the noise level is re-estimated with the help of $Z_{1m}$. Finally, from the re-estimated noise, we again estimate the 2nd averaged DON ($\bar{Z}_{2m}$) and similarly the 2nd estimated DON2 ($Z_{2m}$), which is used to estimate the noise weight for nonlinear weighted noise subtraction.

We conduct an experiment on the noisy speech (white noise) utterance /water/ of a female speaker at various input SNRs and obtain the 1st estimated DON1, $Z_{1m}$, and the 2nd estimated DON2, $Z_{2m}$. Figure 2 illustrates the frame-wise true degree of noise and the estimated degree of noise obtained in every analysis frame for different input SNRs. By adopting smoothing in the MVS, the overestimation problem is minimized and the effect of musical noise is reduced. Smoothing is performed to reduce the high-frequency fluctuations. Since most of the energy of a speech signal is concentrated in low frequencies, smoothing reduces the high-frequency components and gives an increased signal-to-noise ratio. Fig. 3 shows that the true and the estimated degree of noise are almost equal at all SNRs.

Figure 2a. True vs 1st avg. DON (T) and True vs 1st estimated DON1 (B).
Figure 2b. True vs 2nd avg. DON (T) and True vs 2nd estimated DON2 (B).

Figure 3. Frame-wise graphical representations of the true DON (solid with point), the 1st estimated DON1 (dotted line with circle) and the 2nd estimated DON2 (solid line with double linewidth) for -5 dB (left) and 5 dB (right) SNR noisy speech.

C. Estimation of Noise Spectrum

The noise spectrum is estimated from the SMVS and the 1st estimated DON according to

$$D_m(k) = Y_{min}(k) + Z_{1m} \times Y_{rms} \tag{4}$$

We then make some updates to $D_m(k)$: the updated spectrum is again smoothed by a three-point moving average, and lastly the main maxima of the spectrum are identified and suppressed [7].

III. NONLINEAR WEIGHTED NOISE SUBTRACTION (NWNS)

Noise reduction based on the traditional spectral subtraction (SS) requires an available estimate of the embedded noise; here, in the time domain, we call it Noise Subtraction (NS). It is observed that, in NS, degradation occurs because of overestimation of noise within the unvoiced region of noisy speech at higher input SNR (>10 dB). By inspection, we see that the unvoiced region has flat spectrum characteristics and exhibits low SNR, which gives a larger degree of noise value and so increases the noise level. Therefore, the extracted noise in the unvoiced region is high and degrades the speech. From Figure 4, it is seen that the unvoiced frame of higher SNR (>10 dB) input noisy speech has a flat spectrum and low SNR, which gives a larger DON2 ($Z_{2m}$) value and so increases the weighting factor. More noise is therefore subtracted at every unvoiced frame than at every voiced frame, for example with 25 dB SNR input speech. Consequently, speech distortion occurs. For that reason, we introduce a nonlinear weighting factor to control the overestimation and minimize the effect of residual noise. The NWNS is given by:

$$\hat{s}_1(n) = y(n) - \alpha \times Z_{tr} \times d_{ss}(n) \tag{5}$$

where $\alpha = 0.3019 + 6.4021 \times Z_{2m} - 14.109 \times Z_{2m}^2 + 9.8273 \times Z_{2m}^3$ is the nonlinear weighting factor. It is observed from Eq. (5) that it needs the input SNR. The input SNR can be estimated from the variances as

$$\mathrm{SNR}_{input} = 10 \log_{10}\left(\frac{\sigma_s^2}{\sigma_\eta^2}\right) \tag{6}$$

where $\sigma_s^2$ and $\sigma_\eta^2$ are the variances of speech and noise, respectively. We assume that, due to the independence of noise and speech, the variance of the noisy speech is equal to the sum of the speech variance and the noise variance. Adopting the nonlinear weight in NS gives good noise reduction, and informal listening tests indicate good performance with less musical noise.

Figure 4. Spectra of voiced and unvoiced frames degraded by white noise at 5 dB SNR are shown in (a) and (b), at 10 dB SNR in (c) and (d), at 20 dB SNR in (e) and (f), and at 30 dB SNR in (g) and (h), respectively.

A. Derivation of nonlinear weight

It is observed that subtraction-type algorithms produce musical noise that cannot be avoided. Since algorithms with fixed subtraction parameters are unable to adapt well to varying noise levels and characteristics, it becomes imperative to estimate a suitable factor to update the noise level. Hence we derive a nonlinear weighting factor $\alpha$ for this purpose. First, simulation is performed over 7 male and 7 female speakers with different sentences at different SNR levels, randomly selected from the TIMIT database, for different values of $\alpha$, and the output SNR is recorded. Table 1 shows the simulated performance of the algorithm on a given noisy sentence of a female speaker for different values of $\alpha$.

TABLE 1: The output SNR for a noisy speech of a female speaker for different values of α over a wide range of input SNR (-10 dB to 30 dB). The speech is degraded by white noise.

TABLE 2: The average weight of α for 7 male and 7 female utterances corresponding to a wide range of input SNR (-10 dB to 30 dB).

Let the set of data points be $(x_i, y_i)$, $i = 1, 2, \ldots, 9$, and let the curve given by $Y = f(x)$ be fitted to this data. At $x = x_i$, the experimental value of the ordinate is $y_i$ and the corresponding value on the fitting curve is $f(x_i)$. If $e_i$ is the error of approximation at $x = x_i$, then $e_i = y_i - f(x_i)$, and the sum of the squares of the errors is given by

$$SE = \sum_{i=1}^{9} e_i^2$$

We consider $\alpha$ to be a polynomial of degree 3. Let the 3rd degree polynomial

$$\alpha = f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 \tag{7}$$

be fitted to the data points $(x_i, y_i)$, $i = 1, 2, \ldots, 9$, where $x$ represents the values of DON2. The sum of the squared errors over the points $x = x_i$ is given by

$$SE \equiv \sum_{i=1}^{9} \left[ y_i - \left( a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3 \right) \right]^2 \tag{8}$$

For SE to be minimum, we have

$$\frac{\partial (SE)}{\partial a_0} = -2 \sum_{i=1}^{9} \left[ y_i - \left( a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3 \right) \right] = 0 \tag{9}$$

$$\frac{\partial (SE)}{\partial a_1} = -2 \sum_{i=1}^{9} \left[ y_i - \left( a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3 \right) \right] x_i = 0 \tag{10}$$

$$\frac{\partial (SE)}{\partial a_2} = -2 \sum_{i=1}^{9} \left[ y_i - \left( a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3 \right) \right] x_i^2 = 0 \tag{11}$$

$$\frac{\partial (SE)}{\partial a_3} = -2 \sum_{i=1}^{9} \left[ y_i - \left( a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3 \right) \right] x_i^3 = 0 \tag{12}$$

From Eqs. (9), (10), (11) and (12) we have

$$\sum_{i=1}^{9} y_i = 9 a_0 + a_1 \sum_{i=1}^{9} x_i + a_2 \sum_{i=1}^{9} x_i^2 + a_3 \sum_{i=1}^{9} x_i^3 \tag{13}$$

$$\sum_{i=1}^{9} x_i y_i = a_0 \sum_{i=1}^{9} x_i + a_1 \sum_{i=1}^{9} x_i^2 + a_2 \sum_{i=1}^{9} x_i^3 + a_3 \sum_{i=1}^{9} x_i^4 \tag{14}$$

$$\sum_{i=1}^{9} x_i^2 y_i = a_0 \sum_{i=1}^{9} x_i^2 + a_1 \sum_{i=1}^{9} x_i^3 + a_2 \sum_{i=1}^{9} x_i^4 + a_3 \sum_{i=1}^{9} x_i^5 \tag{15}$$

$$\sum_{i=1}^{9} x_i^3 y_i = a_0 \sum_{i=1}^{9} x_i^3 + a_1 \sum_{i=1}^{9} x_i^4 + a_2 \sum_{i=1}^{9} x_i^5 + a_3 \sum_{i=1}^{9} x_i^6 \tag{16}$$

We write these equations in matrix form as:

$$
\begin{bmatrix}
\sum_{i=1}^{9} y_i \\
\sum_{i=1}^{9} x_i y_i \\
\sum_{i=1}^{9} x_i^2 y_i \\
\sum_{i=1}^{9} x_i^3 y_i
\end{bmatrix}
=
\begin{bmatrix}
9 & \sum_{i=1}^{9} x_i & \sum_{i=1}^{9} x_i^2 & \sum_{i=1}^{9} x_i^3 \\
\sum_{i=1}^{9} x_i & \sum_{i=1}^{9} x_i^2 & \sum_{i=1}^{9} x_i^3 & \sum_{i=1}^{9} x_i^4 \\
\sum_{i=1}^{9} x_i^2 & \sum_{i=1}^{9} x_i^3 & \sum_{i=1}^{9} x_i^4 & \sum_{i=1}^{9} x_i^5 \\
\sum_{i=1}^{9} x_i^3 & \sum_{i=1}^{9} x_i^4 & \sum_{i=1}^{9} x_i^5 & \sum_{i=1}^{9} x_i^6
\end{bmatrix}
\begin{bmatrix}
a_0 \\ a_1 \\ a_2 \\ a_3
\end{bmatrix}
\tag{17}
$$

The matrix in Eq. (17) is the product $X^T X$ of a Vandermonde matrix $X$ with its transpose. We can also obtain the matrix for a least-squares fit by writing:

$$
\begin{bmatrix}
y_1 \\ y_2 \\ \vdots \\ y_9
\end{bmatrix}
=
\begin{bmatrix}
1 & x_1 & x_1^2 & x_1^3 \\
1 & x_2 & x_2^2 & x_2^3 \\
\vdots & \vdots & \vdots & \vdots \\
1 & x_9 & x_9^2 & x_9^3
\end{bmatrix}
\begin{bmatrix}
a_0 \\ a_1 \\ a_2 \\ a_3
\end{bmatrix}
\tag{18}
$$

In matrix notation, Eq. (18) can be written as:

$$Y = XA \tag{19}$$

where $Y = [y_1, y_2, \ldots, y_9]^T$, $X$ is the $9 \times 4$ matrix whose $i$-th row is $[1, x_i, x_i^2, x_i^3]$, and $A = [a_0, a_1, a_2, a_3]^T$.

Multiply both sides of Eq. (19) by $X^T$ (the transpose of $X$):

$$X^T Y = X^T X A \tag{20}$$

This matrix equation can be solved numerically, or can be inverted directly if it is well formed, to yield the solution vector

$$A = \left( X^T X \right)^{-1} X^T Y \tag{21}$$

In our experiment,

$x_i = \mathrm{DON2}_i$ = [0.80707, 0.75235, 0.59967, 0.32374, 0.095033, 0.01379, -0.009902, -0.01741, -0.019616]
$y_i = \alpha_i$ = [1.4, 1.35, 1.25, 1.1071, 0.86429, 0.56429, 0.26429, 0.11571, 0.035]

So we have $X$ as the $9 \times 4$ matrix whose $i$-th row is $[1, \mathrm{DON2}_i, \mathrm{DON2}_i^2, \mathrm{DON2}_i^3]$ and $Y = [\alpha_1, \alpha_2, \ldots, \alpha_9]^T$.

Finally, we put the values of $\mathrm{DON2}_1, \ldots, \mathrm{DON2}_9$ to get $X$ and the values of $\alpha_1, \ldots, \alpha_9$ to get $Y$. Therefore, from Eq. (21), we have

$$
A =
\begin{bmatrix}
a_0 \\ a_1 \\ a_2 \\ a_3
\end{bmatrix}
=
\begin{bmatrix}
0.3019 \\ 6.4021 \\ -14.109 \\ 9.8273
\end{bmatrix}
$$

Now substitute the values of $a_0$, $a_1$, $a_2$ and $a_3$ in Eq. (7):

$$\alpha = 0.3019 + 6.4021 \times \mathrm{DON2} - 14.109 \times \mathrm{DON2}^2 + 9.8273 \times \mathrm{DON2}^3 \tag{22}$$

Equation (22) gives the nonlinear weighting factor $\alpha$ that is used in Eq. (5).

IV. EXPERIMENTAL RESULTS AND DISCUSSION

The proposed noise estimation method is compared with the conventional noise estimation algorithm using MVS in terms of noise estimation accuracy and quality. Figure 5 illustrates the results of noise estimation in the frequency domain (FD). In the experiment, we consider the vowel phoneme sound /oy/, degraded by white noise at 0 dB SNR. It shows that, by adopting the proposed DON1 ($Z_{1m}$), it is possible to estimate the state of the added noise more precisely. We achieve sufficient improvements in noise amplitudes using the MVS+DON1 estimator. An objective measure is also used to verify the quality of the estimated noise; for that we use the PESQ MOS measure. Figure 6 shows the PESQ MOS value between the added and the estimated noise at different noise levels. It shows that the PESQ MOS value gradually decreases at higher SNR.

[Figure 5: two panels of amplitude (dB) versus frequency (kHz), 0 to 8 kHz. Top: true noise and noise estimated by MVS. Bottom: true noise and noise estimated by MVS+DON1.]
Figure 5. Noise spectrums (original and estimated).

[Figure 6: values plotted against input SNR (dB), from -10 to 30 dB.]
Figure 6. Estimated noise quality based on PESQ MOS.

To study the speech enhancement performance, an experiment is carried out by taking 56320 samples of the clean speech /she had your dark suit in greasy wash water all year/ from the TIMIT database. The speech signal is corrupted by white, pink and HF channel noises, taken from the NOISEX database, at various SNR levels. The average output SNRs obtained for white noise, pink noise and HF channel noise at various SNR levels are given in Table 3 for NS and NWNS, respectively.

We observe from Table 3 that the overall output SNR by NS is improved up to 10 dB input SNR and degraded from 15 dB and higher. Degradation occurs because of overestimation of noise within the unvoiced region of noisy speech at higher input SNR (>10 dB). Since the unvoiced region has flat spectrum characteristics and exhibits low SNR, it gives a larger DON2 value, which increases the noise level. Consequently, the extracted noise in the unvoiced region is high, which degrades the speech. Hence it is essential to add a weighting factor to control the overestimation, and we have a better performance by NWNS throughout the SNR. It is observed that the enhanced speech is distorted in low voiced parts by noise removal in the NS method, whereas in NWNS it is not. However, only a small amount of noise is removed from the corrupted speech by the NWNS method. So in the NS method there is a loss of speech intelligibility, while NWNS maintains it. We have found better results compared to our previous study [7] for a wide range of SNRs.

TABLE 3: The results of average output SNR (dB) for various types of noise at different input SNR by the NS and NWNS methods.

| Input SNR | White noise, NS | White noise, NWNS | HF channel noise, NS | HF channel noise, NWNS | Pink noise, NS | Pink noise, NWNS |
|---|---|---|---|---|---|---|
| -10dB | -2.8 | -1.57 | -7.4 | -7.5 | -7.1 | -7.1 |
| -5dB | 2.0 | 2.4 | -2.3 | -2.7 | -2.2 | -2.3 |
| 0dB | 6.5 | 5.3 | 2.6 | 1.9 | 2.6 | 2.2 |
| 5dB | 10.3 | 8.7 | 7.3 | 6.4 | 7.3 | 6.4 |
| 10dB | 13.3 | 11.7 | 11.5 | 10.8 | 11.3 | 10.8 |
| 15dB | 15.4 | 15.8 | 14.5 | 15.4 | 14.4 | 15.4 |
| 20dB | 16.7 | 20.4 | 16.4 | 20.2 | 16.3 | 20.3 |
| 25dB | 17.5 | 25.2 | 17.3 | 25.1 | 17.3 | 25.2 |
| 30dB | 17.7 | 30.1 | 17.7 | 30.1 | 17.6 | 30.1 |

CONCLUSIONS

In this paper, an improved noise estimation technique is discussed. Initially, noise is estimated from the valleys of the amplitude spectrum. Then we adjust the estimated noise amplitudes by the estimated DON1. This eliminates the need for a VAD by exploiting the short-time characteristics of speech signals. The results show that the state of the added noise is estimated more accurately with MVS+DON1. The enhanced speech using time-domain nonlinear weighted noise subtraction results in sufficient noise reduction. The main advantage of the algorithm is the effective removal of the noise components for a wide range of SNRs. We not only have better SNR but also better speech quality with significantly reduced residual noise. However, a little noisy effect still remains. This issue will be addressed in our future study.

REFERENCES

[1] Benesty, J., Makino, S., and Chen, J., Speech Enhancement, Springer-Verlag Berlin Heidelberg, 2005.
[2] Martin, R., and Lotter, T., "Optimal Recursive Smoothing of Non-Stationary Periodograms", Proc. IWAENC, pp. 167-170, Sept. 2001.
[3] Cohen, I., and Berdugo, B., "Noise Estimation by Minima Controlled Recursive Averaging for Robust Speech Enhancement", IEEE Signal Processing Letters, vol. 9, no. 1, pp. 12-15, Jan. 2002.
[4] Cohen, I., "Noise Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging", IEEE Trans. on Speech and Audio Process., vol. 11, pp. 466-475, Sept. 2003.
[5] Doblinger, G., "Computationally Efficient Speech Enhancement by Spectral Minima Tracking in Subbands", Proc. EUROSPEECH, pp. 1513-1516, 1995.
[6] Hirsch, H. G., and Ehrlicher, C., "Noise Estimation Methods for Robust Speech Recognition", Proc. ICASSP, pp. 153-156, 1995.
[7] Hamid, M. E., Ogawa, K., and Fukabayashi, T., "Noise estimation for Speech Enhancement by the Estimated Degree of Noise without Voice Activity Detection", Proc. SIP 2006, pp. 420-424, Hawaii, August 2006.
[8] Martin, R., "Speech enhancement using MMSE short time spectral estimation with Gamma distributed speech priors", in Proc. Int. Conf. Speech, Acoustics, Signal Processing, vol. I, pp. 253-256, 2002.
[9] Martin, R., "Spectral Subtraction Based on Minimum Statistics", Proc. EUSIPCO, pp. 1182-1185, 1994.
[10] Martin, R., "Statistical Methods for the Enhancement of Noisy Speech", Proc. IWAENC2003, pp. 1-6, 2003.

AUTHORS PROFILE

Md. Ekramul Hamid received his B.Sc and M.Sc degrees from the Department of Applied Physics and Electronics, Rajshahi University, Bangladesh. After that he obtained the Masters of Computer Science degree from Pune University, India. He received his PhD degree from Shizuoka University, Japan. During 1997-2000, he was a lecturer in the Department of Computer Science and Technology, Rajshahi University. Since 2007, he has been serving as an associate professor in the same department. He is currently working as an assistant professor in the College of Computer Science at King Khalid University, Abha, KSA. His research interests include speech enhancement and speech signal processing.

Md. Zasim Uddin received his B.Sc and M.Sc in Computer Science & Engineering from Rajshahi University, Rajshahi, Bangladesh. He was awarded the National Science and Information & Communication Technology Fellowship (Government of the People's Republic of Bangladesh) in 2009. Currently he is a lecturer in the Computer Science & Engineering department, Dhaka International University, Dhaka, Bangladesh. His research interests include medical image and signal processing. He is a member of the Bangladesh Computer Society.

Md. Humayun Kabir Biswas is working as an international lecturer in the Department of Computer Science at King Khalid University, Kingdom of Saudi Arabia. Before joining KKU, he worked as a lecturer in the Department of Computer Science and Engineering at IUBAT (International University of Business Agriculture and Technology), Bangladesh. He completed his Master of Science in Information Technology degree at Shinawatra University, Bangkok, Thailand. He is keen to do research on the semantic web, intelligent information retrieval techniques and machine learning techniques. His current research interest is audio and image signal processing.

Somlal Das received B.Sc (Hons) and M.Sc. degrees from the Department of Applied Physics and Electronics in the University of Rajshahi, Bangladesh. He joined the Department of Computer Science and Engineering in the University of Rajshahi, Bangladesh, as a lecturer in 1998. He is currently serving as an Assistant Professor and working as a Ph.D. student in the same Department. His research interests are in speech signal processing, speech enhancement, speech analysis, and digital signal processing.
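The three MVS steps of Section II-A and the noise spectrum of Eq. (4) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the toy 9-bin spectrum, the constant extension outside the first and last minimum, the fallback when no valley qualifies, and the choice of a three-point window for the "partial average" are all assumptions made here.

```python
import math


def minima_value_sequence(Y):
    """Return (mvs, smvs, y_rms) for one frame's amplitude spectrum Y."""
    n = len(Y)
    y_rms = math.sqrt(sum(v * v for v in Y) / n)

    # Step 1: bins that are local minima and lie below the RMS level.
    k_min = [k for k in range(1, n - 1)
             if Y[k] < Y[k - 1] and Y[k] < Y[k + 1] and Y[k] < y_rms]
    if not k_min:
        # Assumption: with no qualifying valley, use the global minimum everywhere.
        flat = [min(Y)] * n
        return flat, flat[:], y_rms

    # Step 2: linear interpolation between adjoining minima.
    # Assumption: bins outside the first/last minimum hold the nearest minimum value.
    mvs = [Y[k_min[0]] if k <= k_min[0] else Y[k_min[-1]] for k in range(n)]
    for left, right in zip(k_min, k_min[1:]):
        for k in range(left, right + 1):
            t = (k - left) / (right - left)
            mvs[k] = (1.0 - t) * Y[left] + t * Y[right]

    # Step 3: three-point moving average (window shortened at the edges).
    smvs = []
    for k in range(n):
        window = mvs[max(0, k - 1):min(n, k + 2)]
        smvs.append(sum(window) / len(window))
    return mvs, smvs, y_rms


def noise_spectrum(smvs, z1m, y_rms):
    """Eq. (4): D_m(k) = Y_min(k) + Z_1m * Y_rms, applied to the smoothed MVS."""
    return [v + z1m * y_rms for v in smvs]


# Toy 9-bin spectrum with valleys at bins 1, 4 and 7 (illustrative values only).
Y = [5.0, 1.0, 6.0, 7.0, 2.0, 8.0, 9.0, 3.0, 9.0]
mvs, smvs, y_rms = minima_value_sequence(Y)
D = noise_spectrum(smvs, z1m=0.5, y_rms=y_rms)
```

In the paper's setting, `Y` would be the magnitude of a 512-point DFT of one Hamming-windowed 320-sample frame, and `z1m` would come from Eq. (3).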
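Equations (1), (5), (6) and (22) are simple enough to check numerically. The sketch below is one possible reading of them, under stated assumptions: `d_ss` is taken to be a time-domain estimate of the noise (the paper does not define it further), the speech variance in Eq. (6) is obtained as the noisy-speech variance minus the noise variance following the independence assumption, and all function names are hypothetical.

```python
import math


def true_don(snr_db):
    """Eq. (1): Z_tr = P_d / (P_s + P_d) = 1 / (1 + 10^(SNR_dB / 10))."""
    return 1.0 / (1.0 + 10.0 ** (snr_db / 10.0))


def input_snr_db(var_noisy, var_noise):
    """Eq. (6), with sigma_s^2 = var_noisy - var_noise (independence assumption)."""
    return 10.0 * math.log10((var_noisy - var_noise) / var_noise)


def nonlinear_weight(z2m):
    """Eq. (22): cubic polynomial in the 2nd estimated DON2."""
    return 0.3019 + 6.4021 * z2m - 14.109 * z2m ** 2 + 9.8273 * z2m ** 3


def nwns(y, d_ss, alpha, z_tr):
    """Eq. (5): s1(n) = y(n) - alpha * Z_tr * d_ss(n), sample by sample."""
    return [yn - alpha * z_tr * dn for yn, dn in zip(y, d_ss)]


# Illustrative values only: a frame whose noisy variance is twice the noise variance.
snr = input_snr_db(var_noisy=2.0, var_noise=1.0)   # equal speech and noise power
z_tr = true_don(snr)
alpha = nonlinear_weight(0.32374)                  # one DON2 value from Section III-A
enhanced = nwns([1.0, 2.0], [0.5, 0.5], alpha, z_tr)
```

At equal speech and noise power the true DON is one half, and it falls towards zero as the input SNR rises, which is why the subtraction term in Eq. (5) shrinks at high SNR.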
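The least-squares derivation of Section III-A, Eqs. (13) to (21), amounts to building the normal equations from power sums and solving a 4 x 4 linear system. The following sketch does this with plain Gaussian elimination instead of an explicit matrix inverse; the function names are hypothetical and the solver choice is an assumption. The data lists are the DON2 and α values quoted in the paper, but whether this routine reproduces the coefficients reported in Eq. (22) depends on details the paper does not state, so no expected output is given for them.

```python
def fit_cubic(xs, ys):
    """Solve (X^T X) A = X^T Y for A = [a0, a1, a2, a3], as in Eqs. (13)-(16)."""
    n = 4
    # Power sums: S[p] = sum(x^p) for p = 0..6, T[p] = sum(x^p * y) for p = 0..3.
    S = [sum(x ** p for x in xs) for p in range(2 * n - 1)]
    T = [sum((x ** p) * y for x, y in zip(xs, ys)) for p in range(n)]
    # Augmented normal-equation matrix; row r is Eq. (13 + r).
    M = [[S[r + c] for c in range(n)] + [T[r]] for r in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]

    # Back substitution.
    A = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * A[c] for c in range(r + 1, n))
        A[r] = (M[r][n] - tail) / M[r][r]
    return A


def eval_cubic(A, x):
    """Eq. (7): f(x) = a0 + a1 x + a2 x^2 + a3 x^3."""
    return A[0] + A[1] * x + A[2] * x ** 2 + A[3] * x ** 3


# Data points quoted in Section III-A.
DON2 = [0.80707, 0.75235, 0.59967, 0.32374, 0.095033,
        0.01379, -0.009902, -0.01741, -0.019616]
ALPHA = [1.4, 1.35, 1.25, 1.1071, 0.86429, 0.56429, 0.26429, 0.11571, 0.035]
coeffs = fit_cubic(DON2, ALPHA)
print(coeffs)
```

Solving the system by elimination avoids forming $(X^T X)^{-1}$ explicitly, which is usually the more stable choice when the $x_i$ are clustered, as several of the DON2 values near zero are here.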
