BLIND SOURCE SEPARATION OF FIR CONVOLUTIVE MIXTURES: APPLICATION TO SPEECH SIGNALS

Cédric Févotte, Alexandra Debiolles, Christian Doncarli

Institut de Recherche en Communications et Cybernétique de Nantes (IRCCyN)
UMR CNRS 6597, 1 rue de la Noë, BP 92101, 44321 Nantes Cedex 03, France
fevotte@irccyn.ec-nantes.fr

ABSTRACT

In this paper we present a simple method to deal with Blind Source Separation (BSS) of Finite Impulse Response (FIR) convolutive mixtures. The global method proceeds in two steps. The first step consists in separating each source contribution in the mixture. This step provides several filtered versions of each source. The second step consists in retrieving the original sources from the set of filtered versions of each source using a blind system identification method. We present some results on a mixture of speech and music.

1. INTRODUCTION

Blind Source Separation has many applications in Audio Signal Processing (see [1] for an overview of Audio Source Separation applications). Usually we have to deal with convolutive mixtures, for example to take into account the reverberation in a room.

In this paper we aim at presenting how the separation of a Finite Impulse Response (FIR) convolutive mixture can be tackled with the use of a (joint) block-diagonalization procedure. The method proceeds in two steps.

The first step consists in separating each source contribution to the mixture. By introducing proper variables, the convolutive mixing is rearranged into an instantaneous (multiplicative) mixing of new sources. Some of these new sources are dependent, but the contributions of the different sources can still be separated by using the standard BSS algorithm SOBI, extended to the particular case where some of the sources are dependent. However, the algorithm does not provide the original sources but several filtered versions of each source.

The second step then consists in recovering the original sources from the set of filtered versions of each source obtained from the first step. In the literature, this problem is usually named blind system identification or blind deconvolution.

Each step of the global BSS method was published separately by several authors. The first step (extension of SOBI to convolutive mixtures) is presented in [2]. The algorithm is based on the joint block-diagonalization of several whitened covariance matrices. The method assumes that the sources are stationary but still performs well on audio signals as long as they are uncorrelated. Some time-frequency approaches were proposed by several authors to deal with non-stationary signals but we will not address such a level of generality here (see [3] for an overview of blind separation methods for convolutive mixtures using block-diagonalization).

The second step has been widely investigated in the literature. An overview of blind system identification (BSI) methods is available in [4]. We will briefly describe the method based on subspace decomposition presented in [5].

The only novelty in this paper is the combination of the first step (SOBI convolutive) and the second step (BSI) to perform complete separation instead of partial separation as in [2].

In Section 2 we introduce assumptions and notations, and we show how the convolutive mixing can be turned into an instantaneous mixing. In Sections 3 and 4 we briefly present the source contributions separation step and the blind identification step. An illustration of the performance of the method on a mixture of speech and guitar is presented in Section 5.

2. BACKGROUND

2.1. Aim and assumptions

We consider the following discrete-time noiseless FIR MIMO model:

$$\mathbf{x}[t] = \mathbf{H}[0]\,\mathbf{s}[t] + \mathbf{H}[1]\,\mathbf{s}[t-1] + \dots + \mathbf{H}[L]\,\mathbf{s}[t-L] \tag{1}$$

where $\mathbf{x}[t] = [x_1[t], \dots, x_m[t]]^T$ is the vector of size $m$ containing the observations, $\mathbf{s}[t] = [s_1[t], \dots, s_n[t]]^T$ is the vector of size $n$ containing the sources (assumed zero-mean and mutually uncorrelated at every time instant), and $\mathbf{H}[k] = \{h_{ij}[k]\}$, $k = 0, \dots, L$, are $m \times n$ matrices with $m > n$.

The overall objective of BSS is to obtain estimates of the mixing filters and/or estimates of the sources up to standard BSS indeterminacies on ordering, scale and phase.

2.2. Back to instantaneous mixing

We recall from [2] how the convolutive mixing (1) can be rearranged into an instantaneous mixing.

2.3. Notations

Let $L'$ be an integer such that $mL' \geq n(L + L')$ ($L'$ exists when $m > n$). We note, for $i = 1, \dots, n$:

$$\mathbf{S}_i[t] = [s_i[t], \dots, s_i[t - (L + L') + 1]]^T$$

and for $j = 1, \dots, m$:

$$\mathbf{X}_j[t] = [x_j[t], \dots, x_j[t - L' + 1]]^T$$

where $\cdot^T$ denotes "transpose". Then we introduce:

$$\mathbf{S}[t] = \left[\mathbf{S}_1[t]^T, \dots, \mathbf{S}_n[t]^T\right]^T$$

$$\mathbf{X}[t] = \left[\mathbf{X}_1[t]^T, \dots, \mathbf{X}_m[t]^T\right]^T$$

For all $t$, $\mathbf{S}[t]$ is a column vector of size $n(L + L')$ and $\mathbf{X}[t]$ is a column vector of size $mL'$. For simplicity we note $N = n(L + L')$ and $M = mL'$.

For $i = 1, \dots, m$ and $j = 1, \dots, n$ we note $\mathbf{A}_{ij}$ the following $L' \times (L + L')$ Sylvester matrix:

$$\mathbf{A}_{ij} = \begin{bmatrix} h_{ij}[0] & \cdots & h_{ij}[L] & 0 & \cdots & 0 \\ 0 & \ddots & & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & & \ddots & 0 \\ 0 & \cdots & 0 & h_{ij}[0] & \cdots & h_{ij}[L] \end{bmatrix}$$

Finally, we note:

$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \cdots & \mathbf{A}_{1n} \\ \vdots & & \vdots \\ \mathbf{A}_{m1} & \cdots & \mathbf{A}_{mn} \end{bmatrix}$$

$\mathbf{A}$ is an $M \times N$ matrix which satisfies:

$$\mathbf{X}[t] = \mathbf{A}\,\mathbf{S}[t] \tag{2}$$

In the following we assume that $\mathbf{A}$ is full rank.

Eq. (2) shows that the convolutive mixing (1) can be written as an instantaneous mixture. Such mixtures have been widely studied in the BSS/ICA literature. However, the big difference here is that the components of $\mathbf{S}[t]$ are not all mutually independent: when the sources are not white, for $i = 1, \dots, n$, the components of $\mathbf{S}_i[t]$ are dependent.

3. STEP 1: SEPARATION OF THE CONTRIBUTIONS OF EACH SOURCE

In this section we briefly describe the source contributions separation step. We will assume that the sources are stationary. For a more general study, in particular in a noisy environment, the reader should refer to [3] and references therein concerning the work of A. Belouchrani and K. Abed-Meraim on the topic. The key of the method is to formulate the overall problem described by Eq. (2) in the time-lag plane.

For $(t, \tau) \in \mathbb{Z}^2$ we note $\mathbf{R}_{SS}[t, \tau]$ the covariance matrix of $\mathbf{S}[t]$ defined by:

$$\mathbf{R}_{SS}[t, \tau] \stackrel{\text{def}}{=} E\{\mathbf{S}[t]\,\mathbf{S}[t + \tau]^H\}$$

Since the sources are assumed stationary we have:

$$\mathbf{R}_{SS}[t, \tau] = \mathbf{R}_{SS}[\tau] \tag{3}$$

The vector signals $\mathbf{S}_1[t], \dots, \mathbf{S}_n[t]$ being mutually uncorrelated, the $N \times N$ covariance matrix $\mathbf{R}_{SS}[\tau]$ is block-diagonal with $n$ blocks of dimension $(L + L') \times (L + L')$, such that:

$$\mathbf{R}_{SS}[\tau] = \begin{bmatrix} \mathbf{R}_{S_1 S_1}[\tau] & & \\ & \ddots & \\ & & \mathbf{R}_{S_n S_n}[\tau] \end{bmatrix}$$

With Eq. (2) we have:

$$\mathbf{R}_{XX}[\tau] = \mathbf{A}\,\mathbf{R}_{SS}[\tau]\,\mathbf{A}^H \tag{4}$$

3.1. Generalization of SOBI

A two-step separation method (whitening and rotation) can be devised from (4) [2, 3].

3.1.1. Whitening

In the FIR convolutive case, whitening consists in finding a matrix $\mathbf{W}$ of dimensions $N \times M$ such that:

$$\mathbf{W}\,\mathbf{A}\,\mathbf{B}\,\mathbf{A}^H\,\mathbf{W}^H = \mathbf{I}_N \tag{5}$$

where $\mathbf{B}$ is an $N \times N$ positive definite block-diagonal matrix with $n$ blocks of dimension $(L + L') \times (L + L')$. In practice, using $\mathbf{B} = \mathbf{R}_{SS}[0]$, $\mathbf{W}$ can be computed from the eigenelements of the following estimate of $\mathbf{R}_{XX}[0]$ [3]:

$$\hat{\mathbf{R}}_{XX}[0] \stackrel{\text{def}}{=} \frac{1}{T} \sum_{t} \mathbf{X}[t]\,\mathbf{X}[t]^H \tag{6}$$

$$\hat{\mathbf{R}}_{XX}[0] \approx \mathbf{A}\,\hat{\mathbf{R}}_{SS}[0]\,\mathbf{A}^H \tag{7}$$

3.1.2. Rotation

The second step of the method is the estimation of $\mathbf{U} = \mathbf{W}\,\mathbf{A}\,\mathbf{B}^{\frac{1}{2}}$, which is unitary by Eq. (5). It is shown in Section 3.1.3 that some estimates of the sources can be retrieved from $\mathbf{W}$ and $\mathbf{U}$.

Let us define the following "whitened" covariance matrices:

$$\underline{\mathbf{R}}_{XX}[\tau] = \mathbf{W}\,\mathbf{R}_{XX}[\tau]\,\mathbf{W}^H \tag{8}$$

With Eq. (4) we have:

$$\underline{\mathbf{R}}_{XX}[\tau] = \mathbf{W}\,\mathbf{A}\,\mathbf{R}_{SS}[\tau]\,\mathbf{A}^H\,\mathbf{W}^H = \mathbf{U}\left(\mathbf{B}^{-\frac{1}{2}}\,\mathbf{R}_{SS}[\tau]\,\mathbf{B}^{-\frac{H}{2}}\right)\mathbf{U}^H \tag{9}$$

Since $\mathbf{B}^{-\frac{1}{2}}$, $\mathbf{B}^{-\frac{H}{2}}$ and $\mathbf{R}_{SS}[\tau]$ are block-diagonal matrices, we see that $\mathbf{U}$ block-diagonalizes $\underline{\mathbf{R}}_{XX}[\tau]$ for all $\tau$.

Thus, $\mathbf{U}$ can be retrieved in theory from the block-diagonalization of any matrix $\underline{\mathbf{R}}_{XX}[\tau]$. In practice an estimate of $\mathbf{U}$ should rather be computed from the joint block-diagonalization (JBD) of a set of $K$ matrices $\{\underline{\mathbf{R}}_{XX}[\tau_i],\ i = 1, \dots, K\}$. JBD provides a more robust estimate of $\mathbf{U}$ with respect to estimation errors on $\underline{\mathbf{R}}_{XX}[\tau]$ and reduces indeterminacies in the same way joint diagonalization does [6]. JBD provides a matrix $\hat{\mathbf{U}}_{JBD}$ such that:

$$\hat{\mathbf{U}}_{JBD} = \mathbf{U}\,\mathbf{P} \tag{10}$$

where $\mathbf{P}$ is an $N \times N$ unitary matrix that models the JBD indeterminacies. $\mathbf{P}$ is the product of a block-diagonal unitary matrix with $n$ blocks of dimension $(L + L') \times (L + L')$ and a permutation matrix of these blocks. A Jacobi-like JBD algorithm is presented in [7].

3.1.3. Retrieving the sources

In this section we compute estimates of the sources (up to unknown filters) from $\hat{\mathbf{U}}_{JBD}$ and $\mathbf{W}$. We define the following column vector $\hat{\mathbf{S}}[t]$ of dimension $N$:

$$\hat{\mathbf{S}}[t] = \hat{\mathbf{U}}_{JBD}^H\,\mathbf{W}\,\mathbf{X}[t] \tag{11}$$

Eqs. (2) and (10) yield:

$$\hat{\mathbf{S}}[t] = \mathbf{C}\,\mathbf{S}[t] \tag{12}$$

with:

$$\mathbf{C} \stackrel{\text{def}}{=} \mathbf{P}^H\,\mathbf{B}^{-\frac{1}{2}} \tag{13}$$

$\mathbf{C}$ is an $N \times N$ block-diagonal matrix with $n$ blocks $\mathbf{C}_1, \dots, \mathbf{C}_n$ of dimensions $(L + L') \times (L + L')$. We decompose $\hat{\mathbf{S}}[t]$ into $n$ sub-vectors of dimension $(L + L')$ such that:

$$\hat{\mathbf{S}}[t] = \left[\hat{\mathbf{S}}_1[t]^T, \dots, \hat{\mathbf{S}}_n[t]^T\right]^T$$

Then, for $i = 1, \dots, n$, we have:

$$\hat{\mathbf{S}}_i[t] = \mathbf{C}_i\,\mathbf{S}_i[t] \tag{14}$$

We recall that $\mathbf{S}_i[t] = [s_i[t], \dots, s_i[t - (L + L') + 1]]^T$. Hence, Eq. (14) means that each component of $\hat{\mathbf{S}}_i[t]$ is a FIR filtered version of the $i$-th source $s_i[t]$. The coefficients of the filters are contained in the corresponding rows of $\mathbf{C}_i$. Thus, for each source $s_i[t]$, we retrieve $(L + L')$ filtered versions of $s_i[t]$, and a further blind SIMO system identification step is required to estimate the original sources instead of filtered versions of them.

4. STEP 2: BLIND IDENTIFICATION

We now briefly describe the subspace method presented in [5] to handle the deconvolution of the several filtered versions of each source obtained from the first step.

Let us consider the deconvolution problem of a single source $d[t] = s_i[t]$. The deconvolution problem expressed by Eq. (14) matches the following structure:

$$\begin{aligned} y_1[t] &= f_1[0]\,d[t] + \dots + f_1[Q]\,d[t - Q] \\ &\ \ \vdots \\ y_P[t] &= f_P[0]\,d[t] + \dots + f_P[Q]\,d[t - Q] \end{aligned}$$

where $P = L + L'$, $Q = L + L' - 1$, $y_k[t]$ is the $k$-th entry of $\hat{\mathbf{S}}_i[t]$ and $[f_k[0], \dots, f_k[Q]]$ is the $k$-th row of $\mathbf{C}_i$. Our goal is to estimate the filter parameters, which we stack in:

$$\mathbf{f} = [f_1[0], \dots, f_1[Q], \dots, f_P[0], \dots, f_P[Q]]^T \tag{15}$$

Given the filter parameters, we will be able to recover the sources by inverse filtering.

Let $W$ be an integer "window parameter" (not to be confused with the whitening matrix $\mathbf{W}$ of Section 3.1.1). We define, for all $i = 1, \dots, P$:

$$\mathbf{y}_i[t] = [y_i[t], \dots, y_i[t - W + 1]]^T$$

and:

$$\mathbf{y}[t] = \left[\mathbf{y}_1[t]^T, \dots, \mathbf{y}_P[t]^T\right]^T$$

$$\mathbf{d}[t] = [d[t], \dots, d[t - W - Q + 1]]^T$$

We define, for all $i = 1, \dots, P$, the following matrix of size $W \times (Q + W)$:

$$\mathbf{F}_W^{(i)} = \begin{bmatrix} f_i[0] & \cdots & f_i[Q] & 0 & \cdots & 0 \\ 0 & \ddots & & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & & \ddots & 0 \\ 0 & \cdots & 0 & f_i[0] & \cdots & f_i[Q] \end{bmatrix}$$

and the following matrix of size $PW \times (Q + W)$:

$$\mathbf{F}_W = \begin{bmatrix} \mathbf{F}_W^{(1)} \\ \vdots \\ \mathbf{F}_W^{(P)} \end{bmatrix} \tag{16}$$

With these notations we have:

$$\mathbf{y}[t] = \mathbf{F}_W\,\mathbf{d}[t] \tag{17}$$

The parameter $W$ must be chosen such that $PW \geq (Q + W)$, which means that the system in Eq. (17) becomes overdetermined. The key theorem of [5] is that, if $W \geq Q$ and if $\mathbf{F}_{W-1}$ is full column rank, then the range of the columns of $\mathbf{F}_W$ uniquely determines $\mathbf{f}$. This means that if we can determine the range of $\mathbf{F}_W$, we only have to compute a basis of it in the Sylvester matrix form of $\mathbf{F}_W$ to recover $\mathbf{f}$ (up to a scalar factor).

From Eq. (17), we have:

$$\mathbf{R}_{yy}[0] = \mathbf{F}_W\,\mathbf{R}_{dd}[0]\,\mathbf{F}_W^H \tag{18}$$

If $\mathbf{F}_W$ and $\mathbf{R}_{dd}[0]$ are full column rank matrices, $\mathbf{R}_{yy}[0]$ is a matrix of rank $W + Q$ and the range of $\mathbf{F}_W$ is simply the space orthogonal to the null subspace (or noise subspace in the presence of noise) of $\mathbf{R}_{yy}[0]$. The null subspace is the range of the $PW - Q - W$ eigenvectors of $\mathbf{R}_{yy}[0]$ associated with the eigenvalue 0. If $\mathbf{E}_W$ denotes the $PW \times (PW - Q - W)$ matrix containing these eigenvectors, $\mathbf{f}$ can be simply estimated as the minimizer of:

$$q(\mathbf{f}) = \left\| \mathbf{E}_W^H\,\mathbf{F}_W \right\|_F^2 \tag{19}$$

With proper variables, $q(\mathbf{f})$ can be expressed as a quadratic form in $\mathbf{f}$, and its minimization under the constraint $\|\mathbf{f}\|_F = 1$ thus amounts to the computation of an eigenvector. See [5] for full details.

5. SIMULATION RESULTS

We present some results on a noiseless mixture of two sources (one is speech, the other is electric guitar). The matrix of mixing filters is arbitrarily chosen as:

$$\mathbf{H}[z] = \begin{bmatrix} 1 + 0.8z^{-1} + 0.5z^{-2} & 0.8 + 0.7z^{-1} + 0.4z^{-2} \\ 0.9 + 0.4z^{-1} + 0.6z^{-2} & 1 + 0.9z^{-1} + 0.3z^{-2} \\ 0.7 + 0.6z^{-1} + 0.5z^{-2} & 0.8 + 0.3z^{-1} + 0.6z^{-2} \end{bmatrix}$$

The sources, mixtures and estimated sources are presented in Fig. 1.

[Figure: waveform panels "Speech" and "Guitar" (sources), "Mix 1", "Mix 2" and "Mix 3" (mixtures), "Estimated Speech" and "Estimated Guitar" (estimates); horizontal axes scaled by 10^4.]

Fig. 1. Evaluation of extended SOBI + BSI on a mixture of speech and guitar

We computed the source separation criteria described in [8]. The Source to Distortion Ratio (SDR) measures the global error made on the estimates of the sources, whereas the Source to Interference Ratio (SIR) only measures the contribution of other sources in the estimation of one source and the Source to Artifacts Ratio (SAR) only measures the proportion of artifacts due to the algorithm in the estimates. The figures show high-quality source separation:

            Speech    Guitar
SDR (dB)    56.6      41.12
SIR (dB)    83.5      41.9
SAR (dB)    56.07     49.0

6. CONCLUSION

The advantage of the global method we described is that it allows complete separation, whereas many BSS methods dealing with convolutive mixtures only provide partial separation, that is, sources estimated only up to a filter. But this comes at the price of a heavy computational load, which is the main disadvantage of the method. The results we obtained on short filters were very good up to L = 6. The whole method strongly relies on the joint block-diagonalization procedure, which failed with longer filters.

7. REFERENCES

[1] E. Vincent, C. Févotte, R. Gribonval, et al., "A tentative typology of audio source separation tasks," in 4th Symposium on Independent Component Analysis and Blind Source Separation (ICA'03), Nara, Japan, 2003.

[2] H. Bousbiah-Salah, A. Belouchrani, and K. Abed-Meraim, "Jacobi-like algorithm for blind signal separation of convolutive mixtures," Electronics Letters, vol. 37, no. 16, pp. 1049–1050, Aug. 2001.

[3] C. Févotte and C. Doncarli, "A unified presentation of blind source separation methods for convolutive mixtures using block-diagonalization," in Proc. 4th Symp. on Independent Component Analysis and Blind Source Separation, 2003.

[4] K. Abed-Meraim, W. Qiu, and Y. Hua, "Blind system identification," Proceedings of the IEEE, vol. 85, no. 8, pp. 1310–1322, Aug. 1997.

[5] E. Moulines, P. Duhamel, J.-F. Cardoso, and S. Mayrargue, "Subspace methods for the blind identification of multichannel FIR filters," IEEE Trans. Signal Processing, vol. 43, no. 2, Feb. 1995.

[6] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and É. Moulines, "A blind source separation technique based on second order statistics," IEEE Trans. Signal Processing, vol. 45, no. 2, pp. 434–444, Feb. 1997.

[7] A. Belouchrani, K. Abed-Meraim, and Y. Hua, "Jacobi-like algorithms for joint block diagonalization: Application to source localization," in Proc. ISPACS, Nov. 1998.

[8] E. Vincent, R. Gribonval, L. Benaroya, and C. Févotte, "Proposals for performance measurement in source separation," in 4th Symposium on Independent Component Analysis and Blind Source Separation (ICA'03), Nara, Japan, 2003.