BLIND SOURCE SEPARATION OF FIR CONVOLUTIVE MIXTURES: APPLICATION TO SPEECH SIGNALS

Cédric Févotte, Alexandra Debiolles, Christian Doncarli

Institut de Recherche en Communications et Cybernétique de Nantes (IRCCyN)
UMR CNRS 6597, 1 rue de la Noë, BP 92101, 44321 Nantes Cedex 03, France
fevotte@irccyn.ec-nantes.fr


ABSTRACT

In this paper we present a simple method to deal with Blind Source Separation (BSS) of Finite Impulse Response (FIR) convolutive mixtures. The global method proceeds in two steps. The first step consists in separating each source contribution in the mixture. This step provides several filtered versions of each source. The second step consists in retrieving the original sources from the set of filtered versions of each source using a blind system identification method. We present some results on a mixture of speech and music.

1. INTRODUCTION

Blind Source Separation has many applications in Audio Signal Processing (see [1] for an overview of Audio Source Separation applications). Usually we have to deal with convolutive mixtures, for example to take into account the reverberation in a room.

In this paper we aim at presenting how the separation of a Finite Impulse Response (FIR) convolutive mixture can be tackled with the use of a (joint) block-diagonalization procedure. The method proceeds in two steps.

The first step consists in separating each source contribution to the mixture. The convolutive mixing is rearranged into a multiplicative (instantaneous) mixing of new sources by introducing proper variables. Some of these new sources are dependent. However, the contributions of the different sources can still be separated by using the standard BSS algorithm SOBI, extended to the particular case where some of the sources are dependent. The algorithm does not provide the original sources but several filtered versions of each source.

The second step then consists in recovering the original sources from the set of filtered versions of each source obtained from the first step. In the literature, this problem is usually named blind system identification or blind deconvolution.

Each step of the global BSS method was published separately by several authors. The first step (extension of SOBI to convolutive mixtures) is presented in [2]. The algorithm is based on the joint block-diagonalization of several whitened covariance matrices. The method assumes that the sources are stationary but still performs well on audio signals as long as they are uncorrelated. Some time-frequency approaches were proposed by several authors to deal with non-stationary signals, but we will not address such a level of generality here (see [3] for an overview of blind separation methods for convolutive mixtures using block-diagonalization).

The second step has been widely investigated in the literature. An overview of blind system identification (BSI) methods is available in [4]. We will briefly describe the method based on subspace decomposition presented in [5].

The only novelty in this paper is the combination of the first step (convolutive SOBI) and the second step (BSI) to perform complete separation instead of partial separation as in [2].

In Section 2 we introduce assumptions and notations, and we show how the convolutive mixing can be turned into an instantaneous mixing. In Sections 3 and 4 we briefly present the source contributions separation step and the blind identification step. An illustration of the performance of the method on a mixture of speech and guitar is presented in Section 5.

2. BACKGROUND

2.1. Aim and assumptions

We consider the following discrete-time noiseless FIR MIMO model:

\[
\mathbf{x}[t] = \mathbf{H}[0]\,\mathbf{s}[t] + \mathbf{H}[1]\,\mathbf{s}[t-1] + \dots + \mathbf{H}[L]\,\mathbf{s}[t-L] \tag{1}
\]

where $\mathbf{x}[t] = [x_1[t], \dots, x_m[t]]^T$ is the vector of size $m$ containing the observations, $\mathbf{s}[t] = [s_1[t], \dots, s_n[t]]^T$ is the vector of size $n$ containing the sources (assumed zero-mean and mutually uncorrelated at every time instant), and $\mathbf{H}[k] = \{h_{ij}[k]\}$, $k = 0, \dots, L$, are $m \times n$ matrices with $m > n$.

The overall objective of BSS is to obtain estimates of the mixing filters and/or estimates of the sources up to the standard BSS indeterminacies on ordering, scale and phase.
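The mixing model of Eq. (1) can be simulated directly, which is convenient for testing a separation algorithm on known filters. The following is a minimal sketch in plain Python, not the authors' code: the function name `fir_mimo_mix`, the nested-list layout `H[k][i][j]` for $h_{ij}[k]$, and the convention $\mathbf{s}[t] = 0$ for $t < 0$ are assumptions made for illustration. The numerical filter values are those given in Section 5.

```python
def fir_mimo_mix(H, s):
    """Noiseless FIR MIMO mixing x[t] = sum_k H[k] s[t - k], as in Eq. (1).

    H: list of L + 1 matrices (each m x n, as nested lists), H[k][i][j] = h_ij[k].
    s: list of n source signals, each a list of T samples.
    Sources are taken to be zero before t = 0 (assumed convention).
    Returns the m observed signals as a list of lists.
    """
    m, n, T = len(H[0]), len(s), len(s[0])
    x = [[0.0] * T for _ in range(m)]
    for t in range(T):
        # Only lags k <= t contribute, since s[t - k] = 0 for t - k < 0.
        for k in range(min(len(H), t + 1)):
            for i in range(m):
                for j in range(n):
                    x[i][t] += H[k][i][j] * s[j][t - k]
    return x


# Mixing filters of Section 5 (m = 3 sensors, n = 2 sources, L = 2).
H_SECTION5 = [
    [[1.0, 0.8], [0.9, 1.0], [0.7, 0.8]],  # H[0]
    [[0.8, 0.7], [0.4, 0.9], [0.6, 0.3]],  # H[1]
    [[0.5, 0.4], [0.6, 0.3], [0.5, 0.6]],  # H[2]
]
```

Feeding a unit impulse on one source returns, on each sensor, the impulse response of the corresponding mixing filter, which gives a simple sanity check of the indexing.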
2.2. Back to instantaneous mixing

We recall from [2] how the convolutive mixing (1) can be rearranged into an instantaneous mixing.

2.3. Notations

Let $L'$ be an integer such that $mL' \ge n(L + L')$ (such an $L'$ exists when $m > n$). We note, for $i = 1, \dots, n$:

\[
S_i[t] = [s_i[t], \dots, s_i[t - (L + L') + 1]]^T
\]

and for $j = 1, \dots, m$:

\[
X_j[t] = [x_j[t], \dots, x_j[t - L' + 1]]^T
\]

where $\cdot^T$ denotes "transpose". Then we introduce:

\[
\mathbf{S}[t] = \left[ S_1[t]^T, \dots, S_n[t]^T \right]^T, \qquad
\mathbf{X}[t] = \left[ X_1[t]^T, \dots, X_m[t]^T \right]^T
\]

For all $t$, $\mathbf{S}[t]$ is a column vector of size $n(L + L')$ and $\mathbf{X}[t]$ is a column vector of size $mL'$. For simplicity we note $N = n(L + L')$ and $M = mL'$.

For $i = 1, \dots, m$ and $j = 1, \dots, n$ we note $\mathbf{A}_{ij}$ the following $L' \times (L + L')$ Sylvester matrix:

\[
\mathbf{A}_{ij} =
\begin{bmatrix}
h_{ij}[0] & \dots & h_{ij}[L] & 0 & \dots & 0 \\
 & \ddots & & \ddots & & \\
0 & \dots & 0 & h_{ij}[0] & \dots & h_{ij}[L]
\end{bmatrix}
\]

Finally, we note:

\[
\mathbf{A} =
\begin{bmatrix}
\mathbf{A}_{11} & \dots & \mathbf{A}_{1n} \\
\vdots & & \vdots \\
\mathbf{A}_{m1} & \dots & \mathbf{A}_{mn}
\end{bmatrix}
\]

$\mathbf{A}$ is an $M \times N$ matrix which satisfies:

\[
\mathbf{X}[t] = \mathbf{A}\,\mathbf{S}[t] \tag{2}
\]

In the following we assume that $\mathbf{A}$ is full rank.

Eq. (2) shows that the convolutive mixing (1) can be written as an instantaneous mixture. Such mixtures have been widely studied in the BSS/ICA literature. However, the big difference here is that the components of $\mathbf{S}[t]$ are not all mutually independent: when the sources are not white, for $i = 1, \dots, n$, the components of $S_i[t]$ are dependent.

3. STEP 1: SEPARATION OF THE CONTRIBUTIONS OF EACH SOURCE

In this section we briefly describe the source contributions separation step. We will assume that the sources are stationary. For a more general study, in particular in a noisy environment, the reader should refer to [3] and references therein concerning the work of A. Belouchrani and K. Abed-Meraim on the topic. The key of the method is to formulate the overall problem described by Eq. (2) in the time-lag plane.

For $(t, \tau) \in \mathbb{Z}^2$ we note $\mathbf{R}_{SS}[t, \tau]$ the covariance matrix of $\mathbf{S}[t]$ defined by:

\[
\mathbf{R}_{SS}[t, \tau] \overset{\text{def}}{=} E\{\mathbf{S}[t]\,\mathbf{S}[t + \tau]^H\}
\]

Since the sources are assumed stationary we have:

\[
\mathbf{R}_{SS}[t, \tau] = \mathbf{R}_{SS}[\tau] \tag{3}
\]

The vector signals $S_1[t], \dots, S_n[t]$ being mutually uncorrelated, the $N \times N$ covariance matrix $\mathbf{R}_{SS}[\tau]$ is block-diagonal with $n$ blocks of dimensions $(L + L') \times (L + L')$, such that:

\[
\mathbf{R}_{SS}[\tau] =
\begin{bmatrix}
\mathbf{R}_{S_1 S_1}[\tau] & & \\
 & \ddots & \\
 & & \mathbf{R}_{S_n S_n}[\tau]
\end{bmatrix}
\]

With Eq. (2) we have:

\[
\mathbf{R}_{XX}[\tau] = \mathbf{A}\,\mathbf{R}_{SS}[\tau]\,\mathbf{A}^H \tag{4}
\]

3.1. Generalization of SOBI

A two-step separation method (whitening and rotation) can be devised from (4) [2, 3].

3.1.1. Whitening

In the FIR convolutive case, whitening consists in finding a matrix $\mathbf{W}$ of dimensions $N \times M$ such that:

\[
\mathbf{W}\,\mathbf{A}\,\mathbf{B}\,\mathbf{A}^H\,\mathbf{W}^H = \mathbf{I}_N \tag{5}
\]

where $\mathbf{B}$ is an $N \times N$ block-diagonal positive definite matrix with $n$ blocks of dimensions $(L + L') \times (L + L')$. In practice, using $\mathbf{B} = \mathbf{R}_{SS}[0]$, $\mathbf{W}$ can be computed from the eigenelements of the following estimate of $\mathbf{R}_{XX}[0]$ [3]:

\[
\hat{\mathbf{R}}_{XX}[0] \overset{\text{def}}{=} \frac{1}{T} \sum_{t} \mathbf{X}[t]\,\mathbf{X}[t]^H \tag{6}
\]
\[
\hat{\mathbf{R}}_{XX}[0] \approx \mathbf{A}\,\hat{\mathbf{R}}_{SS}[0]\,\mathbf{A}^H \tag{7}
\]

3.1.2. Rotation

The second step of the method is the estimation of $\mathbf{U} = \mathbf{W}\,\mathbf{A}\,\mathbf{B}^{\frac{1}{2}}$. It is shown in Section 3.1.3 that some estimates of the sources can be retrieved from $\mathbf{W}$ and $\mathbf{U}$.

Let us define the following "whitened" covariance matrices:

\[
\underline{\mathbf{R}}_{XX}[\tau] = \mathbf{W}\,\mathbf{R}_{XX}[\tau]\,\mathbf{W}^H \tag{8}
\]
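The rearrangement of Section 2.3 and the whitening step of Section 3.1.1 can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the function names, the array layout `H[k, i, j]` for $h_{ij}[k]$, and the choice of building $\mathbf{W}$ from the $N$ largest eigenvalues of the covariance matrix (appropriate in the noiseless case, where that matrix has rank $N$) are all assumptions.

```python
import numpy as np


def sylvester(h, rows):
    """Sylvester matrix of size rows x (len(h) + rows - 1): h shifted right by one per row."""
    h = np.asarray(h, dtype=float)
    out = np.zeros((rows, len(h) + rows - 1))
    for r in range(rows):
        out[r, r:r + len(h)] = h
    return out


def stacked_mixing_matrix(H, Lp):
    """Block matrix A of Eq. (2) for filters H of shape (L + 1, m, n) and window L' = Lp.

    Block (i, j) is the L' x (L + L') Sylvester matrix of the filter h_ij.
    """
    _, m, n = H.shape
    return np.block([[sylvester(H[:, i, j], Lp) for j in range(n)] for i in range(m)])


def whitening_matrix(R0, N):
    """N x M matrix W such that W R0 W^H = I_N.

    Uses the N largest eigenvalues and eigenvectors of the Hermitian matrix R0
    (assumption: R0 has rank N, as in the noiseless model).
    """
    eigval, eigvec = np.linalg.eigh(R0)            # eigenvalues in ascending order
    eigval, eigvec = eigval[-N:], eigvec[:, -N:]   # keep the N principal eigenelements
    return (eigvec / np.sqrt(eigval)).conj().T     # diag(eigval^-1/2) V^H
```

With `A = stacked_mixing_matrix(H, Lp)`, stacking delayed samples of the observations and of the sources as in Section 2.3 should reproduce $\mathbf{X}[t] = \mathbf{A}\,\mathbf{S}[t]$, and `whitening_matrix(R0, N)` applied to an estimate of $\mathbf{R}_{XX}[0]$ gives a candidate $\mathbf{W}$ for Eq. (5).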
With Eq. (4) we have:

\[
\underline{\mathbf{R}}_{XX}[\tau] = \mathbf{W}\,\mathbf{A}\,\mathbf{R}_{SS}[\tau]\,\mathbf{A}^H\,\mathbf{W}^H
= \mathbf{U}\left(\mathbf{B}^{-\frac{1}{2}}\,\mathbf{R}_{SS}[\tau]\,\mathbf{B}^{-\frac{H}{2}}\right)\mathbf{U}^H \tag{9}
\]

Since $\mathbf{B}^{-\frac{1}{2}}$, $\mathbf{B}^{-\frac{H}{2}}$ and $\mathbf{R}_{SS}[\tau]$ are block-diagonal matrices, we see that $\mathbf{U}$ block-diagonalizes $\underline{\mathbf{R}}_{XX}[\tau]$ for all $\tau$.

Thus, $\mathbf{U}$ can be retrieved in theory from the block-diagonalization of any matrix $\underline{\mathbf{R}}_{XX}[\tau]$. In practice an estimate of $\mathbf{U}$ should rather be computed from the joint block-diagonalization (JBD) of a set of $K$ matrices $\{\underline{\mathbf{R}}_{XX}[\tau_i],\ i = 1, \dots, K\}$. JBD provides a more robust estimate of $\mathbf{U}$ with respect to estimation errors on $\mathbf{R}_{XX}[\tau]$ and reduces indeterminacies in the same way joint-diagonalization does [6]. JBD provides a matrix $\mathbf{U}_{JBD}$ such that:

\[
\mathbf{U}_{JBD} = \mathbf{U}\,\mathbf{P} \tag{10}
\]

where $\mathbf{P}$ is an $N \times N$ unitary matrix that models the JBD indeterminacies. $\mathbf{P}$ is the product of a block-diagonal unitary matrix with $n$ blocks of dimensions $(L + L') \times (L + L')$ and a permutation matrix of these blocks. A Jacobi-like JBD algorithm is presented in [7].

3.1.3. Retrieving the sources

In this section we compute estimates of the sources (up to unknown filters) from $\mathbf{U}_{JBD}$ and $\mathbf{W}$. We define the following column vector $\hat{\mathbf{S}}[t]$ of dimension $N$:

\[
\hat{\mathbf{S}}[t] = \mathbf{U}_{JBD}^H\,\mathbf{W}\,\mathbf{X}[t] \tag{11}
\]

Eqs. (2) and (10) yield:

\[
\hat{\mathbf{S}}[t] = \mathbf{C}\,\mathbf{S}[t] \tag{12}
\]

with:

\[
\mathbf{C} \overset{\text{def}}{=} \mathbf{P}^H\,\mathbf{B}^{-\frac{1}{2}} \tag{13}
\]

$\mathbf{C}$ is an $N \times N$ block-diagonal matrix with $n$ blocks $\mathbf{C}_1, \dots, \mathbf{C}_n$ of dimensions $(L + L') \times (L + L')$. We decompose $\hat{\mathbf{S}}[t]$ into $n$ sub-vectors of dimension $(L + L')$ such that:

\[
\hat{\mathbf{S}}[t] = \left[ \hat{S}_1[t]^T, \dots, \hat{S}_n[t]^T \right]^T
\]

Then, for $i = 1, \dots, n$, we have:

\[
\hat{S}_i[t] = \mathbf{C}_i\,S_i[t] \tag{14}
\]

We recall that $S_i[t] = [s_i[t], \dots, s_i[t - (L + L') + 1]]^T$. Hence, Eq. (14) means that each component of $\hat{S}_i[t]$ is an FIR filtered version of the $i$th source $s_i[t]$. The coefficients of the filters are contained in the corresponding rows of $\mathbf{C}_i$. Then, for each source $s_i[t]$, we retrieve $(L + L')$ filtered versions of $s_i[t]$. Thus, a further blind SIMO system identification step is required to estimate the original sources instead of filtered versions of them.

4. STEP 2: BLIND IDENTIFICATION

We now briefly describe the subspace method presented in [5] to handle the deconvolution of the several filtered versions of each source obtained from the first step.

Let us consider the deconvolution problem of a single source $d[t] = s_i[t]$. The deconvolution problem expressed by Eq. (14) matches the following structure:

\[
\begin{aligned}
y_1[t] &= f_1[0]\,d[t] + \dots + f_1[Q]\,d[t - Q] \\
&\;\;\vdots \\
y_P[t] &= f_P[0]\,d[t] + \dots + f_P[Q]\,d[t - Q]
\end{aligned}
\]

where $P = L + L'$, $Q = L + L' - 1$, $y_k[t]$ is the $k$th entry of $\hat{S}_i[t]$ and $[f_k[0], \dots, f_k[Q]]$ is the $k$th row of $\mathbf{C}_i$. Our goal is to estimate the filter parameters, which we stack in:

\[
\mathbf{f} = [f_1[0], \dots, f_1[Q], \dots, f_P[0], \dots, f_P[Q]]^T \tag{15}
\]

Given the filter parameters, the sources can be recovered by inverse filtering.

Let $W$ be an integer "window parameter". We define, for all $i = 1, \dots, P$:

\[
\mathbf{y}_i[t] = [y_i[t], \dots, y_i[t - W + 1]]^T
\]

and:

\[
\mathbf{y}[t] = \left[ \mathbf{y}_1[t]^T, \dots, \mathbf{y}_P[t]^T \right]^T, \qquad
\mathbf{d}[t] = [d[t], \dots, d[t - W - Q + 1]]^T
\]

We define, for all $i = 1, \dots, P$:

\[
\mathbf{F}_W^{(i)} =
\begin{bmatrix}
f_i[0] & \dots & f_i[Q] & 0 & \dots & 0 \\
 & \ddots & & \ddots & & \\
0 & \dots & 0 & f_i[0] & \dots & f_i[Q]
\end{bmatrix}
\]

(size $W \times (Q + W)$) and:

\[
\mathbf{F}_W =
\begin{bmatrix}
\mathbf{F}_W^{(1)} \\
\vdots \\
\mathbf{F}_W^{(P)}
\end{bmatrix} \tag{16}
\]

(size $PW \times (Q + W)$). With these notations we have:

\[
\mathbf{y}[t] = \mathbf{F}_W\,\mathbf{d}[t] \tag{17}
\]

The parameter $W$ must be chosen such that $PW \ge (Q + W)$, which means that the system in Eq. (17) becomes overdetermined. The key theorem of [5] is that, if $W \ge Q$ and if $\mathbf{F}_{W-1}$ is full column rank, then the range of the columns of $\mathbf{F}_W$ uniquely determines $\mathbf{f}$. This means that if we can determine the range of $\mathbf{F}_W$, we only have to compute a basis of it in the Sylvester matrix form of $\mathbf{F}_W$ to recover $\mathbf{f}$ (up to a scalar factor).
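To give a feel for the dimensions involved, the following worked count uses the configuration of Section 5 ($m = 3$ sensors, $n = 2$ sources, filters of order $L = 2$). Taking the smallest admissible $L'$ and setting $W = Q$ are illustrative assumptions; the values actually used in the experiment are not stated.

```latex
\begin{align*}
mL' \ge n(L + L') &\iff 3L' \ge 2(2 + L') \iff L' \ge 4, \\
N = n(L + L') &= 2 \times 6 = 12, \qquad M = mL' = 3 \times 4 = 12, \\
P = L + L' &= 6, \qquad Q = L + L' - 1 = 5, \\
W = Q = 5 &\;\Longrightarrow\; PW = 30 \ge Q + W = 10 .
\end{align*}
```

Under these assumptions $\mathbf{F}_W$ is a $30 \times 10$ matrix, so each instance of Eq. (17) provides 30 equations for 10 unknown source samples.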
From Eq. (17), we have:

\[
\mathbf{R}_{yy}[0] = \mathbf{F}_W\,\mathbf{R}_{dd}[0]\,\mathbf{F}_W^H \tag{18}
\]

If $\mathbf{F}_W$ is full column rank and $\mathbf{R}_{dd}[0]$ is full rank, $\mathbf{R}_{yy}[0]$ is a matrix of rank $W + Q$ and the range of $\mathbf{F}_W$ is simply the space orthogonal to the null subspace (or noise subspace in the presence of noise) of $\mathbf{R}_{yy}[0]$. The null subspace is the range of the $PW - Q - W$ eigenvectors of $\mathbf{R}_{yy}[0]$ associated with the eigenvalue 0. If $\mathbf{E}_W$ denotes the $PW \times (PW - Q - W)$ matrix containing these eigenvectors, $\mathbf{f}$ can be simply estimated as the minimizer of:

\[
q(\mathbf{f}) = \left\| \mathbf{E}_W^H\,\mathbf{F}_W \right\|_F \tag{19}
\]

With proper variables, $q(\mathbf{f})^2$ can be expressed as a quadratic form in $\mathbf{f}$, and its minimization under the constraint $\|\mathbf{f}\|_F = 1$ thus amounts to the computation of an eigenvector. See [5] for full details.

[Figure 1: waveform panels. Sources: Speech, Guitar. Mixtures: Mix 1, Mix 2, Mix 3. Estimates: Estimated Guitar, Estimated Speech. Horizontal axis ticks at 2, 4, 6 (×10^4).]

Fig. 1. Evaluation of extended SOBI + BSI on a mixture of speech and guitar
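The subspace estimate of Section 4 can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the implementation of [5] or of the authors: signals are assumed real-valued and noiseless, the function names are hypothetical, and the quadratic form is built by rewriting each row vector $\mathbf{e}^T \mathbf{F}_W$ as $\mathbf{f}^T \mathbf{G}_e$, where $\mathbf{G}_e$ stacks Sylvester matrices of the blocks of the null-subspace eigenvector $\mathbf{e}$.

```python
import numpy as np


def sylvester(h, rows):
    """Sylvester matrix of size rows x (len(h) + rows - 1): h shifted right by one per row."""
    h = np.asarray(h, dtype=float)
    out = np.zeros((rows, len(h) + rows - 1))
    for r in range(rows):
        out[r, r:r + len(h)] = h
    return out


def subspace_identify(y, Q, W):
    """Estimate P FIR filters of order Q, up to a common scalar, from their outputs.

    y: array of shape (P, T), the P filtered versions of one real-valued source.
    W: window parameter (the text asks for W >= Q and P * W >= Q + W).
    Returns an array of shape (P, Q + 1) with unit Frobenius norm.
    """
    P, T = y.shape
    # Columns are the stacked vectors y[t] = [y_1[t..t-W+1], ..., y_P[t..t-W+1]]^T.
    Y = np.vstack([y[i, W - 1 - w:T - w] for i in range(P) for w in range(W)])
    Ryy = Y @ Y.T / Y.shape[1]
    _, eigvec = np.linalg.eigh(Ryy)              # eigenvalues in ascending order
    E = eigvec[:, :P * W - Q - W]                # basis of the null (noise) subspace
    # Accumulate the quadratic form q(f)^2 = f^T Qmat f over the null-subspace vectors.
    Qmat = np.zeros((P * (Q + 1), P * (Q + 1)))
    for e in E.T:
        G = np.vstack([sylvester(e[i * W:(i + 1) * W], Q + 1) for i in range(P)])
        Qmat += G @ G.T
    # The minimizer under ||f|| = 1 is the eigenvector of the smallest eigenvalue.
    _, vec = np.linalg.eigh(Qmat)
    return vec[:, 0].reshape(P, Q + 1)
```

The estimate is defined only up to a scalar factor (here a sign, since the data are real), so it is best compared with reference filters through a normalized correlation rather than entry by entry.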


5. SIMULATION RESULTS

We present some results on a noiseless mixture of two sources (one is speech, the other is electric guitar). The matrix of mixing filters is arbitrarily chosen as:

\[
\mathbf{H}[z] =
\begin{bmatrix}
1 + 0.8z^{-1} + 0.5z^{-2} & 0.8 + 0.7z^{-1} + 0.4z^{-2} \\
0.9 + 0.4z^{-1} + 0.6z^{-2} & 1 + 0.9z^{-1} + 0.3z^{-2} \\
0.7 + 0.6z^{-1} + 0.5z^{-2} & 0.8 + 0.3z^{-1} + 0.6z^{-2}
\end{bmatrix}
\]

The sources, mixtures and estimated sources are presented in Fig. 1. We computed the source separation criteria described in [8]. The Source to Distortion Ratio (SDR) measures the global error made on the estimates of the sources, whereas the Source to Interference Ratio (SIR) only measures the contribution of other sources in the estimation of one source, and the Source to Artifacts Ratio (SAR) only measures the proportion of artifacts due to the algorithm in the estimates. The figures indicate high-quality source separation:

                 Speech    Guitar
    SDR (dB)      56.6     41.12
    SIR (dB)      83.5     41.9
    SAR (dB)      56.07    49.0

6. CONCLUSION

The advantage of the global method we described is that it allows complete separation, whereas many BSS methods dealing with convolutive mixtures only provide partial separation, that is, sources estimated only up to a filter.

This comes at the price of a heavy computational load, which is the main disadvantage of the method. The results we obtained on short filters were very good up to L = 6. The whole method relies strongly on the joint block-diagonalization procedure, which failed with longer filters.

7. REFERENCES

[1] E. Vincent, C. Févotte, R. Gribonval, et al., "A tentative typology of audio source separation tasks," in 4th Symposium on Independent Component Analysis and Blind Source Separation (ICA'03), Nara, Japan, 2003.

[2] H. Bousbiah-Salah, A. Belouchrani, and K. Abed-Meraim, "Jacobi-like algorithm for blind signal separation of convolutive mixtures," Electronics Letters, vol. 37, no. 16, pp. 1049–1050, Aug. 2001.

[3] C. Févotte and C. Doncarli, "A unified presentation of blind source separation methods for convolutive mixtures using block-diagonalization," in Proc. 4th Symp. on Independent Component Analysis and Blind Source Separation, 2003.

[4] K. Abed-Meraim, W. Qiu, and Y. Hua, "Blind system identification," Proceedings of the IEEE, vol. 85, no. 8, pp. 1310–1322, Aug. 1997.

[5] E. Moulines, P. Duhamel, J.-F. Cardoso, and S. Mayrargue, "Subspace methods for the blind identification of multichannel FIR filters," IEEE Trans. Signal Processing, vol. 43, no. 2, Feb. 1995.

[6] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and É. Moulines, "A blind source separation technique based on second order statistics," IEEE Trans. Signal Processing, vol. 45, no. 2, pp. 434–444, Feb. 1997.

[7] A. Belouchrani, K. Abed-Meraim, and Y. Hua, "Jacobi-like algorithms for joint block diagonalization: Application to source localization," in Proc. ISPACS, Nov. 1998.

[8] E. Vincent, R. Gribonval, L. Benaroya, and C. Févotte, "Proposals for performance measurement in source separation," in 4th Symposium on Independent Component Analysis and Blind Source Separation (ICA'03), Nara, Japan, 2003.

								