International Journal of Electronics and Communication Engineering & Technology (IJECET)
ISSN 0976 – 6464 (Print), ISSN 0976 – 6472 (Online)
Volume 3, Issue 2, July-September (2012), pp. 87-93
© IAEME: www.iaeme.com/ijecet.html
Journal Impact Factor (2011): 0.8500 (Calculated by GISI), www.jifactor.com




                   NOVEL APPROACH TO TEXT INDEPENDENT SPEAKER
                               IDENTIFICATION

                     Pallavi P. Ingale.                              Dr. S.L. Nalbalwar
                 Department of Electronics and                    Department of Electronics and
                      Telecommunications                               Telecommunications
            Dr. Babasaheb Ambedkar Technological             Dr. Babasaheb Ambedkar Technological
                  University, Lonere, INDIA.                       University, Lonere, INDIA.
                 pallavi.ingale@rediffmail.com                   nalbalwar_sanjayan@yahoo.com


      ABSTRACT
      In this paper, we propose speaker identification using two transforms, namely Kekre’s
      Transform and Kekre’s Wavelet Transform. The speech signal spoken by a particular
      speaker is converted into a spectrogram by first taking frames with 25% or 50% overlap
      between consecutive sample vectors and arranging them in the form of a matrix. In order to
      improve the performance, the log of the matrix is taken initially and then one of the
      transforms is applied to the spectrogram. The resultant transformed matrix forms the feature
      vector, which is used in the training as well as matching phases of identification. The
      results of both transform techniques have been compared using the feature vectors obtained.
      From the comparison it is observed that Kekre’s Transform shows much better
      performance for 50% overlap.
      Keywords - Speaker Identification, Kekre’s Transform, Kekre’s Wavelet Transform, Short
      Time Fourier Transform, Spectrogram, Feature extraction
       I.   INTRODUCTION
         For over six decades, scientists have studied the ability of human listeners to recognize
      and discriminate voices. By establishing the factors that convey speaker-dependent
      information, researchers have been able to improve the naturalness of synthetic and
      vocoded speech and assess the reliability of speaker recognition. Soon after the development of digital
      computers, research on speaker recognition turned to developing objective techniques for
      automatic speaker recognition, which quickly led to the discovery that simple automatic
      systems could outperform human listeners on a similar task [1]. Over the last three
      decades, researchers have developed increasingly sophisticated automatic speaker
      recognition algorithms, and the performance of these algorithms in more realistic



evaluation speech corpora has improved. Today, task-specific speaker-recognition
systems are being deployed in large telecommunications applications [2].
    Speaker recognition involves two tasks: identification and verification. In identification,
the goal is to determine which voice in a known group of voices best matches the speaker.
In verification, the goal is to determine if the speaker is who he or she claims to be. In
speaker identification, the unknown voice is assumed to be from the predefined set of
known speakers. This is an N-alternative, forced-choice classification task: errors are
defined as misrecognitions, and the difficulty of identification generally increases as
the speaker set grows. Speaker verification requires distinguishing a
speaker's voice known to the system from a potentially large group of voices unknown to
the system [1], [2].
   Speaker-recognition tasks are further distinguished by the constraints placed on the text
of the speech used in the system. In a text-dependent system, the spoken text used to train
and test the system is constrained to be the same word or phrase. In a text-independent
system, training and testing speech is completely unconstrained. This type of system is
required for applications which lack control over what a person says [3].
   In this paper we have proposed a different approach to speaker identification using
spectrograms, Kekre’s Transform and Kekre’s Wavelet Transform [5], [6]. Kekre’s
Transform and Kekre’s Wavelet Transform are applied to the spectrogram of the speech
signal, computed for a specific frame size. The generalized block diagram of the feature
extraction method is shown in Fig. 1. As shown in Fig. 1, the reference signals in the
database are first converted into their spectrograms. Then the log is taken and one of the two
transforms is applied to the spectrogram. The feature vectors are extracted and models are
created using the Gaussian mixture modeling technique of [3], [4]. The test signal to be
identified is similarly processed: the feature vector is extracted, a model is created, and it is
matched with the previously created and stored models. The speaker whose model has the
maximum posterior probability for the input feature-vector sequence (the test signal) is
declared the identified speaker. Section 2 describes the process of converting the speech
signal into a spectrogram [5]. The Kekre’s Transform and Kekre’s Wavelet Transform
have been explained in section 3 [6]. In Section 4, the feature vector extraction is
explained. Results are discussed in section 5 and conclusion in section 6.
 II.  GENERATION OF SPECTROGRAM
   The first step in the speaker identification system is to convert the speech signal into a
spectrogram [5]. A spectrogram is a time-varying spectral representation that shows how
the spectral density of a signal varies with time. Spectrograms have been used for speaker
identification for a long time. Spectrograms are usually calculated from the time
signal using the short-time Fourier transform (STFT). Creating a spectrogram using the
STFT is usually a digital process. Digitally sampled data, in the time domain, is broken up
into chunks, which usually overlap, and Fourier transformed to calculate the magnitude of
the frequency spectrum for each chunk. Each chunk then corresponds to a vertical line in
the image: a measurement of magnitude versus frequency for a specific moment in time.
These spectra are then laid side by side to form the image or a three-
dimensional surface. This is done using the following steps:
   1. The speech signal is first divided into frames (here, frames of 256 samples) with an
       overlap of 50% or 25%.


  2. These frames are arranged column wise to form a matrix. E.g., if the speech signal is
     a one dimensional signal of size 44096×1, we divide it into frames of 256 samples
     each with an overlap of 50% between consecutive frames, i.e. an overlap of 128
     samples. These frames are then arranged column wise to form a matrix of
     approximately 256×344 (128×344 once the redundant half of the symmetric
     spectrum is discarded after the DFT).
  3. Discrete Fourier Transform (DFT) is applied to this matrix column wise.
  4. The spectrogram is then taken as the squared magnitude of this transform matrix.
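The four steps above can be sketched in Python. This is a minimal illustration assuming NumPy; the function name and the test signal are ours, not the paper's:

```python
import numpy as np

def spectrogram(signal, frame_size=256, overlap=0.5):
    """Steps 1-4: frame the signal with overlap, arrange the frames
    column-wise in a matrix, apply the DFT column-wise, and take the
    squared magnitude."""
    hop = int(frame_size * (1 - overlap))      # 128 samples for 50% overlap
    n_frames = (len(signal) - frame_size) // hop + 1
    # arrange overlapping frames as the columns of a matrix
    frames = np.stack([signal[i * hop: i * hop + frame_size]
                       for i in range(n_frames)], axis=1)
    spectrum = np.fft.fft(frames, axis=0)      # DFT applied column-wise
    return np.abs(spectrum) ** 2               # squared magnitude

# e.g. a 44096-sample signal framed into 256 samples with 128-sample overlap
sig = np.random.randn(44096)
S = spectrogram(sig)
print(S.shape)   # (256, 343)
```

Note that only the lower half of each column (128 rows) carries independent information, since the spectrum of a real signal is conjugate-symmetric.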



                          Speech Signal
                                ↓
                      Framing (with overlap)
                        & forming a matrix
                                ↓
                    Converting framed speech
                    signal into a spectrogram
                                ↓
                       Take squared magnitude
                                ↓
                               Log
                                ↓
                        Kekre’s Transform /
                     Kekre’s Wavelet Transform
                                ↓
                      Cepstral coefficients
              generated using Kekre’s Transform or
                    Kekre’s Wavelet Transform


       Figure 1. Feature extraction method



III. KEKRE’S TRANSFORM AND KEKRE’S WAVELET TRANSFORM
  Kekre’s Transform matrix (K) can be of any size NxN; N need not be a power
of 2 (as is the case with most other transforms, including the Haar Transform). All
diagonal and upper-diagonal values of Kekre’s transform matrix are one, while the lower
diagonal part, except the values just below the diagonal, is zero [5], [6].
  Generalized N×N Kekre’s Transform Matrix can be given as in (1).




             |   1       1      1   ...      1        1 |
             | -N+1      1      1   ...      1        1 |
  KNxN   =   |   0     -N+2     1   ...      1        1 |        (1)
             |   :       :      :            :        : |
             |   0       0      0   ...      1        1 |
             |   0       0      0   ...  -N+(N-1)     1 |

  The formula for generating the term Kxy of Kekre’s transform matrix is given by (2).

                    1                    :         x≤y
  Kxy =             -N + (x-1)           :         x=y+1          (2)
                    0                    :         x > y+1
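Equation (2) translates directly into code. The following is a small sketch, assuming NumPy; the function name is ours:

```python
import numpy as np

def kekre_matrix(n):
    """Build the NxN Kekre's Transform matrix per equation (2):
    1 on and above the diagonal, -N+(x-1) just below it, 0 elsewhere."""
    K = np.triu(np.ones((n, n)))    # ones on and above the diagonal
    for x in range(1, n):           # 0-based row x is 1-based row x+1
        K[x, x - 1] = -n + x        # -N + (x-1) in 1-based indexing
    return K

K5 = kekre_matrix(5)
print(K5)
```

For n = 5 this reproduces the matrix shown later in (5).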

   Kekre’s Wavelet transform matrix (KW) is derived from the (N×N) Kekre’s Transform matrix [5].
   From NxN Kekre’s transform matrix, we can generate Kekre’s Wavelet transform matrices of size
(2N)x(2N), (3N)x(3N),……, (N2)x(N2). For example, from 5x5 Kekre’s transform matrix, we can generate
Kekre’s Wavelet transform matrices of size 10x10, 15x15, 20x20 and 25x25. In general MxM Kekre’s
Wavelet transform matrix can be generated from NxN Kekre’s transform matrix, such that M = N * P where
P is any integer between 2 and N that is, 2 ≤ P ≤ N. Consider the Kekre’s transform matrix of size NxN
shown in Figure – 2.


              K11        K12        K13 …      K1(N-1) K1N
              K21        K22        K23 …      K2(N-1) K2N
              K31        K32        K33 …      K3(N-1) K3N
               :          :          :          :       :
              KN1        KN2        KN3 …      KN(N-1) KNN


                                      Figure 2. Kekre’s Transform Matrix (K) of size NxN

   Kekre’s transform on a column vector f is given by
                                                    F=[K] f                                   (3)

   An MxM Kekre’s Wavelet transform matrix is generated from the NxN Kekre’s transform matrix as
follows. The first N rows of the Kekre’s Wavelet transform matrix are generated by repeating every column
of the Kekre’s transform matrix P times.



          K(N-P+2) (N-P+1)       K(N-P+2)(N-P+2) … K(N-P+2) N
          K(N-P+3) (N-P+1)       K(N-P+3)(N-P+2)…. K(N-P+3)N
              :                      :                 :
              :                      :                 :
            KN(N-P+1)              KN(N-P+2) ……… KNN



                                             Figure 3. Temporary matrix T of size (P-1) x P

   To generate the remaining (M-N) rows, extract the last (P-1) rows and last P columns from the Kekre’s
transform matrix and store the extracted elements into a temporary matrix, say T, of size (P-1) x P.
Figure 3 shows the extracted elements of the Kekre’s transform matrix stored in T.
   Values of matrix T can be computed as:




T(x,y) = K(N-P+(x+1), N-P+y) ;        1 ≤ x ≤ (P-1),  1 ≤ y ≤ P        (4)

   The first row of T is used to generate rows (N+1) to 2N of the Kekre’s Wavelet transform matrix. The
second row of T is used to generate rows (2N+1) to 3N. Likewise, the last row of T is used to generate rows
((P-1)N + 1) to PN.

Example- To generate Kekre’s Wavelet Transform:
Figure -4 shows Kekre’s Wavelet transform matrix of size 15x15 generated from the Kekre’s transform
matrix 5x5. Here M=15, N=5 and P=M/N=3.



                                                   1           1         1           1     1
                                                   -4          1         1           1     1
                                                   0          -3         1           1     1              (5)
                                                   0           0         -2          1     1
                                                   0           0         0           -1    1


The above matrix is used to generate the Kekre’s Wavelet transform matrix of size 15x15.

    As shown in Figure - 4, all the columns of Kekre’s transform matrix are repeated P=3 times to generate
first N=5 number of rows of Kekre’s Wavelet transform matrix. To generate remaining (M-N) = 10 rows,
extract last (P-1) = 2 rows and last P=3 columns from Kekre’s transform matrix and store these elements into
temporary matrix T. Below is temporary matrix T.

           T        =        -2         1          1
                              0         -1         1                          (6)

   The first row of T, [-2 1 1], is used to generate rows 6 to 10 of the KW transform matrix, as shown in
Figure – 4. The second row of T, [0 -1 1], is used to generate rows 11 to 15 [6]. The
Kekre’s Wavelet Transform on a column vector f is given by
                 F = [KW] f                              (7)

  1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
 -4   -4   -4    1    1    1    1    1    1    1    1    1    1    1    1
  0    0    0   -3   -3   -3    1    1    1    1    1    1    1    1    1
  0    0    0    0    0    0   -2   -2   -2    1    1    1    1    1    1
  0    0    0    0    0    0    0    0    0   -1   -1   -1    1    1    1
 -2    1    1    0    0    0    0    0    0    0    0    0    0    0    0
  0    0    0   -2    1    1    0    0    0    0    0    0    0    0    0
  0    0    0    0    0    0   -2    1    1    0    0    0    0    0    0
  0    0    0    0    0    0    0    0    0   -2    1    1    0    0    0
  0    0    0    0    0    0    0    0    0    0    0    0   -2    1    1
  0   -1    1    0    0    0    0    0    0    0    0    0    0    0    0
  0    0    0    0   -1    1    0    0    0    0    0    0    0    0    0
  0    0    0    0    0    0    0   -1    1    0    0    0    0    0    0
  0    0    0    0    0    0    0    0    0    0   -1    1    0    0    0
  0    0    0    0    0    0    0    0    0    0    0    0    0   -1    1


                         Figure 4. 15x15 Kekre’s Wavelet transform matrix generated from the 5x5 Kekre’s transform matrix.
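The construction above can be sketched in code. This is a NumPy illustration (function name ours) that reproduces the 15x15 example of Figure 4 from the 5x5 matrix of (5):

```python
import numpy as np

def kekre_wavelet_matrix(K, P):
    """Generate the (N*P)x(N*P) Kekre's Wavelet transform matrix from an
    NxN Kekre's transform matrix K, following the procedure above."""
    n = K.shape[0]
    KW = np.zeros((n * P, n * P))
    # first N rows: repeat every column of K P times
    KW[:n, :] = np.repeat(K, P, axis=1)
    # temporary matrix T: last (P-1) rows and last P columns of K
    T = K[n - P + 1:, n - P:]
    # row i of T fills rows (i+1)N+1 .. (i+2)N, shifted P columns at a time
    for i in range(P - 1):
        for j in range(n):
            KW[n + i * n + j, j * P:(j + 1) * P] = T[i]
    return KW

K5 = np.array([[ 1,  1,  1,  1, 1],
               [-4,  1,  1,  1, 1],
               [ 0, -3,  1,  1, 1],
               [ 0,  0, -2,  1, 1],
               [ 0,  0,  0, -1, 1]], dtype=float)
KW = kekre_wavelet_matrix(K5, 3)
print(KW[5])   # row 6 of Figure 4: -2, 1, 1 followed by zeros
```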




IV.    FEATURE VECTOR EXTRACTION
   The procedure for feature vector extraction is given below. The spectrogram is calculated by the
procedure given in section 2.
   The spectrogram gives three-dimensional information: time, frequency and amplitude. In a
spectrogram we get a very wide range of amplitude levels (very small values accompanied by
very high values). Processing this wide range of values directly generally does not produce
faithful results: when the values are taken linearly (as they are), the very high amplitude
values affect the output strongly while the lower ones affect it little. If instead we first
apply the log to the spectrogram values, the range of the resulting values is compressed and
becomes more manageable. Therefore the log is applied to the spectrogram of the speech signal.
   The column transform (Kekre’s Transform or Kekre’s Wavelet Transform) is then applied to the
spectrogram of the speech signal. The resulting transformed matrix forms the features for the speech signal.
   The features for different speakers are calculated and used to create models for every speaker in
the database.
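The extraction step can be sketched as follows. This is a hedged illustration assuming NumPy; the function name, the small offset guarding against log(0), and the random stand-in spectrogram are ours:

```python
import numpy as np

def extract_features(spect, K):
    """Take the log of the spectrogram (compressing its wide amplitude
    range), then apply the column transform F = [K] f of eq. (3)/(7)."""
    log_spect = np.log(spect + 1e-10)   # small offset guards against log(0)
    return K @ log_spect                # transforms every column at once

# example: an 8x8 Kekre's transform applied to a random 8x100 "spectrogram"
K = np.triu(np.ones((8, 8)))
for x in range(1, 8):
    K[x, x - 1] = -8 + x
features = extract_features(np.random.rand(8, 100) + 1.0, K)
print(features.shape)   # (8, 100)
```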

 V.    RESULTS
A. Experimental Results
In identification, the goal is to determine which voice in a known group of voices best matches the
speaker. For this Gaussian Mixture Modeling technique is used. The identification system is a
straightforward maximum likelihood classifier. For a reference group of speaker models the
objective is to find the speaker identity whose model has the maximum posterior probability for
the input feature-vector sequence [3], [4].
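The decision rule can be illustrated with a deliberately simplified model. The paper uses Gaussian mixture models [3], [4]; here a single diagonal Gaussian per speaker (a one-component special case) stands in, and the speaker names and synthetic feature data are ours:

```python
import numpy as np

def fit(feats):
    """Fit a single diagonal Gaussian to a speaker's training features."""
    return feats.mean(axis=0), feats.var(axis=0) + 1e-6

def avg_log_likelihood(feats, model):
    """Average per-frame log-likelihood of the test features under a model."""
    mu, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (feats - mu) ** 2 / var)
    return ll.sum(axis=1).mean()

rng = np.random.default_rng(0)
train = {"speaker_a": rng.normal(0.0, 1.0, (500, 8)),
         "speaker_b": rng.normal(3.0, 1.0, (500, 8))}
models = {name: fit(f) for name, f in train.items()}

test = rng.normal(3.0, 1.0, (50, 8))       # test segment from speaker_b
scores = {name: avg_log_likelihood(test, m) for name, m in models.items()}
print(max(scores, key=scores.get))         # identifies speaker_b
```

The maximum-likelihood decision simply picks the model with the highest score, as described above.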

B. Accuracy of Identification
The accuracy of the identification system [3] is calculated as given by equation (8).

                                            No. of segments correctly identified
   Percentage identification accuracy  =  ---------------------------------------  × 100     (8)
                                              Total no. of segments under test
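Equation (8) as a one-line helper (the segment counts shown are hypothetical, for illustration only):

```python
def identification_accuracy(correct, total):
    """Equation (8): identification accuracy as a percentage."""
    return 100.0 * correct / total

# e.g. 227 of 240 hypothetical test segments correctly identified
print(identification_accuracy(227, 240))
```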

   Test segment length is taken 5 seconds. Accuracy is checked for 24 speakers. Table I shows the
results obtained by using the two transforms for an overlap of 50% and 25% between the adjacent
frames while creating the spectrograms of the speech signals. Kekre’s Wavelet Transform matrix
of size 8x8 is obtained from Kekre’s transform matrix of size 4x4. Kekre’s Wavelet Transform
matrix of size 32x32 is obtained from Kekre’s transform matrix of size 8x8.
   In Table 1, the accuracy is obtained for a feature vector size of 256 and an overlap of 50%.

                                  TABLE 1      OVERLAP OF 50%

   Size of transform      Kekre’s Transform     Kekre’s Wavelet Transform
         matrix             accuracy (%)              accuracy (%)
          8x8                  94.44                     89.40
         32x32                 94.96                     92.53

  For Kekre’s transform, the average accuracy is slightly higher than for Kekre’s Wavelet transform.
The accuracy is also greater for the transform matrix of size 32x32.
  In Table 2, the accuracy is obtained for a feature vector size of 256 and an overlap of 25%.



                                  TABLE 2      OVERLAP OF 25%

   Size of transform      Kekre’s Transform     Kekre’s Wavelet Transform
         matrix             accuracy (%)              accuracy (%)
          8x8                  91.49                     88.54
         32x32                 93.57                     91.49

   By observing Table 2, for 25% overlapping frames, the average accuracy is slightly less than
that for 50% overlapping frames for both the transforms.
 VI.   CONCLUSION
   In this paper we have compared the performance of Kekre’s transform and Kekre’s Wavelet
transform for speaker identification. Kekre’s transform gives an average accuracy of around 94%,
while Kekre’s Wavelet transform gives an average accuracy of around 91%. Both accuracies are obtained
for a feature vector size of 256 and 50% overlap of consecutive frames of speech signal. For 25%
overlapping frames, the average accuracy is slightly less than that for 50% overlapping frames for
both the transforms. A slight difference in accuracy is observed for the change in size of Kekre’s
Wavelet Transform matrix. Overall Kekre’s transform gives better results as compared to Kekre’s
Wavelet transform for 50% overlap of consecutive frames of speech signal.
  REFERENCES
[1] B. S. Atal, “Automatic recognition of speakers from their voices,” Proceedings of the
    IEEE, vol. 64, no. 4, pp. 460-475, Apr. 1976.
[2] Joseph P. Campbell, Jr., “Speaker Recognition: A Tutorial”, Proceedings of the IEEE,
    vol. 85, no. 9, pp. 1437-1462, September 1997.
[3] Douglas A. Reynolds, “Automatic Speaker Recognition Using Gaussian Mixture
    Speaker Models”, The Lincoln Laboratory Journal, vol. 8, no. 2, pp. 173-192, 1995.
[4] D. A. Reynolds and R. C. Rose, “Robust Text-Independent speaker identification using
    Gaussian mixture speaker models,” IEEE Trans. Speech and Audio Processing, vol. 3,
    no. 1, pp. 72-83, Jan. 1995.
[5] H. B. Kekre, Archana Athawale, and Deepali Sadavarti, “Algorithm to Generate Kekre’s
    Wavelet Transform from Kekre’s Transform”, International Journal of Engineering
    Science and Technology, vol. 2(5), pp. 756-767, 2010.
[6] H. B. Kekre and Vaishali Kulkarni, “Speaker Identification using Row Mean of Haar
    and Kekre’s Transform on Spectrograms of Different Frame Sizes”, (IJACSA)
    International Journal of Advanced Computer Science and Applications, Special Issue
    on Artificial Intelligence.



