					       Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 1, Issue 3, September – October 2012                                    ISSN 2278-6856




    Generic Model for Text Dependent Automatic Gujarati Speaker Recognition

    Himanshu N. Patel¹, Paresh V. Virparia²

    ¹ Anand Institute of Information Science, Anand, Gujarat, India – 388001
    ² Department of Computer Science, Sardar Patel University, Vallabh Vidyanagar, Gujarat


Abstract: In this paper, a model of automatic Gujarati speaker recognition systems is presented. Automatic speaker recognition is the process of recognizing a person from a spoken phrase by computer. These systems generally operate in two modes: to identify a particular person or to verify a person’s claimed identity. The basic components of automatic Gujarati speaker recognition systems and their design tradeoffs are discussed.

Keywords: Access control, authentication, computer security, identification of persons, speaker recognition, speech processing, verification.

1. INTRODUCTION
Speech processing is a diverse field with many applications. Figure-1 shows the areas of speech processing and how speaker recognition relates to the other fields [2]. This paper emphasizes the speaker recognition applications shown in the boxes of Figure-1.

Figure-1: Speech Processing

Speaker recognition encompasses verification and identification. Speaker verification is the process of verifying a person’s identity from his voice by computer. In the literature, different terms are used for speaker verification, including speaker authentication, voice authentication, and voice verification. In speaker identification, the system decides who the person is, or that the person is unknown. Atal, Doddington, Furui, O’Shaughnessy, Rosenberg, Soong, Sutherland, and Jack [1, 3, 4, 8, 10, 11, 13] have presented general overviews of speaker recognition.

The difference between speaker verification and speaker identification is given in Table-1. In verification, the user speaks a phrase into a microphone. The system analyzes this speech signal and either accepts or rejects the user’s identity claim, or reports insufficient confidence and requests additional input before making the decision.

Table 1: Difference between Speaker Verification and Speaker Identification

  Aspect          Speaker Verification                      Speaker Identification
  Definition      Deciding whether a speaker is who he      Deciding whether a speaker is a specific
                  claims to be                              person or is among a group of persons
  Identity claim  A person makes an identity claim (e.g.,   A person does not make an identity claim
                  entering an employee number or
                  presenting his smart card)

The model for the Gujarati automatic speaker verification system is shown in Figure-2. A user who has previously registered in the system presents a smart card containing his identification information and then speaks a phrase into the microphone. Before a verification session, users must register in the system. During registration, voice models are generated and stored on the smart card for use in later verification sessions.

Table 2 lists some of the human and environmental factors that contribute to verification and identification errors. These factors are outside the scope of algorithms and are better corrected by means other than algorithms (e.g., better microphones). They are important because, no matter how good a speaker recognition algorithm is, human error ultimately limits its performance.
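The distinction in Table 1 can be made concrete with a minimal sketch. It assumes, purely for illustration, that each enrolled speaker is represented by one reference feature vector, that a smaller Euclidean distance means a better match, and that a single acceptance threshold is used; the function names, the threshold, and the toy vectors are hypothetical and are not part of the proposed model. Verification compares the input only with the model of the claimed identity (a 1:1 test), while identification searches all enrolled models (a 1:N test) and may report that the speaker is unknown.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(features, models, claimed_id, threshold):
    """Speaker verification (1:1): the user claims an identity (e.g. via a
    smart card), and the input is compared only with that speaker's model."""
    return distance(features, models[claimed_id]) <= threshold


def identify(features, models, threshold):
    """Speaker identification (1:N): no identity claim is made, so the input is
    compared with every enrolled model. Returns the closest speaker ID, or None
    ("unknown") if even the best match is farther than the threshold."""
    best_id = min(models, key=lambda sid: distance(features, models[sid]))
    if distance(features, models[best_id]) <= threshold:
        return best_id
    return None


# Hypothetical enrolled models (toy 2-dimensional feature vectors).
models = {"speaker_a": [1.0, 2.0], "speaker_b": [4.0, 0.5]}
sample = [1.1, 2.1]

print(verify(sample, models, "speaker_a", threshold=0.5))  # True
print(verify(sample, models, "speaker_b", threshold=0.5))  # False
print(identify(sample, models, threshold=0.5))             # speaker_a
print(identify([9.0, 9.0], models, threshold=0.5))         # None
```

Because identification must examine every enrolled model, its cost grows with the number of enrolled speakers, whereas verification always compares against a single model.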



Table 2: Sources of verification error

  Misspoken or misread prompted phrases
  Extreme emotional states (e.g., stress or duress)
  Time-varying (intra- or intersession) microphone placement
  Poor or inconsistent room acoustics (e.g., multipath and noise)
  Channel mismatch (e.g., using different microphones for enrollment and verification)
  Sickness (e.g., head colds can alter the vocal tract)
  Aging (the vocal tract can drift away from models with age)

Figure-2: Automatic Gujarati Speaker Verification System (diagram blocks: Gujarati Speech, Smart Card, Noise, Authentication System, Access to resource)

2. MOTIVATION
Speaker verification and speaker identification are the most natural and economical methods for authorizing the use of computer and communication systems and for multilevel access control. The cost of a speaker recognition system may be limited to the software for the recognition algorithm, because the required input device (a microphone or telephone) is often already available.

Biometric systems automatically recognize a person using distinguishing characteristics. Speaker recognition is a performance biometric: you perform a task (speaking) to be recognized. Speaker-recognition systems can be made somewhat robust against noise and channel variations [7, 9], ordinary human changes, and mimicry by humans and tape recorders [6].

3. PROBLEM FORMULATION
Speech is a complicated signal produced as a result of several transformations occurring at different levels: semantic, linguistic, articulatory, and acoustic. Differences in these transformations appear as differences in the acoustic properties of the speech signal. In speaker recognition, all these differences can be used to discriminate between speakers.

4. MODEL FOR GUJARATI SPEAKER VERIFICATION
The Gujarati speaker verification model consists of five steps:
1. Digital speech data acquisition,
2. Feature extraction,
3. Pattern matching,
4. Making a decision, and
5. Enrollment to generate speaker reference models.

A block diagram of this procedure, as suggested in [2], is shown in Figure-3.

Figure-3: Model for Gujarati speaker verification system.

Feature extraction maps each interval of speech to a multidimensional feature space. The resulting feature vector x_i is then compared to the speaker models by pattern matching, which yields a match score z_i. The match score measures the similarity of the computed input feature vectors to the models of the speaker. Finally, a decision is made to either accept or reject the claimant according to the match score.

4.1. Speech Signal Acquisition
Initially, the acoustic sound pressure wave is transformed into a digital signal suitable for voice processing. A microphone converts the acoustic wave into an analog signal. This analog signal is conditioned with anti-aliasing filtering, a low-pass filter that removes frequency components above half the sampling rate, which would otherwise alias. The conditioned analog signal is then sampled by an analog-to-digital (A/D) converter to form a digital signal.

4.2. Feature selection and measures
The speech signal can be represented by a sequence of feature vectors. The selection of appropriate features and the methods used to estimate them are known as feature selection and feature extraction, respectively.

In speaker verification, the goal is to design a system that minimizes the probability of verification errors. Thus, the underlying objective is to distinguish between the given speaker and all others. For an overview of feature selection and extraction methods, please refer to [2].

4.3. Pattern matching
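The two model types named in this section can be contrasted with a short sketch. It is a hypothetical illustration, not the method of this paper: it assumes that a template model is a stored reference sequence of feature vectors compared by dynamic time warping (DTW), a deterministic alignment commonly used for text-dependent matching, and that a stochastic model is a single diagonal-covariance Gaussian scored by average log-likelihood. Both scores play the role of the match score z_i; the thresholds (1.0 and -5.0), the function names, and the toy vectors are assumptions chosen only for the example. Practical systems often use richer models, such as vector quantization [12] or Gaussian mixture models [9].

```python
import math


def euclidean(a, b):
    """Distance between two feature vectors (frames) of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(observed, template):
    """Template model (deterministic): align the observed frames x_i with the
    reference template using dynamic time warping and return the minimum
    cumulative frame distance. Smaller means more similar."""
    n, m = len(observed), len(template)
    inf = float("inf")
    # cost[i][j]: best cumulative distance aligning observed[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(observed[i - 1], template[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # template frame j reused for observed frame i
                cost[i][j - 1],      # observed frame i reused for template frame j
                cost[i - 1][j - 1],  # both sequences advance one frame
            )
    return cost[n][m]


def gaussian_log_likelihood(observed, means, variances):
    """Stochastic model (probabilistic): average per-frame log-likelihood of
    the observed frames under a single diagonal-covariance Gaussian.
    Larger means the observation is more likely given the model."""
    total = 0.0
    for frame in observed:
        for x, mu, var in zip(frame, means, variances):
            total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total / len(observed)


def decide(match_score, threshold, higher_is_better):
    """Accept (True) if the match score is on the 'similar' side of the threshold."""
    if higher_is_better:
        return match_score >= threshold
    return match_score <= threshold


# Hypothetical toy data: 2-dimensional feature vectors for one enrolled phrase.
template = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0]]
means, variances = [2.0, 0.7], [0.7, 0.25]
claimant = [[1.0, 0.0], [1.1, 0.1], [2.0, 1.0], [3.0, 1.1]]  # same phrase, spoken slower
impostor = [[5.0, 5.0], [6.0, 4.0], [7.0, 5.0]]

for name, frames in [("claimant", claimant), ("impostor", impostor)]:
    z_template = dtw_distance(frames, template)
    z_stochastic = gaussian_log_likelihood(frames, means, variances)
    print(name,
          decide(z_template, threshold=1.0, higher_is_better=False),
          decide(z_stochastic, threshold=-5.0, higher_is_better=True))
# Expected output: "claimant True True" then "impostor False False"
```

The two scores run in opposite directions: a template distance is accepted when it is small and a log-likelihood when it is large, so the decision step has to know the orientation of the score it receives.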




Pattern matching involves computing a match score that represents the similarity between the input feature vectors and some model. Speaker models are built from the features extracted from the speech signal. To register users, a model of the extracted voice features is generated and stored on an encrypted smart card. To authenticate a user, the matching algorithm compares and scores the incoming speech signal against the model of the claimed user.

There are two types of models:

(a) Stochastic models: The pattern matching is probabilistic and results in a measure of the likelihood of the observation given the model.

(b) Template models: The pattern matching is deterministic; the observation is treated as an imperfect replica of the template, and the score is a distance.

4.4. Classification and Decision
A verification decision is made to accept the speaker, reject the speaker, or request another utterance. If a verification system accepts an impostor, a false acceptance error occurs. If the system rejects a valid user, a false rejection error occurs.

The result of the decision process can be: accept user, continue session, session time-out, or reject user. The decision-making procedure is a sequential hypothesis-testing problem [14]. For a brief overview of the decision theory involved, please refer to [2].

5. IMPLEMENTATION
The proposed model will be implemented using the Modular Audio Recognition Framework (MARF) [15]. MARF is a collection of algorithms for sound, speech, and natural language processing, implemented in Java and arranged into a uniform framework that facilitates the addition of new algorithms for preprocessing, feature extraction, classification, parsing, etc. MARF is also a research platform for measuring various performance metrics of the implemented algorithms.

6. CONCLUSIONS
Speaker recognition is the use of a computer to recognize a person from a spoken phrase. It can be used to identify a particular person or to verify a person’s claimed identity. In this paper, speech processing and a general model for Gujarati speaker recognition were discussed, together with the sources of verification errors.

REFERENCES
[1] B. S. Atal, “Automatic Recognition of Speakers from Their Voices,” Proceedings of the IEEE, Vol. 64, pp. 460-475, 1976.
[2] J. P. Campbell, “Speaker Recognition: A Tutorial,” Proceedings of the IEEE, Vol. 85, No. 9, pp. 1437-1462, 1997.
[3] G. R. Doddington, “Speaker Recognition—Identifying People by their Voices,” Proceedings of the IEEE, Vol. 73, No. 11, pp. 1651-1664, 1985.
[4] S. Furui, “Speaker-Dependent-Feature Extraction, Recognition and Processing Techniques,” Speech Communication, Vol. 10, pp. 505-520, 1991.
[5] R. Gnanadesikan and J. R. Kettenring, “Discriminant Analysis and Clustering,” Statistical Science, Vol. 4, No. 1, pp. 34-69, 1989.
[6] A. Higgins, L. Bahler, and J. Porter, “Speaker Verification Using Randomized Phrase Prompting,” Digital Signal Processing, Vol. 1, No. 2, pp. 89-106, 1991.
[7] R. Mammone, X. Zhang, and R. Ramachandran, “Robust Speaker Recognition: A Feature-based Approach,” IEEE Signal Processing Magazine, Vol. 13, No. 5, pp. 58-71, 1996.
[8] D. O’Shaughnessy, Speech Communication, Human and Machine. Digital Signal Processing. Reading: Addison-Wesley, 1987.
[9] D. Reynolds and R. Rose, “Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models,” IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 1, pp. 72-83, 1995.
[10] A. Rosenberg, “Automatic Speaker Verification: A Review,” Proceedings of the IEEE, Vol. 64, No. 4, pp. 475-487, 1976.
[11] E. Rosenberg and F. K. Soong, “Recent Research in Automatic Speaker Recognition,” in Advances in Speech Signal Processing, ed. S. Furui and M. M. Sondhi, pp. 701-738. New York: Marcel Dekker, 1992.
[12] F. K. Soong, A. E. Rosenberg, L. R. Rabiner, and B. H. Juang, “A Vector Quantization Approach to Speaker Recognition,” AT&T Technical Journal, Vol. 66, No. 2, pp. 14-26, 1987.
[13] A. Sutherland and M. Jack, “Speaker Verification,” in Aspects of Speech Technology, editors M. Jack and J. Laver, Edinburgh: Edinburgh University Press, pp. 185-215, 1988.
[14] A. Wald, Sequential Analysis. New York: Wiley, 1947.
[15] http://marf.sourceforge.net/




Authors:
Dr. Himanshu N. Patel received the B.E. (IC) from Bhavnagar University, the M.C.A. from IGNOU, and a Ph.D. in Computer Science from Sardar Patel University in 1998, 2004, and 2012, respectively. He qualified the SET and NET examinations conducted by UGC in 2006 and 2008, respectively. He has 8 years of teaching experience and is currently working as Assistant Professor at Anand Institute of Information Science, Anand. His publications include 2 papers in international journals, 2 papers in national journals, and 10 papers in national-level conferences. His research interests include speech recognition, open source technology, and object-oriented technology.
Dr. Paresh V. Virparia is working as Professor in the Department of Computer Science of Sardar Patel University, Vallabh Vidyanagar. He completed his MCA in 1989 and his Ph.D. in 2002 from Sardar Patel University. He is a recognized Ph.D. guide in Computer Science at various universities. Two students have completed their Ph.D. under his guidance, and seven students are currently pursuing a Ph.D. under his guidance. Two students have completed their M.Phil. (Comp. Sc.) under his supervision. His publications include 12 papers in international journals, 8 papers in national journals, and 30 papers in national conferences/seminars. His research interests include Computer Simulation & Modeling, Networking, and IT-enabled services.





				
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS), Volume 1, Issue 3, September – October 2012, ISSN 2278-6856