SPEAKER RECOGNITION SYSTEM USING ARTIFICIAL NEURAL NETWORKS



                             Presented by


K.V. Prasad Yadav and J. Hari Kiran
III B.Tech I Sem,
I.T. Dept,
Narayana Engineering College,
Nellore,
Andhra Pradesh.

Mail id: prasad_yadav@yahoo.com,
    hari_chiru2007@yahoo.com

Ph: +91 9247850905(PRASAD)
+91 9848812316(HARI KIRAN)
CONTENTS:


  1. ABSTRACT
  2. INTRODUCTION
  3. SPEAKER RECOGNITION
  4. SYSTEM CONCEPT
  5. FEATURE EXTRACTIONS
  6. PATTERN MATCHING
  7. SPEAKER RECOGNITION APPLICATION IN E-BANKING
  8. CONCLUSIONS
  9. REFERENCES

ABSTRACT:




     Many people today access their company’s information systems by logging in from
  home, and Internet services and telephone banking are widely used in both the corporate
  and private sectors. Protecting one’s resources or information with a simple password is
  therefore no longer reliable or secure. Biometrics are methods for recognizing a person
  by his or her unique physiological and behavioral characteristics. This growing
  technological field has deep implications because proving identity is becoming an
  integral part of our daily lives. This paper presents the voice signal as a unique
  behavioral characteristic for speaker verification over telephone lines, using an
  artificial neural network (ANN) for a banking application.
     Artificial neural networks (ANNs) are intelligent systems loosely based on a
  simplified biological model of the human brain. They are composed of many simple
  elements, called neurons, operating in parallel and connected to each other. Here a
  multi-layer feed-forward ANN capable of verifying a speaker within a group of speakers
  is designed. The spectral density of the recorded voice signal is used for
  characterization. Finally, the feasibility of the speaker recognition system is tested,
  and the system is found to be efficient at recognizing speakers.
INTRODUCTION
        There is a vital need for speaker identification in all spheres of life, above all
 because such a system gives people secure access to information and property. It offers a
 significant advantage in electronic banking and Internet access. Large sums of money are
 lost each year to white-collar crime, fraud and embezzlement. In today’s complex economic
 times, businesses and individuals alike fall victim to these devastating crimes. Employees
 embezzle funds or steal goods from employers, then disappear or hide behind legal issues.
 Individuals can easily become helpless victims of identity theft, stock schemes and other
 scams that rob them of their money.
      One way to prevent such white-collar crime, and to shorten the lengthy process of
 locating perpetrators and serving them with a judgment, is to verify individuals using
 biometric techniques. Artificial neural networks (ANNs) are intelligent systems loosely
 based on a simplified biological model of the human brain. Voice signals are attenuated and
 distorted over telephone lines, yet even in such a nonlinear, noisy and non-stationary
 environment an ANN remains good at recognizing and verifying the unique characteristics of
 a signal such as speech. Speaker recognition involves identifying or verifying a speaker
 on the basis of his or her voice.


  SPEAKER RECOGNITION:
        Speaker recognition is the generic term for two related problems:
       1. Speaker identification: determining the identity of a speaker from a known
           group of N possible speakers.
       2. Speaker verification: essentially the same problem as speaker identification,
           except that a claimed identity is also given and the task is “merely” to confirm
           or reject that claim.
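
The difference between the two problems can be sketched in a few lines of Python. This is only an illustration: the score values, the function names and the 0.5 acceptance threshold are assumptions, not figures from this paper. Identification picks the best-scoring speaker out of the N enrolled speakers, while verification looks only at the score of the claimed speaker.

```python
def identify(scores):
    """Speaker identification: index of the best-scoring enrolled speaker."""
    return max(range(len(scores)), key=lambda i: scores[i])


def verify(scores, claimed, threshold=0.5):
    """Speaker verification: accept the claim only if the claimed speaker's
    score reaches the threshold (the threshold value is an assumption)."""
    return scores[claimed] >= threshold


# Hypothetical per-speaker scores for a group of N = 4 enrolled speakers.
scores = [0.05, 0.10, 0.80, 0.05]

best = identify(scores)                  # who is speaking?
claim_ok = verify(scores, claimed=2)     # "I am speaker 2"
claim_bad = verify(scores, claimed=0)    # "I am speaker 0"
print(best, claim_ok, claim_bad)
```

In a deployed verification system the threshold would be tuned to trade false acceptances against false rejections.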
          Speaker recognition using an ANN is divided into two parts:
                 i) Feature extraction      ii) Pattern matching.
             Text-dependent audio signals are recorded over telephone lines for different speakers.
In feature extraction, the Signal Processing Toolbox of MATLAB is used to convert the recorded sound
files into a form that can be presented as an input vector to a neural network. In pattern matching, the
output of the neural network identifies and verifies the unique characteristics of the speech-signal
features. This paper describes the feature extraction, the neural network architecture, and the software
and hardware involved in developing the speaker identification and verification system. The first few
sections are devoted to the architecture of the speaker recognition system; its application in
e-banking is discussed later.


   SYSTEM CONCEPT:
       The speaker recognition system over telephone lines investigated in this paper uses the
   artificial neural network shown in figure 1.




      Figure 1: Block diagram of the speaker recognition system using an ANN
    The speaker recognition system reported in this paper is of the text-dependent type. The
system is trained on the group of people to be identified, with each person speaking the same
phrase. The voices are recorded from a telephone handset receiver using a standard 16-bit
computer sound card. Although the human voice can contain frequencies up to about 20 kHz,
most of the signal content lies in the 0.3 kHz to 4 kHz range. The frequency band carried by
telephone lines is limited to 0.3 kHz to 3.4 kHz, and this is the band of interest in this
paper. A sampling rate of 16 kHz, which satisfies the Nyquist criterion for this band, is
therefore used. The voices are stored as sound files on the computer. Digital processing
techniques are used to convert the sound files into a form that can be presented as input
vectors to the neural network. The output of the neural network verifies the speaker within
the group.
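
As a quick check of the sampling rate, the Nyquist criterion requires the sampling rate to be at least twice the highest frequency in the signal. Taking the 3.4 kHz upper edge of the telephone band quoted above as the highest frequency (the symbols f_s and f_max are introduced here for illustration only):

```latex
f_s \ge 2 f_{\max} = 2 \times 3.4\ \text{kHz} = 6.8\ \text{kHz},
\qquad
f_s = 16\ \text{kHz} > 6.8\ \text{kHz}.
```

The 16 kHz rate therefore leaves a wide margin; it would be sufficient for any signal band-limited to 8 kHz.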
FEATURE EXTRACTIONS
         Speaker recognition over a telephone network presents many challenges, such as:
1. Variations in handset microphones, which result in severe mismatches between data
    gathered from these microphones.
2. Signal distortion due to the telephone channel.
3. Inadequate control over the speaker and the speaking conditions.
       The raw audio signal cannot be fed directly into the neural network because several
    speakers may produce similar signals. Feature extraction consists of obtaining
    characteristic parameters of a signal that can be used to classify it.
               For speaker recognition, the features extracted from a speech signal should be
    consistent for the desired speaker while deviating strongly from those of other
    speakers. Here the Signal Processing Toolbox of MATLAB is used to convert the recorded
    sound files into a form that can be presented as an input vector to a neural network.
                A feature such as the power spectral density (PSD) gives a different
    representation for each speaker uttering the same text. The PSDs of two different
    speakers uttering the same word are shown in figure 2 for speaker X and figure 3 for
    speaker Y.




               Figure 2: PSD of Speaker X                Figure 3: PSD of Speaker Y
    Figures 2 and 3 show that the power spectral densities (PSDs) of speaker X and
speaker Y differ from each other.
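
The paper computes PSDs with the MATLAB Signal Processing Toolbox and does not say which estimator was used. As a rough, self-contained stand-in, the Python sketch below estimates a one-sided PSD with a Hann-windowed periodogram and keeps the 0.3 kHz to 3.4 kHz telephone band as the feature vector. The frame length of 256 samples, the two synthetic tones standing in for speakers X and Y, and the helper names `periodogram` and `band_features` are all assumptions made for illustration.

```python
import cmath
import math


def periodogram(samples, fs):
    """Return (freqs, psd): a one-sided periodogram estimate of the PSD."""
    n = len(samples)
    # Hann window reduces spectral leakage from the finite record.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * w for s, w in zip(samples, window)]
    scale = fs * sum(w * w for w in window)
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        # Direct DFT for clarity; an FFT routine would be used in practice.
        acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        p = abs(acc) ** 2 / scale
        if 0 < k < n / 2:
            p *= 2  # fold negative frequencies into the one-sided estimate
        freqs.append(k * fs / n)
        psd.append(p)
    return freqs, psd


def band_features(freqs, psd, lo=300.0, hi=3400.0):
    """Keep only the PSD values inside the telephone band as the feature vector."""
    return [p for f, p in zip(freqs, psd) if lo <= f <= hi]


fs = 16000  # sampling rate quoted in the paper
n = 256     # frame length (assumption)
# Synthetic stand-ins for two speakers: pure tones at different frequencies.
speaker_x = [math.sin(2 * math.pi * 500 * i / fs) for i in range(n)]
speaker_y = [math.sin(2 * math.pi * 1500 * i / fs) for i in range(n)]

freqs, psd_x = periodogram(speaker_x, fs)
_, psd_y = periodogram(speaker_y, fs)
peak_x = freqs[psd_x.index(max(psd_x))]
peak_y = freqs[psd_y.index(max(psd_y))]
features_x = band_features(freqs, psd_x)
print(peak_x, peak_y, len(features_x))
```

Real speech is far richer than a single tone, but the same steps apply: the in-band PSD values of each recording form the vector presented to the network.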


PATTERN MATCHING
               Artificial neural networks (ANNs) are intelligent systems loosely based on a
simplified biological model of the human brain. They are composed of many simple elements, called
neurons, operating in parallel and connected to each other by multipliers called connection
weights or strengths. Neural networks are trained by adjusting the values of these connection
weights between the neurons.
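
The paper does not write out the neuron model. In the usual textbook formulation, given here as an assumption, a neuron with inputs x_i, connection weights w_i, a bias b and an activation function f produces the output

```latex
y = f\!\left(\sum_{i=1}^{n} w_i x_i + b\right),
\qquad
f(a) = \frac{1}{1 + e^{-a}} \quad \text{(sigmoid activation)}
```

Training changes only the weights w_i and the bias b; the form of f stays fixed.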
          Neural networks are capable of self-learning, are fault tolerant and noise immune, and have
applications in system identification, pattern recognition, classification, speech recognition, image
processing, etc. In this speaker recognition application an ANN is used for pattern matching, and the
performance of a feed-forward artificial neural network is investigated.
  A three-layer feed-forward neural network with a sigmoidal hidden layer followed by a linear output
layer is used for pattern matching. It is trained with the error back-propagation algorithm. An
adaptive learning rate is used, i.e. the learning rate is adjusted during training to speed up
convergence.




         Figure 4: The multi-layer feed-forward (MLP) neural network.
   The MLP network in figure 4 is constructed in the MATLAB 6.1 environment. The input to the MLP
 network is a vector containing the PSDs. Ten hidden nodes are used. The number of output nodes
 depends on the number of speakers. An initial learning rate, an allowable error and a maximum number
 of training cycles (epochs) are the parameters specified to the MATLAB neural network during the
 training phase.
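
The network itself was built with MATLAB and its code is not given in the paper. The Python sketch below is an independent, minimal illustration of the same ingredients: a sigmoid hidden layer of 10 nodes, a linear output layer with one node per speaker, batch error back-propagation, and the three training parameters named above (initial learning rate, allowable error, maximum epochs). The toy input vectors, the weight initialization range and the learning-rate rule (multiply by 1.05 after an improving step, by 0.7 after a rejected one) are assumptions, not the authors' settings.

```python
import math
import random


def sigmoid(a):
    # tanh form of the logistic function; avoids overflow for large |a|.
    return 0.5 * (1.0 + math.tanh(0.5 * a))


def forward(x, w1, w2):
    """Sigmoid hidden layer followed by a linear output layer."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w1]
    y = [sum(w * v for w, v in zip(row, h)) + row[-1] for row in w2]
    return h, y


def error_and_grads(data, w1, w2):
    """Mean half squared error over the batch and its back-propagated gradients."""
    n_in, hidden, n_out = len(w1[0]) - 1, len(w1), len(w2)
    g1 = [[0.0] * (n_in + 1) for _ in range(hidden)]
    g2 = [[0.0] * (hidden + 1) for _ in range(n_out)]
    total = 0.0
    for x, t in data:
        h, y = forward(x, w1, w2)
        err = [yk - tk for yk, tk in zip(y, t)]
        total += 0.5 * sum(e * e for e in err)
        for k in range(n_out):
            for j in range(hidden):
                g2[k][j] += err[k] * h[j] / len(data)
            g2[k][hidden] += err[k] / len(data)
        for j in range(hidden):
            # Sigmoid derivative is h * (1 - h).
            delta = sum(err[k] * w2[k][j] for k in range(n_out)) * h[j] * (1 - h[j])
            for i in range(n_in):
                g1[j][i] += delta * x[i] / len(data)
            g1[j][n_in] += delta / len(data)
    return total / len(data), g1, g2


def step(w, g, lr):
    """One gradient-descent update of a weight matrix."""
    return [[wv - lr * gv for wv, gv in zip(wr, gr)] for wr, gr in zip(w, g)]


def train(data, hidden=10, lr=0.1, goal=1e-3, max_epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    # The last entry of each weight row is the bias.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)] for _ in range(n_out)]
    error, g1, g2 = error_and_grads(data, w1, w2)
    for _ in range(max_epochs):
        if error <= goal:
            break
        t1, t2 = step(w1, g1, lr), step(w2, g2, lr)
        new_error, n1, n2 = error_and_grads(data, t1, t2)
        if new_error < error:
            # Accept the step and raise the learning rate.
            w1, w2, error, g1, g2 = t1, t2, new_error, n1, n2
            lr *= 1.05
        else:
            # Reject the step and lower the learning rate.
            lr *= 0.7
    return w1, w2, error


def classify(x, w1, w2):
    """Index of the output node with the largest activation."""
    y = forward(x, w1, w2)[1]
    return y.index(max(y))


# Toy stand-ins for PSD feature vectors of three speakers (not real data).
data = [
    ([1.0, 0.1, 0.1], [1, 0, 0]), ([0.9, 0.2, 0.0], [1, 0, 0]),
    ([0.1, 1.0, 0.1], [0, 1, 0]), ([0.0, 0.9, 0.2], [0, 1, 0]),
    ([0.1, 0.1, 1.0], [0, 0, 1]), ([0.2, 0.0, 0.9], [0, 0, 1]),
]
w1, w2, final_error = train(data)
predicted = [classify(x, w1, w2) for x, _ in data]
print(predicted, final_error)
```

Because a step is kept only when it lowers the error, the training error never increases from one epoch to the next; the price is one extra error evaluation for each rejected step.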


SPEAKER RECOGNITION APPLICATION IN E-BANKING
  The most straightforward use of speaker recognition is in cases where one has to gain access
  to a secure bank account. Voice is fully compatible with the existing transmission protocols
  over telephone channels, so no special adaptation is necessary beyond installing the
  recognition system. For the time being such a service is restricted to operations within the
  accounts maintained by a single individual: one can check the status of an account, transfer
  money between one’s own savings accounts, etc.
        Voice samples of the different users are recorded uttering the same phrase under
 controlled and uncontrolled conditions. A user who wants to access an account utters the same
 phrase over the telephone line. If the speaker recognition system identifies the user as the
 account holder, it allows access to the account. If the user is not an account holder, i.e.
 the voice does not match that of any person in the group of users, the system rejects the
 identity claim and denies access to the account.
CONCLUSIONS
               The use of an artificial neural network in a speaker recognition system has proved
fairly successful. The success rate of the system could be increased by using features such as
pitch, autocorrelation and cepstrum. Speaker recognition has a variety of applications in fields
such as e-banking.
                  If the 21st century is to be the age of intelligent machines, artificial
neural networks will become an integral part of life. For software engineers to lead us to this
"promised life", they must begin by using the emerging technology of neural networks. To do so they
must make the best use of their time by using already implemented hardware and commercial software
packages while anticipating what is still to come.
REFERENCES




1. M. Ananda Rao and J. Srinivas, "Neural Networks: Algorithms and Applications", Narosa
Publications.

2. S. Rajasekaran and G. A. Vijayalakshmi Pai, "Neural Networks, Fuzzy Logic and Genetic
Algorithms: Synthesis and Applications".

3. G. K. Venayagamoorthy and N. Sundepersadh, "Comparison of text-dependent speaker
identification methods for short distance telephone lines using artificial neural network",
Proceedings of IEEE neural network letter 2000, pp. 253-258.

4. Lawrence R. Rabiner and Ronald W. Schafer, "Digital Processing of Speech Signals",
Prentice-Hall Inc.

5. O. Farooq and S. Datta, "Speech Recognition with Emphasis on Wavelet based Feature
Extraction", IETE Journal of Research, Vol. 1, January-February 2002, pp. 3-13.

6. Chen-Han Sung and William C. Jones III, "A Speech Recognition System Featuring Neural
Network Processing of Global Lexical Features", IEEE Conference Proceeding, Vol. 11,
pp. 437-439.

				