
									4/5/10                             Vowel Recognition

                              Vowel Recognition
Ashwin Philar
Asmita Akerkar


Voice detection is a fascinating field spanning several areas of computer science and
mathematics. Reliable speech recognition is a hard problem that requires a combination of
many techniques. Our project performs vowel recognition through formant analysis: it
detects which of the five vowels (A, E, I, O, U) the user has spoken. The project is
implemented in Matlab and identified the vowel correctly in 9 out of 10 test cases.

Background Information:

Speech consists of acoustic pressure waves created by the voluntary movements of
anatomical structures in the human speech production system. These waveforms are
broadly classified into voiced and unvoiced speech. Voiced sounds (vowels, for example)
are produced by quasi-periodic pulses of air, which are acoustically filtered as they
propagate through the vocal tract.
The main distinction between vowels and consonants is that vowels are produced with a
relatively open, unobstructed vocal tract, which resonates at characteristic frequencies.
Formants are these resonant frequencies of the vocal tract while a vowel is being
pronounced. For a given phonetic vowel, most adult males have formants in a fairly
similar frequency range.
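The source-filter description above can be made concrete with a short synthesis sketch: a pulse train stands in for the quasi-periodic glottal pulses, and each formant is modelled as a two-pole resonator applied with Matlab's filter. The 120 Hz pitch and the 80 to 120 Hz bandwidths are illustrative assumptions, not measurements from our recordings; F1 to F3 are borrowed from the "A" template in vowelguess.

```matlab
% Source-filter sketch: periodic pulses shaped by a cascade of formant resonators.
fs = 8000;                       % sampling frequency (Hz)
f0 = 120;                        % assumed pitch of the pulse train (Hz)
formants = [822 1894 2724];      % F1-F3 copied from the "A" template in vowelguess
bw = [80 100 120];               % assumed formant bandwidths (Hz)

n = round(0.5*fs);               % half a second of signal
excitation = zeros(n,1);
excitation(1:round(fs/f0):n) = 1;   % one pulse per pitch period

vowel = excitation;
for k = 1:numel(formants)
    r = exp(-pi*bw(k)/fs);                  % pole radius set by the bandwidth
    theta = 2*pi*formants(k)/fs;            % pole angle set by the formant frequency
    vowel = filter(1, [1, -2*r*cos(theta), r^2], vowel);   % add one vocal-tract resonance
end
vowel = vowel / max(abs(vowel)); % scale to the range [-1, 1]
```

Saving vowel with audiowrite gives a buzzy "ah"-like test signal that can be fed to the detector.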

Project Research:

This activity involved weighing the various approaches that could be followed to
achieve vowel recognition. The procedures we use are blocking, normalization, and
estimation of a power spectral density model for the block with maximum power. In our
project, we extract the portion of the signal corresponding to the vowel and calculate its
first three formants. Different vowel sounds are distinguished by unique sets of
resonances, or formant frequencies. The program determines which phonetic vowel was
pronounced by comparing against a set of formant profiles: the standard formant values
can be placed in a 3-D plot whose axes are the first, second, and third formant
frequencies. We take the calculated formant values and match them to the vowel whose
profile is closest. The steps are:



   1. Read a sound file and store it in a vector.
   2. Divide the signal in the time domain into 10 equal blocks.
   3. Select the block with the maximum power content.
   4. Normalize the selected block.
   5. Use the Yule-Walker method to determine the power spectral density (PSD) of
      the signal.
   6. Determine the frequencies at which the peaks in the PSD occur.
   7. Extract the first three formants.
   8. Calculate the weighted Euclidean distance between the formant frequencies
      obtained from the user and each set of formant frequencies corresponding to
      the five vowels.
   9. Use the minimum-distance criterion to decide which vowel was spoken.
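The last two steps amount to the following decision rule. The symbols are introduced here for illustration: F_k is the k-th measured formant, T_{v,k} is the k-th template formant of vowel v, and w = (2, 1, 1) are the weights used in vowelguess.

$$
d_v = \sqrt{\sum_{k=1}^{3} \bigl(w_k\,(T_{v,k} - F_k)\bigr)^2},
\qquad
\hat{v} = \arg\min_{v \in \{A,E,I,O,U\}} d_v
$$

Because each weight multiplies the difference before it is squared, an error in F_1 contributes four times as much to d_v^2 as an equal error in F_2 or F_3.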

Matlab Code:
Main function: voweldetector

%                                         Vowel Recognition

function []= voweldetector()

%Functions Performed:
%Detects the vowel sound stored in .Wav files.


s = input('Enter Filename: ','s');

[x, fs] = audioread(s);                  %Read the samples and their sampling frequency (replaces wavread, which newer Matlab releases no longer provide)
x = x(:,1);                              %Keep a single channel as a column vector

n = length(x);

[slotselected]= block(n,x,fs);           %Select the block containing the strongest signal

[nsig]= normalize(slotselected);         %Normalize the selected block

[Pxx,freq]= PSD(nsig,fs);                %Return the power spectral density estimate

lenpvec = length(Pxx);

[fmnts]=peak(abs(Pxx),lenpvec,freq,nsig,fs);   %Detect the formants

[vowel]= vowelguess(fmnts);
disp(['Detected vowel: ' vowel]);

Function: block

function [slotselected]= block(n,x,fs)
%Functions Performed:
%Splits the signal into 10 equal blocks
%Selects the block with maximum power content

m = floor(n/10); %Number of samples per block, rounded down to an integer

pmax = 0;

%Split the signal into blocks

y1 = x(1:m);
y2 = x(m+1:2*m);
y3 = x(2*m+1:3*m);
y4 = x(3*m+1:4*m);
y5 = x(4*m+1:5*m);
y6 = x(5*m+1:6*m);
y7 = x(6*m+1:7*m);
y8 = x(7*m+1:8*m);
y9 = x(8*m+1:9*m);
y10 = x(9*m+1:10*m);

y = [y1 y2 y3 y4 y5 y6 y7 y8 y9 y10]; %Column i holds block i (x is a column vector)

%Calculate the average power of each block and remember the strongest one

p = zeros(1,10);
for i = 1:10
    p(i) = 1/m * sum(y(:,i).^2);

    if p(i) >= pmax
        pmax = p(i);
        j = i;
    end
end

%Select the slot

slotselected = y(:,j);
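The selection criterion in block can be stated as an average power per block. In the notation introduced here, the signal has n samples x[1], ..., x[n] and each block holds m = floor(n/10) samples:

$$
P_i = \frac{1}{m} \sum_{k=(i-1)m+1}^{im} x[k]^2, \quad i = 1,\dots,10,
\qquad
j = \arg\max_{i} P_i
$$

Samples beyond index 10m are ignored, and block j is the one passed on to normalize.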

Function: normalize

function [nsig]= normalize(slotselected)
% Functions Performed:
% Removes the mean of the selected slot and scales it to the range [-1, 1]

nsig = (slotselected - mean(slotselected)) / max(abs(slotselected - mean(slotselected)));

figure;
plot(nsig);
title('Normalized Signal');

Function: PSD

function [Pxx,freq]= PSD(nsig,fs)
%Functions Performed:
%Uses the Yule-Walker method to calculate the power spectrum of a signal in the time domain
%Uses a 1024-point FFT

order = round(fs/1000) + 2;   %Model order: sampling rate in kHz plus 2

[Pxx, freq] = pyulear(nsig, order, 1024, fs); %Power spectral density and the corresponding frequencies (Hz)

figure;
semilogy(freq, abs(Pxx));
title('Power Spectral Density');
xlabel('Frequency (Hz)');
ylabel('Power/frequency (log scale)');
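pyulear hides the Yule-Walker computation. The sketch below spells it out in base Matlab (no toolboxes) on a synthetic all-pole signal; the resonances at 500 Hz and 1500 Hz, the pole radius of 0.97, and the model order of 4 are assumptions chosen for illustration. It estimates the autocorrelation, solves the Yule-Walker equations for the AR coefficients, converts pole angles to frequencies with F = fs*theta/(2*pi) (the same conversion used in peak), and evaluates the model spectrum 1/|A|^2 on a 1024-point grid. Unlike pyulear, it omits the noise-variance scale factor, so only the shape of Pxx is meaningful.

```matlab
% Yule-Walker AR estimation and pole-based formants, base Matlab only.
fs = 8000;                                  % sampling frequency (Hz)
trueF = [500 1500];                         % assumed resonances of the test signal (Hz)
rad = 0.97;                                 % assumed pole radius
aTrue = 1;
for f = trueF
    th = 2*pi*f/fs;
    aTrue = conv(aTrue, [1, -2*rad*cos(th), rad^2]);   % add a conjugate pole pair
end
N = 2000;
x = filter(1, aTrue, [1; zeros(N-1,1)]);    % impulse response of the all-pole model

p = 4;                                      % model order
r = zeros(p+1,1);
for k = 0:p
    r(k+1) = sum(x(1:N-k) .* x(1+k:N));     % autocorrelation at lag k
end
R = toeplitz(r(1:p));                       % p-by-p autocorrelation matrix
aEst = [1; -(R \ r(2:p+1))].';              % Yule-Walker: R*a(2:end)' = -[r(1) ... r(p)]'

z = roots(aEst);                            % poles of the estimated model
z = z(imag(z) > 0);                         % one pole from each conjugate pair
formants = sort(angle(z) * fs / (2*pi)).';  % pole angle -> frequency (Hz), ascending

nfft = 1024;
A = fft(aEst, nfft);                        % A(e^{jw}) sampled on the FFT grid
Pxx = 1 ./ abs(A(1:nfft/2+1)).^2;           % unscaled AR spectrum, 0 .. fs/2
freq = (0:nfft/2) * fs / nfft;              % matching frequency axis (Hz)
```

For a real vowel, x would be the normalized block returned by normalize and p would be round(fs/1000) + 2, the order used in PSD; the estimate is then approximate rather than exact.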


Function: peak

function [fmnts] = peak(x,lenpvec,freq,nsig,fs)
%Functions Performed:
%Detects the peaks in the power spectrum obtained.
%Locates the frequencies corresponding to these peaks.
%Falls back to the roots of an all-pole model if fewer than three peaks are found.

b = zeros(lenpvec,1);

for i = 2:lenpvec-1
    if (x(i) >= x(i-1)) && (x(i) >= x(i+1)) && (x(i) ~= x(i+1))
        b(i) = 1;
    end % if
end % i

loc = find(b==1);
mag = x(loc);                 %Peak magnitudes (not used further)

allfmnts = freq(loc);

if length(allfmnts) < 3
    %All-pole (LPC) model of the vocal tract; lpc replaces the original
    %ar/th2tf calls, which newer System Identification Toolbox releases do not support
    a = lpc(nsig,10);

    r1 = roots(a);
    r2 = r1(angle(r1) > 0);   %One root from each complex-conjugate pair

    angles = angle(r2);
    retfmnts = sort((fs/2)*(angles/pi));   %Pole angle -> frequency in Hz, ascending
    fmnts = retfmnts(1:3);
else
    fmnts = allfmnts(1:3);
end
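The loop in peak marks sample i as a peak when it is at least as large as both neighbours and differs from its right neighbour, so a flat plateau yields a single peak at its last sample. A vectorized version of the same rule, run on a small hypothetical PSD vector and frequency grid, shows this behaviour:

```matlab
% Vectorized form of the peak rule used in peak().
x = [1 3 2 2 5 5 4 1 6 0];             % hypothetical PSD values
freq = 0:100:900;                       % hypothetical frequency grid (Hz)
k = 2:numel(x)-1;                       % interior samples only
isPeak = x(k) >= x(k-1) & x(k) >= x(k+1) & x(k) ~= x(k+1);
loc = k(isPeak);                        % indices of the peaks: [2 6 9]
allfmnts = freq(loc);                   % peak frequencies: [100 500 800] Hz
```

The plateau at samples 5 and 6 produces one peak (index 6), and the endpoints are never reported, exactly as in the loop.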

Function: vowelguess

function [vowel] = vowelguess(Formant)

%Functions Performed:
% Guesses the vowel using the formant frequencies f1, f2, and f3 in the vector Formant.

w = [2 1 1]; % Weights for f1, f2, f3 (differences in f1 count twice)

A = [822 1894 2724]; % "A"

E = [1096 2070 2816];% "E"

I = [976 2355 3992]; % "I"

O = [891 2281 3023]; % "O"

U = [7 953 2488];    % "U"

% calculates the weighted Euclidean distance to each vowel template
distA = norm(w.*(A-Formant'));
distE = norm(w.*(E-Formant'));
distI = norm(w.*(I-Formant'));
distO = norm(w.*(O-Formant'));
distU = norm(w.*(U-Formant'));

distances = [distA distE distI distO distU]; % distance vector

min_dist = min(distances); % minimum of the distance vector

% decides which vowel was spoken and returns it in vowel
if min_dist == distA
    vowel = 'A';
elseif min_dist == distE
    vowel = 'E';
elseif min_dist == distI
    vowel = 'I';
elseif min_dist == distO
    vowel = 'O';
elseif min_dist == distU
    vowel = 'U';
end
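The if/elseif chain in vowelguess can also be written with a template matrix and a single call to min. In this sketch the templates and weights are copied from vowelguess, while the measured formants are a hypothetical example; min returns the first index on a tie, which matches the order in which the original chain tests the vowels.

```matlab
% Weighted minimum-distance classification with a template matrix.
labels = 'AEIOU';
T = [ 822 1894 2724;    % A
     1096 2070 2816;    % E
      976 2355 3992;    % I
      891 2281 3023;    % O
        7  953 2488];   % U
w = [2 1 1];            % weights for F1, F2, F3
Formant = [830; 1900; 2700];   % hypothetical measured formants (column, as peak returns them)

nv = size(T,1);
diffs = T - repmat(Formant.', nv, 1);                   % template minus measurement
dist = sqrt(sum((diffs .* repmat(w, nv, 1)).^2, 2));    % weighted Euclidean distance per vowel
[~, idx] = min(dist);                                   % closest template
vowel = labels(idx);
```

Adding a vowel then only requires one more row in T and one more character in labels.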



Normalized Signal Plot for Vowel E:

Power Spectral Density for Vowel E:



   - The frequency content of speech differs from person to person, depending on their
     anatomical structures.
   - It may also vary for the same person.
   - Vowel recognition can be accomplished by extracting the formant frequencies from
     the user's speech.

Team members' contributions:

The project consists of six functions, and responsibility for developing them was shared
equally between the two team members.

Asmita Akerkar:

Modules developed:

   1. Normalize Function: Normalizes the signal
   2. PSD Function: Determines the power spectral density
   3. Project documentation

Ashwin Philar:

Modules developed:

   1. Block Function: Determines the block with maximum power
   2. Peak Function: Determines the formants in the PSD
   3. PowerPoint presentation

Together, we agreed on a common structure in which the main function calls the other
functions, and we developed the minimum Euclidean distance rule used for decision making.

We also recorded the sound samples, tested the application, and debugged it.

Possible Future Work:

   - Extend the software to detect vowels spoken by different speakers.
   - Devise a method to detect vowels within words.
   - Detect vowels in different languages.

