AUTOMATIC PRONUNCIATION VERIFICATION OF ENGLISH LETTER-NAMES

FOR EARLY LITERACY ASSESSMENT OF PRELITERATE CHILDREN



Matthew Black, Joseph Tepperman, Abe Kazemzadeh, Sungbok Lee, and Shrikanth Narayanan



Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA

{matthepb,tepperma,kazemzad,sungbokl}@usc.edu, shri@sipi.usc.edu





ABSTRACT

Children need to master reading letter-names and letter-sounds before reading phrases and sentences. Pronunciation assessment of letter-names and letter-sounds read aloud is an important component of preliterate children’s education, and automating this process can have several advantages. The goal of this work was to automatically verify letter-names spoken by kindergarteners and first graders in realistic classroom noise conditions. We applied the techniques developed in our previous work on automatic letter-sound verification, comparing and optimizing different acoustic models, dictionaries, and decoding grammars. Our final system was unbiased with respect to the child’s grade, age, and native language and achieved 93.1% agreement (0.813 kappa agreement) with human evaluators, who agreed among themselves 95.4% of the time (0.891 kappa).

Index Terms: Children’s speech, pronunciation verification, automatic reading assessment, letter-names

1. INTRODUCTION

Children’s future reading proficiency and their ability to learn effectively through reading have been shown to be correlated with the mastery of reading the names of the letters (letter-names) and producing the sounds of the letters (letter-sounds) at an early age [1]. Assessing children’s skills in these reading tasks is an important element of early education to confirm that the children are learning.

Automatic assessment of letter-sounds and letter-names can have several advantages. The personalized assessment required to properly score a child’s reading level takes one-on-one time, which a teacher may not always be able to provide. Furthermore, an automatic system may remove some of the personal biases inherent in the judgment of the child’s reading level and standardize the grading process. In addition, automatic systems can provide teachers with a fine-grained analysis of the child’s pronunciation, offering them insight for future instructional planning.

This paper concentrates on automatically verifying letter-names spoken by preliterate children, complementing our previous work addressing the letter-sound task [2]. Note that the letter-name verification task does not reduce to letter-name recognition. That is, we are not interested in specifying which letter-name the child said, but rather whether the letter-name was read acceptably. In most letter-name recognition research (an application that arises, for example, when a person spells aloud an out-of-vocabulary word), the intended letter is not known ahead of time, but the assumption is that it is spoken correctly [3-6]. For this paper, we know what letter-name the child was prompted to say. The difficulty is robustly detecting the innumerable ways a child could produce an unacceptable pronunciation, while not penalizing a child for acceptable pronunciation variations (such as nonnative accent).

There are numerous engineering challenges in automatic letter-name verification for children. Children’s speech has high variability within and between speakers [7], and the data used in this research were collected in noisy classrooms from children with multiple language backgrounds. These conditions make it difficult to train representative acoustic models. Furthermore, many of the letter-names are acoustically similar (e.g., /eh m/ and /eh n/), and almost all of them share at least one common phoneme (e.g., /b iy/, /s iy/, /d iy/, /iy/, /jh iy/, /p iy/, /t iy/, /v iy/, and /z iy/). In addition, there is no word or letter context for this isolated letter-name reading task, so we cannot train language models, as is typically done in letter-name recognition tasks when the speaker is spelling real words [4].

We experimented with different acoustic models, dictionaries, and decoding grammars with the goal of attaining automatic letter-name verification with accuracy that neared human agreement. Section 2 describes the data we analyzed. Section 3 briefly describes our verification method, which builds upon our previous work on automatic letter-sound verification [2]. Section 4 shows the experimental results, with a discussion following in Section 5, including an in-depth error analysis and comparison to the letter-sound task and results. We conclude in Section 6.

2. CORPUS

We used data from the Technology-based assessment of language and literacy (Tball) Project [8,9]. The Tball corpus [10] was recorded in kindergarten to second grade classrooms in the greater Los Angeles area. Typical
background noise included speech from other children and the teacher. The corpus contains both native English and Spanish speakers; thus, we can expect certain pronunciation trends, as described in [11]. All 26 letters of the English alphabet were tested in the letter-name reading task. One lowercase letter was displayed on a computer screen for a maximum of five seconds before the next letter was shown. These transition times were automatically recorded and used to segment the files into single letter-name utterances.

We manually verified (accept/reject) 3508 letter-name utterances, of which 25.1% were rejected. Of these rejected utterances, 23.4% were due to the child saying nothing. At least one disfluency (filler, repetition, and/or repair) was marked in 8.27% of all the utterances. Table 1 shows performance across various demographics that were provided for some of the children. Using the manual annotations, we created a test set with 780 files (30 files per letter-name) and a train set with the remaining 2728 files (approximately 105 files per letter-name). The data were partitioned so that the proportion of acceptable to unacceptable pronunciations was the same between the train and test set for each letter-name. To compute human agreement statistics, three trained native speakers verified the same 260 files (10 files per letter-name), randomly selected from the test set. Mean pairwise evaluator agreement was 95.38%, with kappa agreement of 0.8914.

Demographic                   Number   % Accepted
Gender            Female      1820     72.36
                  Male        1582     77.81
Grade             K           3012     75.13
                  1st          420     70.48
Age               5           1897     78.12
                  6            556     76.80
Native Language   Spanish     1203     72.98
                  English     1151     82.71

Table 1. Children’s performance (based on manual verification) across various demographics. Bold numbers indicate the difference in proportion is statistically significant (p < 0.1).

[...] were trained on 12 hours of isolated word-reading data (without letter-names), also recorded for the Tball Project. A background model was trained on silent and background noise portions of the utterances, and a single generic phone-level “garbage” model was trained on all speech segments. Five sets of acoustic models were iteratively trained directly on the letter-name train set, as described in [2]. All feature extraction and model training were done with HTK [12].

3.2. Dictionaries

A recognition dictionary that included all the acceptable letter-name pronunciations served as a baseline dictionary. This dictionary was not ideal since it did not take into account the fact that we knew what letter-name the child was prompted to say. For this reason, we also constructed five additional dictionaries that included unacceptable letter-name pronunciations from foreseeable categorical errors (Table 2). We then produced 32 sets of verification dictionaries through all 2^5 combinations of the five categories (none, LS, PE, SI, …, LS-PE, LS-SI, …, all). Each dictionary set contained a dictionary for each letter with acceptable letter-name pronunciations and appropriate unacceptable ones. We refer to the verification dictionary set that did not include any unacceptable pronunciations as the “none” set, and the one that included all the unacceptable pronunciations as the “all” set.

Label   Description              # of Entries   Examples
LS      English letter-sounds    45             v: /v/, /v ah/
PE      Perceptual confusions    43             m-n, f-s, c-z
SI      Sight confusions         21             b-d, p-q, o-c
SP      Spanish confusions       14             j: /hh ey/
SPLN    Spanish letter-names     28             d: /d ey/

Table 2. Description of the five unacceptable pronunciation categories, with the corresponding number of entries and examples.

3.3. Grammars
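As a rough illustration of the 32 verification dictionary sets described in Section 3.2, the sketch below enumerates every subset of the five category labels from Table 2. The function name and the naming scheme for intermediate subsets (labels joined by hyphens, in Table 2 order) are assumptions made for this example; the paper only lists a few of the set names and does not specify how the sets were generated.

```python
from itertools import combinations

# Category labels taken from Table 2; everything else here is illustrative.
CATEGORIES = ["LS", "PE", "SI", "SP", "SPLN"]


def dictionary_set_names(categories):
    """Return one name per subset of the error categories (2^n names)."""
    names = []
    for size in range(len(categories) + 1):
        for subset in combinations(categories, size):
            if not subset:
                names.append("none")   # no unacceptable pronunciations
            elif len(subset) == len(categories):
                names.append("all")    # every category included
            else:
                names.append("-".join(subset))
    return names


names = dictionary_set_names(CATEGORIES)
print(len(names))
print(names[:8])
```

Each name would then correspond to one dictionary set, holding a per-letter dictionary with the acceptable pronunciations plus the unacceptable ones from the selected categories.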

However, the mean SNR for utterances in which the system erred (disagreed with the manual verification) was significantly lower than the mean SNR for utterances in which the system was correct (p<0.01). This implies that noise did not affect human evaluator agreement but adversely affected automatic verification performance.

$$\mathrm{SNR} = 10 \log_{10} \frac{\frac{1}{S} \sum_{s=1}^{S} A_s^2}{\frac{1}{N} \sum_{n=1}^{N} A_n^2} \qquad (1)$$

Partition of test data         # in test data   SNR Statistics [dB]
                                                mean     std. dev.
Inter-evaluator   agree        193              9.335    3.712
                  disagree      33              8.632    3.292
System            correct      648              9.623    3.533
                  error         42              7.810    3.796

Table 6. SNR statistics comparing the effect of noise on inter-evaluator agreement and system performance. Bold numbers indicate that the difference in means is statistically significant (p<0.01).

5.2. Comparison between letter-names and letter-sounds

According to the manual verification, children performed significantly better on the letter-name task (74.9% accepted) than the letter-sound task (72.2% accepted), with p<0.05. This is probably because all letter-names have a one-to-one mapping for their pronunciations, while many of the letter-sounds have alternative pronunciations depending on word context. The letter-sounds are also shorter and less natural to pronounce aloud, which may have been a factor in the letter-sounds having twice as many disfluencies (16.9%), a significant difference with p<0.05. Human agreement statistics for both tasks were nearly identical.

We found the same trends in our automatic verification performance for both the letter-name and letter-sound tasks, in that the baseline models were worse than models trained on in-domain data, with grammar 2 and the letter-specific dictionary providing the best results. English letter-name substitutions and alternative pronunciations were the most common categorical errors for the letter-sound task, with sight confusions and Spanish letter-name errors dominating the letter-name task. Overall, we attained higher verification accuracy on the letter-name task (93.08% accuracy), compared to the letter-sound task (87.95% accuracy), with p<0.01. We feel this difference is mostly due to the acoustic models. Whereas HMMs using MFCC features model letter-name phonemes well, they seem to be less suited for the more noise-like letter-sounds. Future research on letter-sound specific features will hopefully help bridge this gap.

6. CONCLUSION

We showed that we could accurately verify letter-name pronunciations through acoustic modeling at the phoneme level. We achieved the best results using a dictionary optimized for each letter separately. Our final automatic system agreed with humans 93.1% of the time (0.813 kappa), nearing inter-evaluator agreement of 95.4% (0.891 kappa), and was unbiased with respect to the child’s grade, age, and native language. This system also performed significantly better than the one we previously developed to verify the more difficult letter-sounds [2]. In the future, we want to improve system performance in the presence of noise through improved acoustic modeling and/or by automatically detecting when there is too much background noise to reliably verify the utterance.

7. ACKNOWLEDGEMENTS

This work was supported in part by the National Science Foundation. Special thanks to Matthew Tan and Isaac Rottman for their help in transcribing the letter-name data.

8. REFERENCES

[1] National Reading Panel, “Teaching children to read: an evidence-based assessment of the scientific research literature on reading and its implications for reading instruction,” NICHD, NIH Publication 00-4769, Washington, DC, 2000.
[2] M. Black, J. Tepperman, A. Kazemzadeh, S. Lee, and S. Narayanan, “Pronunciation verification of English letter-sounds in preliterate children,” Proc. Interspeech, Antwerp, Belgium, 2007.
[3] M. Fanty and R.A. Cole, “Spoken letter recognition,” Advances in Neural Information Processing Systems 3, San Mateo, CA: Morgan Kaufmann, 1991.
[4] H. Hild and A. Waibel, “Recognition of spelled names over the telephone,” Proc. ICSLP, Philadelphia, PA, 1996.
[5] P.C. Loizou and A.S. Spanias, “High performance alphabet recognition,” IEEE Trans. Speech and Audio Processing, 4(6):430-445, 1996.
[6] M.E. Munich and Q. Lin, “Explicit modeling of common acoustic features for character recognition,” Proc. EUSIPCO, Vienna, Austria, 2004.
[7] S. Lee, A. Potamianos, and S. Narayanan, “Acoustics of children's speech: developmental changes of temporal and spectral parameters,” J. of Acoust. Soc. Am., 105:1455-1468, Mar. 1999.
[8] Tball. http://diana.icsl.ucla.edu/Tball/assess_frame.html.
[9] A. Alwan et al., “A system for technology based assessment of language and literacy in young children: the role of multiple information sources,” Proc. MMSP, Greece, 2007.
[10] A. Kazemzadeh, H. You, M. Iseli, B. Jones, X. Cui, M. Heritage, P. Price, E. Anderson, S. Narayanan, and A. Alwan, “Tball data collection: the making of a young children's speech corpus,” Proc. Eurospeech, Lisbon, Portugal, 2005.
[11] H. You, A. Alwan, A. Kazemzadeh, and S. Narayanan, “Pronunciation variations of Spanish-accented English spoken by young children,” Proc. Eurospeech, Lisbon, Portugal, 2005.
[12] Cambridge University, HTK 3.2, htk.eng.cam.ac.uk.
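The SNR measure in Equation (1) can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: it assumes that A_s and A_n denote sample amplitudes from the speech and background-noise portions of an utterance (their definitions are not part of this excerpt), and the function names and toy signals are hypothetical.

```python
import math


def mean_power(samples):
    """Mean of squared amplitudes, i.e. (1/K) * sum(A_k^2)."""
    return sum(a * a for a in samples) / len(samples)


def snr_db(speech, noise):
    """Equation (1): ratio of mean speech power to mean noise power, in dB.

    Assumes `speech` and `noise` are non-empty amplitude sequences and that
    the noise segment is not all zeros.
    """
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


# Toy signals (illustrative only): speech amplitude is 10x the noise amplitude.
speech = [0.5, -0.5, 0.5, -0.5]
noise = [0.05, -0.05, 0.05, -0.05]
print(round(snr_db(speech, noise), 3))
```

In this toy case the amplitude ratio of 10 corresponds to a power ratio of 100, so the result is about 20 dB.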


