Towards improved proper name recognition

Bert Réveil and Jean-Pierre Martens

DSSP group, Ghent University, Department of Electronics and Information Systems

Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium

{breveil,martens}@elis.ugent.be





Topic description

------------------------------------------------------------------------





Automatic proper name recognition is a key component of multiple speech-based applications (e.g. voice-driven navigation systems). This recognition is challenged by the mismatch between the way the names are represented in the recognizer and the way they are actually pronounced:

• Incorrect phonemic name transcriptions: common grapheme-to-phoneme (G2P) converters can't cope with archaic spelling and foreign name parts; manual transcriptions are too costly (e.g. Ugchelsegrensweg, Haînautlaan)

• Multiple plausible name pronunciations: within or across languages (e.g. Roger)

• Cross-lingual pronunciation variation: foreign names, foreign application users

[Figure: a spoken request ("Please guide me towards ...") enters the GPS recognition system (HMMs, lexicon); example lexicon entry: Austin 'O.stIn, shown next to the transcription 'A&u.stIn.]

In order to improve the phonemic transcriptions and capture the pronunciation variation we adopt acoustic and lexical modeling approaches. Acoustic modeling targets a better modeling of the expected utterance sounds. Lexical modeling tries to foresee the most plausible phonemic transcription(s) for each name in the recognition lexicon.





Experimental set-up

------------------------------------------------------------------------

Database: Autonomata Spoken Name Corpus (ASNC)

• 120 Dutch, 40 English, 20 French, 40 Moroccan and 20 Turkish speakers

• Every speaker reads 181 names of either Dutch, English, French, Moroccan or Turkish origin

• Non-overlapping train and test set (disjoint names, speakers)

• Human expert transcriptions

- TY: typical Dutch transcription (one for each name from TeleAtlas)

- AV: auditory verified Dutch transcription (one for each name utterance)

This work: only Dutch native utterances + non-native utterances of Dutch names

Table 1: Number of utterances for all (speaker,name) pairs in train and test set

Set             DU     EN     FR     MO     TU
(DU,*)  train   9960   1909   966    1245   943
        test    4440   851    414    555    437
(*,DU)  train   9960   3000   1680   3360   1560
        test    4440   1800   720    1440   840

Speech recognizer: state-of-the-art VoCon 3200 from Nuance

• Grammar: name loop with 21K different names (3.5K names of ASNC + 17.5K others)



Acoustic and lexical modeling strategies

------------------------------------------------------------------------

The modeling approaches are primarily conceived for the main targeted users, also called the native (NAT) users (in our case Dutch natives). With respect to these users, two types of non-native languages are distinguished: foreign languages that most NAT speakers are familiar with (NN1), and other foreign languages (NN2).

Strategy 1: Incorporating NN1 language knowledge

• Acoustic modeling: two model sets

- AC-MONO: standard NAT Dutch model (trained on Dutch speech alone)

- AC-MULTI: Dutch (20%) and NN1 training data (English, French and German)

• Lexical modeling

- G2P transcribers for NAT and NN1 languages (Nuance RealSpeak TTS)

  → Foreign transcriptions are nativized in combination with AC-MONO

- Data-driven selection of one extra G2P converter per name origin

Strategy 2: Creating pronunciation variants (lexical modeling)

- Computed per (speaker, name) combination

- Created from initial G2P transcriptions by means of automatically learned phoneme-to-phoneme (P2P) converters
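
The data-driven selection of one extra G2P converter per name origin could be sketched as follows. The poster does not state the selection criterion, so picking the converter with the lowest name error rate on training data is an assumption, and the function name `select_extra_g2p` and all error values are invented for illustration only.

```python
# Hypothetical training-set name error rates (%) measured when the Dutch
# lexicon is extended with ONE extra G2P converter. All values are invented.
TRAIN_ERROR = {
    "EN": {"english": 20.0, "french": 35.0, "german": 33.0},
    "FR": {"english": 18.0, "french": 8.0, "german": 17.0},
    "TU": {"english": 19.0, "french": 21.0, "german": 18.0},
}


def select_extra_g2p(train_error):
    """Return, per name origin, the extra converter with the lowest error.

    Assumed criterion: minimum name error rate on the training set.
    """
    return {
        origin: min(rates, key=rates.get)
        for origin, rates in train_error.items()
    }


selection = select_extra_g2p(TRAIN_ERROR)
for origin, converter in selection.items():
    print(f"{origin}: Dutch G2P + {converter} G2P")
```

Keeping the choice in a plain dictionary keyed by name origin mirrors the "one extra converter per name origin" wording; any other training-set criterion could replace `min` without changing the structure.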





Construction of phoneme-to-phoneme converters

------------------------------------------------------------------------





P2P learning requires the orthographic transcription, an initial G2P transcription and a target phonemic transcription (e.g. TY or AV) of a sufficiently large collection of name utterances. These 3-tuples are supplied to a 4-step training procedure:

• Two-fold alignment: Orthography ↔ Initial transcription ↔ Target transcription

  ~ D i r k () V a n () D e n () ~ B o ~ ssch e
  „ d I r K _ f A n _ d E n _ „ b O . s $
  „ d i r k _ v A n _ d $ m _ ~ b O . s $

• Transformation retrieval

• Generation of training examples: describe linguistic context

- Previous and next phonemes and graphemes

- Lexical context (Part Of Speech)

- Prosodic context (stressed syllable or not)

- Morphological context (word prefix/suffix)

- External features: e.g. name type, name source, speaker tongue

• Rule induction

- Learn decision tree per input (pattern): stochastic rules in leaf nodes

- Rule formalism: if context → leaf node then [input pattern] → [output pattern] with probability Pfir

In generation mode: rules applied to initial G2P transcription of unseen name → variants with probabilities

[Figure: P2P training flow. Inputs: orthography, initial transcription, target transcription, high-level features. Steps: alignment process (letter-to-sound), alignment process (sound-to-sound), transformation learning, learn morphological classes, example generation, stochastic rule induction.]
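
As a rough illustration of generation mode, the sketch below applies stochastic rewrite rules to an initial transcription and keeps the most probable variants. It is a simplification under stated assumptions: the rules here are context-free and keyed on single phonemes, whereas the poster describes context-dependent decision trees; the two substitutions echo phoneme differences visible in the Dirk Van Den Bossche alignment, but their probabilities, the names `RULES` and `generate_variants`, and the default of 4 variants are illustrative choices.

```python
from itertools import product

# Hypothetical stochastic rules: input phoneme -> [(output phoneme, probability)].
# Probabilities are invented; each list sums to 1 and includes "keep as is".
RULES = {
    "f": [("f", 0.6), ("v", 0.4)],
    "E": [("E", 0.7), ("$", 0.3)],
}


def generate_variants(initial, rules, n_best=4):
    """Expand an initial transcription into its n_best most probable variants.

    Phonemes without a rule are copied unchanged with probability 1.
    """
    options = [rules.get(phoneme, [(phoneme, 1.0)]) for phoneme in initial]
    variants = []
    for combo in product(*options):
        phonemes = tuple(output for output, _ in combo)
        probability = 1.0
        for _, p in combo:
            probability *= p
        variants.append((phonemes, probability))
    # Most probable variant first.
    variants.sort(key=lambda item: item[1], reverse=True)
    return variants[:n_best]


initial = ["f", "A", "n", "_", "d", "E", "n"]
variants = generate_variants(initial, RULES)
for phonemes, probability in variants:
    print(" ".join(phonemes), round(probability, 2))
```

Exhaustive enumeration is only practical for short names with few applicable rules; a beam search over positions would be the usual replacement when the rule set grows.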



Experimental assessment

------------------------------------------------------------------------





Incorporating NN1 language knowledge

• Including extra G2P transcriptions (acoustic model = AC-MONO)

- Boost for (DU,-DU): NAT speakers use NN1 knowledge when reading foreign names, including NN2 names

- Degradation for (DU,DU): reduced by selecting only one extra G2P

• Decoding with multilingual acoustic model

- NAT speakers: loss for NAT names, boost for English names only

  → Dutch sounds not as well modeled as before

  → English better known than French?

  → English and Dutch sound inventories differ more than French and Dutch?

- Foreign speakers: boost for both NN1 name origins

  - mother tongue sounds better modeled

• Plain multilingual G2P transcriptions bring no improvement

Table 2: Name Error Rate (%) for systems with G2P lexicons

(spkr,name)  System                                  DU    EN    FR    MO    TU
(DU,*)       AC-MONO + DUN G2P                       6.5   38.5  21.3  14.6  28.4
             AC-MONO + 4G2P (nativized)              7.2   22.7  9.9   9.5   17.2
             AC-MONO + G2P-selection (nativized)     6.5   20.8  7.2   9.0   18.1
             AC-MULTI + G2P-selection (nativized)    8.5   14.9  7.2   8.3   16.2
             AC-MULTI + G2P-selection (plain)        8.5   14.0  7.7   8.6   18.1
(*,DU)       AC-MONO + DUN G2P                       6.5   25.1  33.2  26.9  40.8
             AC-MONO + 4G2P (nativized)              7.2   22.8  32.2  27.0  40.6
             AC-MONO + G2P-selection (nativized)     6.5   22.8  31.1  25.3  38.5
             AC-MULTI + G2P-selection (nativized)    8.5   17.6  22.6  25.2  38.6
             AC-MULTI + G2P-selection (plain)        8.5   18.2  22.6  25.8  40.4

Creating pronunciation variants

• Baseline P2Ps: Dutch G2P transcriptions as initials, AV transcriptions as targets

- Alternative P2Ps for (DU,NN1) and (NN1,DU) cells

  - create additional P2P that starts from NN1 G2P transcriptions

  - combine most probable variants generated by both P2P converters

• P2P variants lead to significant improvements for all (speaker, name) cells

- 10 .. 25% relative for NAT + foreign names, 5 .. 17% for foreign speakers

Table 3: Name Error Rate (%) for systems with P2P transcription variants

(spkr,name)  System                                  DU    EN    FR    MO    TU
(DU,*)       AC-MULTI + G2P-selection (nativized)    8.5   14.9  7.2   8.3   16.2
             + 4 P2P variants (baseline)             7.7   13.2  6.3   7.0   11.9
             + 4 P2P variants (alternative)          7.7   12.2  6.3   7.0   11.9
(*,DU)       AC-MULTI + G2P-selection (nativized)    8.5   17.6  22.6  25.2  38.6
             + 4 P2P variants (baseline)             7.7   17.2  19.9  24.0  35.2
             + 4 P2P variants (alternative)          7.7   16.4  18.8  24.0  35.2
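
The relative improvements quoted for the P2P variants can be checked against the Table 3 figures with a few lines of code. This sketch assumes "relative" means the relative reduction in Name Error Rate with respect to the system without P2P variants; the values are copied from the (DU,*) rows of Table 3, and the function name `relative_reduction` is illustrative.

```python
NAME_ORIGINS = ["DU", "EN", "FR", "MO", "TU"]

# Name Error Rates (%) from the (DU,*) rows of Table 3.
WITHOUT_P2P = [8.5, 14.9, 7.2, 8.3, 16.2]  # AC-MULTI + G2P-selection (nativized)
WITH_P2P = [7.7, 12.2, 6.3, 7.0, 11.9]     # + 4 P2P variants (alternative)


def relative_reduction(before, after):
    """Relative error reduction in percent of the 'before' error rate."""
    return 100.0 * (before - after) / before


reductions = {
    origin: round(relative_reduction(before, after), 1)
    for origin, before, after in zip(NAME_ORIGINS, WITHOUT_P2P, WITH_P2P)
}
for origin, value in reductions.items():
    print(f"{origin}: {value}% relative NER reduction")
```

The same function applies unchanged to the (*,DU) rows by swapping in the corresponding table values.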









Acknowledgments

------------------------------------------------------------------------

The presented work was carried out in the Autonomata TOO project, granted under the Dutch-Flemish STEVIN program (http://taalunieversum.org/taal/technologie/stevin/), with partners RU Nijmegen, Universiteit Utrecht, Nuance and TeleAtlas.



References

------------------------------------------------------------------------

[1] B. Réveil, J.-P. Martens and B. D'hoore, How speaker tongue and name source language affect the automatic recognition of spoken names, in Proc. InterSpeech 2009, Brighton, UK

[2] H. van den Heuvel, B. Réveil and J.-P. Martens, Pronunciation-based ASR for names, in Proc. InterSpeech 2009, Brighton, UK

[3] B. Réveil, J.-P. Martens and H. van den Heuvel, Improving proper name recognition by adding automatically learned pronunciation variants to the lexicon, in Proc. LREC 2010, Valletta, Malta


