					     Bridging the Gap
Forbairt Acmhainní Urlabhra
                Ailbhe Ní Chasaide

             Phonetics and Speech Laboratory
School of Linguistic, Speech and Communication Sciences
                   Trinity College Dublin
     Bridging the Gap
Forbairt Acmhainní Urlabhra
                Ailbhe Ní Chasaide

        An Saotharlann Foghraíochta & Urlabhra
School of Linguistic, Speech and Communication Sciences
                Coláiste na Tríonóide, BÁC
Introduction: speech communication technology

There is a major deficit in this area for Irish


Bridging the gap: the research presented here, ongoing for
the last few years, is directed at rectifying this deficit.
It aims specifically to develop:
•      speech/linguistic resources for Irish
•      text-to-speech synthesis

Research carried out in the following projects:
•      1 WISPR         (2004 - 2005) EU Interreg
•      2 Cabóigín I (2006 – 2007) Foras na Gaeilge
Réamhrá: Teicneolaíocht Chumarsáid Urlabhra

Tá bearna mór sa réimse seo don Ghaeilge

Líonadh an bhearna: Curtear síos anseo ar thaighde, atá
   ar bun sa Saotharlann le cúpla bliain anuas – tús le
   feachtas chun an bearna seo a líonadh. Tá muid ag
   forbairt:

•      acmhainní urlabhra/teangeolaíochta don Ghaeilge
•      sintéis téacs-go-hurlabhra

Taighde ar siúl sna tionscadail:
•      1 WISPR         (2004 - 2005) AE Interreg
•      2 Cabóigín I (2006 – 2007) Foras na Gaeilge
Reporting here on research carried out in the
  following projects:
1 WISPR Welsh and Irish Speech Processing Resources
Funded by EU INTERREG Community Initiatives Programme




         Wales: Bangor, CB
         Ireland: TCD, DCU, UCD, ITÉ
Baineann an taighde atá muid ag léiriú anseo leis na
  tionscadail seo a leanas:

1 WISPR Acmhainní Próiseáil Urlabhra na Gaeilge is
  na Breatnaise
• Maoiniú ón AE: INTERREG IIIA Ireland/Wales Programme


An Bhreatain Bheag: Bangor, CB
Éire: TCD, DCU, UCD, ITÉ
1 WISPR
Welsh and Irish Speech Processing Resources
Goals:

• Irish
Given the lack of prior resources, the focus was primarily on
developing a corpus and beginning development of certain
resources; a long-term perspective was adopted.

• Welsh
Better resources were already in place, including an early TTS
system. The focus was therefore on perfecting and extending it,
and on developing TTS applications.
1 WISPR
Acmhainní Próiseáil Urlabhra na Gaeilge is na
Breatnaise

Cuspóirí:

• Gaeilge
Easpa acmhainní: Béim ar fhorbairt chorpais & cuid de na
hacmhainní riachtanacha: Cur chuige fadtéarmach

• An Bhreatnais
Acmhainní níos fearr ar fáil, gléas sintéise ina measc: Béim ar
leathnú amach, feabhsú agus cur i bhfeidhm na sintéise
and…
2 Cabóigín I
                 Funded by: Foras na Gaeilge
Goal
• To develop a full, working text-to-speech system
  for Gaoth-Dobhair Irish, building on the WISPR
  corpus

•   To develop fully the requisite processing
    modules for this dialect
agus…

2 Cabóigín I
                  Maoiniú: Foras na Gaeilge

Cuspóir

• Ríomhchainteoir iomlán a thógáil do chanúint
Ghaoth Dobhair, bunaithe ar chorpas WISPR

• Forbairt iomlán a dhéanamh ar na hacmhainní
próiseála ar a bhfuil an ríomhchaint ag brath don
chanúint seo
Input also from the project




3 Prosody of Irish Dialects
Funded by: The Irish Research Council for
  the Humanities and Social Sciences

   This project is providing information on
     prosody which we can use in building the
     synthesis system
Ionchur fosta ón dtionscadal




3 Prosóid Chanúintí na Gaeilge:
Maoiniú: An Chomhairle um Thaighde sna
 Dána agus sna hEolaíochtaí Sóisialta


   Ag soláthar ionchur ar phrosóid, riachtanach
    don ríomhchainteoir
Introduction
Speech communication technology plays an increasing role
  in              -Education
                    -Public access to information
                    -Disability: access and communication

Developed for English and a handful of the world's languages.
  Driven by         -Commercial interests
                    -National research initiatives

Particularly important to the maintenance and preservation
   of lesser used languages,….but,

The widening gap in provision for lesser used languages
  needs to be addressed if they are to harness the power of
  the new technologies
Réamhrá
Bíonn ról ag teicneolaíocht urlabhra in...
                     -Oideachas
                     -Teacht ar eolas
                     -Míchumas: rochtain agus cumarsáid

Forbartha don Bhéarla agus roinnt teangacha eile...
  Forbartha ag       -Comhlachtaí
                     -Tionscnaimh taighde náisiúnta

Tábhachtach maidir le cothú agus caomhnú teangacha
  neamhfhorleathana,….ach,

Caithfear deighleáil leis an bhearna maidir le soláthar
   teicneolaíochtaí nua-aimseartha do teangacha
   neamhfhorleathana le feidhm a bhaint astu
Introduction

Speech technology has “matured” so that it is easier than
  ever to develop new systems


So, in principle, systems such as text-to-speech synthesis can
   be readily developed in a “new” language.


In practice, it is not so easy:
   such systems require a plethora of prior resources
   development can be further hampered by local constraints
Réamhrá

Tá sé níos éasca anois ná mar a bhí riamh córais nua
  teicneolaíocht urlabhra a chur le chéile


Ba cheart mar sin go bhféadfaí córas ar nós gléas sintéise a
  chur le chéile go héasca le haghaidh teanga “nua”.


Dáiríre, níl sé chomh héasca
  ...bíonn gá le méid mór acmhainní riachtanacha
  ...go minic bíonn constaiceanna áitiúla sa bhealach fosta
Introduction

Our specific experience may be helpful to others who will
  need to go down a similar path


Many of the challenges we face are common to all the lesser
  used/under resourced languages
Réamhrá

B'fhéidir go gcabhróidh ár dtaithí le daoine eile a bheidh ag
   iarraidh na rudaí céanna a fhorbairt amach anseo


Baineann cuid mhór de na deacrachtaí atá os ár gcomhair le
   teangacha neamhfhorleathana i gcoitinne
Introduction:        this talk will discuss

1 Challenges

2 Ways to meet the challenges

3 Applications
             in education
             disability: access, education & communication

4 Implications: a long-term perspective

5 Demo
Réamhrá:        beidh mé ag plé

1 Deacrachtaí

2 Bealaí chun deighleáil le na deacrachtaí

3 Cur i bhfeidhm
             in oideachas
             míchumas: inrochtaineacht, oideachas &
                 cumarsáid

4 Impleachtaí: fís fhadtéarmach

5 Léiriú/Samplaí
1   Challenges
1   Deacrachtaí
Challenges   The following resources are
              required for synthesis…

Text
  ↓
Pre-processing
  ↓
Lexicon, Letter-to-sound rules, Prosody model
(alongside: Corpus for unit selection; Corpus for diphone synthesis)
  ↓
Speech
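
The front end of the pipeline in the diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy lexicon and the one-symbol-per-letter fallback are hypothetical, the phone labels simply reuse the [nj i: w i:] example given later in this talk, and the prosody model and waveform generation are left out.

```python
def preprocess(text):
    """Lower-case the text and split it into word tokens without punctuation."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word:
            tokens.append(word)
    return tokens


def naive_lts(word):
    """Placeholder letter-to-sound step: one symbol per letter (not real rules)."""
    return list(word)


def to_phonemes(word, lexicon, lts=naive_lts):
    """Use the pronunciation lexicon if possible, else fall back to LTS rules."""
    if word in lexicon:
        return list(lexicon[word])
    return lts(word)


def front_end(text, lexicon, lts=naive_lts):
    """Text -> pre-processing -> lexicon / letter-to-sound -> phone strings."""
    return [to_phonemes(w, lexicon, lts) for w in preprocess(text)]


# Hypothetical toy lexicon; labels are illustrative only.
TOY_LEXICON = {"ní": ["nj", "i:"], "bhfaighfidh": ["w", "i:"]}

if __name__ == "__main__":
    print(front_end("Ní bhfaighfidh.", TOY_LEXICON))
```

The lexicon-first, rules-as-fallback ordering is a common design choice in synthesis front ends; whether the system described here orders the steps in exactly this way is an assumption.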
Deacrachtaí    Tá na hacmhainní seo leanas
                     riachtanach don tsintéis…

Téacs
  ↓
Réamh-phróiseáil
  ↓
Foclóir Fuaimnithe, Rialacha litir-go-fuaim, Prosóid
(taobh leo: Corpas ‘unit selection’; Corpas do shintéis ‘diphone’)
  ↓
Urlabhra
Challenges       ...but are not available for Irish

Text
  ↓
Pre-processing
  ↓
Lexicon, Letter-to-sound rules, Prosody model
(alongside: Corpus for unit selection; Corpus for diphone synthesis)
  ↓
Speech
Deacrachtaí      …ach níl siad ar fáil don Ghaeilge

Téacs
  ↓
Réamh-phróiseáil
  ↓
Foclóir Fuaimnithe, Rialacha litir-go-fuaim, Prosóid
(taobh leo: Corpas ‘unit selection’; Corpas do shintéis ‘diphone’)
  ↓
Urlabhra
Challenges          No spoken standard



 3 main dialects:
   Ulster, Connaught & Munster


 Ulster (Donegal) chosen


 From outset, multidialect
  synthesis should be envisaged
Deacrachtaí       Níl caighdeán labhartha ann



  3 mhórchanúint:
    Uladh, Connacht & Mumhan


  Uladh (Dún na nGall)
   roghnaithe


  Ríomhchaint ollchanúnach
   mar sprioc ón tús
Challenges            No spoken standard



 multidialect goals

 have many consequences for
   current research
Deacrachtaí       Easpa caighdeán labhartha



 spriocanna ollchanúnacha

 léir mór impleachtaí do
    thaighde reatha
Challenges: which synthesis technology?

  Unit selection           Diphone

  Very large corpus        More limited corpus
  Less planning            More planning
  Less prior analysis      More prior analysis

Festival open source system, and Edinburgh Speech Tools used
Deacrachtaí: cén teicneolaíocht sintéise?

  Unit selection           Diphone

  Corpas ollmhór           Corpas níos lú
  Níos lú pleanáil         Níos mó pleanáil
  Níos lú réamhanailís     Níos mó réamhanailís

Úsáideadh Festival open source system agus Edinburgh Speech Tools
Challenges:          which synthesis technology?


   Unit selection                     Diphone

   Most natural sounding              Less natural quality
   Can have bad chunks                More consistent quality
   Can be slow/less suited to         More suited to certain
   certain applications               applications
Deacrachtaí:      cén teicneolaíocht sintéise?


 Unit selection                          Diphone

 Nádúrtha                                Níos mí-nádúrtha
 Go dona in áiteanna                     Níos comhsheasmhaí
 Mall...mí-oiriúnach d'úsáid áirithe     Níos oiriúnaí d'úsáid áirithe
Challenges:           Corpora + phonetic annotation


There are none



Consequently, nothing is known about the frequency or
distribution of phonemes, diphones, etc…



…and this knowledge is needed to design a balanced corpus!
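
To make the point concrete: once even a small amount of transcribed material exists, diphone frequencies can be counted and used to pick recording prompts that cover as many diphone types as possible. The sketch below is a generic greedy coverage heuristic, not the procedure used in the projects described here; the function names and the '#' silence marker are assumptions made for illustration.

```python
from collections import Counter


def diphones(phones):
    """Adjacent phone pairs, with '#' marking silence at both edges."""
    padded = ["#"] + list(phones) + ["#"]
    return list(zip(padded, padded[1:]))


def diphone_counts(utterances):
    """Frequency of each diphone type over a list of phone sequences."""
    counts = Counter()
    for phones in utterances:
        counts.update(diphones(phones))
    return counts


def greedy_select(utterances, max_items):
    """Repeatedly pick the utterance adding the most uncovered diphone types."""
    covered = set()
    chosen = []
    remaining = list(range(len(utterances)))
    while remaining and len(chosen) < max_items:
        best = max(remaining,
                   key=lambda i: len(set(diphones(utterances[i])) - covered))
        gain = set(diphones(utterances[best])) - covered
        if not gain:
            break  # nothing new left to cover
        covered |= gain
        chosen.append(best)
        remaining.remove(best)
    return chosen, covered
```

Without frequency and distribution data of this kind there is nothing for such a selection step to work from, which is why the absence of annotated corpora blocks the design of a compact, balanced corpus.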
Deacrachtaí:            Corpais + anótáil foghraíochta


Níl siad ann



Mar thoradh ar seo, níl eolas ar mhinicíocht nó dáileadh
fóinéimeach, srl…



…….…tá gá le seo le corpas cothrom a dhearadh!
 Challenges:                 Letter-to-Sound rules
                             Complex sound system
 Minimum 55 segments
 How to ensure coverage of all combinations in the corpus?

                    Irish Consonantal Phonemes
     [Table. Columns (place of articulation): labial, dental, alveolar,
      alveolo-palatal, palatal, velar, glottal. Rows (manner): plosive,
      fricative/approximant, nasal, tap, lateral approximant.
      Cell symbols not recoverable.]
 Deacrachtaí:                Rialacha litir-go-fuaim
                               Córas fuaime coimpléascach
 55 teascán ar a laghad
 Conas gach comhcheangal a chlúdach sa chorpas?

                             Consain na Gaeilge
     [Tábla. Colúin: labial, dental, alveolar, alveolo-palatal, palatal,
      velar, glottal. Sraitheanna: plosive, fricative/approximant, nasal,
      tap, lateral approximant. Níl siombailí na gceall le fáil.]
Challenges:      Letter-to-Sound rules
                 Complex, archaic writing system



The orthographic form Ní bhfaighfidh 'will not get'
corresponds to the sound string [nʲiː wiː]


Rules consequently complex

Considerable cross-dialect differences
Deacrachtaí:      Rialacha Litir-go-fuaim
                    Seanchóras scríbhneoireachta
                       coimpléascach


Is ionann an foirm orthagrafach Ní bhfaighfidh 'ní
   bhfaighfidh' agus an sraith fuaime [nʲiː wiː]

Tá na rialacha an-chasta

Tá mórán difríochtaí idir canúintí
Challenges:
            Lexicon
            No dialect-specific pronunciation dictionary

 The existing pronunciation lexicon An Foclóir Póca attempts
 to provide official standard Lárchanúint forms that are a
 compromise between the existing dialects.

 But there are no native speakers of Lárchanúint with whom
 we could build a corpus for synthesis!
Deacrachtaí:
            Leicsis
            Easpa foclóir fuaimnithe canúnach

 Tugann an foclóir fuaimnithe An Foclóir Póca foirmeacha atá
 bunaithe ar an Lárchanúint. Is iarracht é seo ar chaighdeán
 labhartha a bhunú as meascán de na canúintí

 Ach níl cainteoirí dúchasacha Lárchanúna ar bith ann, le
 corpus a bhunú don tsintéis!
Challenges:          Code switching



Irish speakers are bilingual and switching between Irish and
      English is prevalent


Need for the Irish synthetic voice to deal with the frequent
    English words and phrases, which may occur in text
Deacrachtaí: Códmhalartú



Bíonn Gaeilgeoirí dátheangach agus is minic códmhalartú idir
     Gaeilge agus Béarla


Is gá go mbeadh ríomhchainteoir Gaeilge ábalta deighleáil le
      focail agus frásaí coitianta i mBéarla
2   Meeting the Challenges
2   Bealaí chun deighleáil leis na
    deacrachtaí
Meeting the challenge
Corpora: Developing annotated speech corpora

A two-corpus approach was taken.

   (Female speaker, Gaoth Dobhair dialect of Donegal)

1 A unit-selection corpus
     15 hours of read speech
     Dialect-specific materials prepared electronically

2 An extended diphone corpus
     Although a minimalist approach would suggest
     55 phonemes and about 3,000 diphones for Irish, the
      enriched diphone corpus contains over 11,500 units.
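
One way to arrive at the "about 3,000" figure is to count every ordered pair of the 55 phonemes. The short check below assumes that this is how the minimal estimate arises (the slide does not say so explicitly); the placeholder labels `p0`…`p54` are hypothetical, and in practice not every pair need occur in the language.

```python
from itertools import product

N_PHONEMES = 55          # minimum segment count given in the talk
ENRICHED_UNITS = 11500   # "over 11,500 units" in the enriched corpus

# Placeholder labels stand in for the real phoneme symbols.
phonemes = [f"p{i}" for i in range(N_PHONEMES)]

# Every ordered pair of phonemes is a candidate diphone.
basic_inventory = list(product(phonemes, repeat=2))

ratio = ENRICHED_UNITS / len(basic_inventory)

if __name__ == "__main__":
    print(len(basic_inventory), round(ratio, 2))
```

On this reading, the enriched corpus holds nearly four times as many units as the bare pairwise inventory.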
Ag deighleáil le na deacrachtaí
Corpais: Ag forbairt corpas urlabhra anótáilte

Tá muid ag obair ar 2-chorpas

   (Cainteoir baineann, canúint Gaoth Dobhair i dTír
   Chonaill)

1 Corpas unit-selection
     15 uair a' chloig d'ábhar léite
     Ábhar leictreonach canúnach ullmhaithe
2 Corpas diphone saibhir.
     Cé go mbeifeá ag súil le b'fhéidir 55 fóinéim agus
      3,000 diphone don Ghaeilge, clúdaíodh breis is 11,500
      aonad sa chorpas diphone saibhir.
Meeting the challenge
Corpora: Developing annotated speech corpora

Together, the joint corpus of c. 20 hours ensures:

   Flexibility for different kinds of T-T-S technologies
       unit selection and diphone synthesis

   Full coverage of sound combinations: no unforeseen gaps

   Future research is enabled to design compact,
      balanced corpora
Ag deighleáil le na deacrachtaí
Corpais: Ag forbairt corpas urlabhra anótáilte

Le chéile, is ionann an dá chorpas le chéile (c. 20 uair):

   Solúbthacht le haghaidh gléasanna sintéise difriúla, .i.
       unit selection agus diphone

   Corpas a chlúdaíonn gach féidearthacht fuaime

   Amach anseo, beidh sé níos éasca corpais bheaga, dlútha
     a dhearadh
Meeting the challenges:
Lexicon: Appropriate to Donegal dialect
• Using a part of the corpus (1 hour of carefully transcribed
  speech), a mini Donegal lexicon was manually produced

• The forms from the mini-lexicon were compared to those
  of An Foclóir Póca, to develop sound-to-sound rules

• The output rules were applied to An Foclóir Póca to
  produce a draft, dialect-specific, version

• This was then manually checked, and used to segment
  the next chunk of the corpus….

• Process repeated cyclically
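
The compare-and-apply step in this cycle can be sketched as follows. This is a deliberately simplified illustration, not the project's method: it assumes the two transcriptions of a word are already aligned symbol for symbol, it learns context-free substitutions only (real sound-to-sound rules would depend on context), and all names and phone labels are hypothetical.

```python
from collections import Counter, defaultdict


def learn_rules(pairs):
    """Learn the most frequent dialect symbol for each standard symbol.

    pairs: list of (standard_phones, dialect_phones) for the same word.
    Pairs of unequal length are skipped, since they need a real alignment.
    """
    votes = defaultdict(Counter)
    for standard, dialect in pairs:
        if len(standard) != len(dialect):
            continue
        for s, d in zip(standard, dialect):
            votes[s][d] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}


def apply_rules(lexicon, rules):
    """Rewrite a standard-form lexicon into a draft dialect lexicon."""
    return {word: [rules.get(p, p) for p in phones]
            for word, phones in lexicon.items()}
```

The draft produced by `apply_rules` corresponds to the version that is then manually checked before the next chunk of the corpus is processed.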
Ag deighleáil le na deacrachtaí:
Leicsis:    Oiriúnach do chanúint Dhún na nGall
• Ag úsáid cuid den chorpas (1 uair d'urlabhra trascríofa go
  cúramach), cruthaíodh mionleicsis do Dhún na nGall

• Cruthaíodh rialacha fuaim-go-fuaim trí comparáid a
  dhéanamh idir an mionleicsis agus An Foclóir Póca

• Úsáideadh na rialacha seo ar An Foclóir Póca le
  dréachtleicsis canúnach a chruthú

• Ceartaíodh é seo de láimh agus úsáideadh é leis an chéad
  chuid eile den chorpas a dheighilt go huathoibríoch….

• Leanadh leis an phróiseas
Meeting the challenges:
Letter-to-sound rules

Two distinct approaches were tried:

    1. Statistically based rules were generated using the
       Donegal-adapted lexicon and standard training tools.

    2. Handwritten rules were encoded
           -based on An Foclóir Póca, with
           -subsequent removal, re-ordering, & additions
Ag deighleáil le na deacrachtaí:
Rialacha litir-go-fuaim

Baineadh triail as dhá mhodh oibre difriúil:

    1. Cruthaíodh rialacha go huathoibríoch ag úsáid
       gnáthmhodh staitisticiúil ar leicsis canúnach Dhún na
       nGall.

    2. Códaíodh rialacha lámhscríofa
            -bunaithe ar An Foclóir Póca, le
            -roinnt athruithe don chanúint: baint amach,
            atheagrú, & rialacha sa bhreis
Meeting the challenges:
Letter-to-sound rules
Handwritten rules (2) were best:

they gave considerably better results

they yield rules that are readily adapted to further dialects

the aim is to provide LTS rules formulated in terms of

    •   a common core (of universally applicable rules)
    •   + dialect-specific modules

    (A linguistic concept proposed by Ó Murchú, Ó Baoill)
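
The common-core-plus-modules idea can be illustrated with a tiny ordered rewrite-rule engine. Everything here is a hypothetical sketch: the rule format, the function names, the toy graphemes and the output labels are invented for illustration and make no claim about actual Irish letter-to-sound rules or about how the project encodes them.

```python
def build_ruleset(core, dialect_module):
    """Dialect rules are tried first, so they can override the common core."""
    return list(dialect_module) + list(core)


def apply_lts(word, rules):
    """Scan left to right; at each position the first matching rule wins.

    rules: list of (grapheme_string, list_of_phone_labels); graphemes must
    be non-empty. Letters with no matching rule are passed through as-is.
    """
    phones = []
    i = 0
    while i < len(word):
        for pattern, output in rules:
            if word.startswith(pattern, i):
                phones.extend(output)
                i += len(pattern)
                break
        else:
            phones.append(word[i])
            i += 1
    return phones


# Toy rules with made-up labels (not real Irish rules).
CORE = [("sh", ["S"]), ("s", ["s"]), ("a", ["A"])]
DIALECT_X = [("a", ["A2"])]

if __name__ == "__main__":
    print(apply_lts("sha", build_ruleset(CORE, [])))
    print(apply_lts("sha", build_ruleset(CORE, DIALECT_X)))
```

Keeping the core untouched and adding a small module per dialect is what makes such a rule set easy to extend to further dialects.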
Ag deighleáil le na deacrachtaí:
Rialacha litir-go-fuaim
Rialacha lámhscríofa (2) is fearr:

torthaí i bhfad níos fearr

rialacha a athrófar go héasca go canúintí eile

ag iarraidh rialacha LTS a sholáthar bunaithe

    •   rialacha comhchanúnacha
    •   + rialacha canúnacha

    (Coinceap teangeolaíochta a bhí curtha i láthair ag Ó
       Murchú, Ó Baoill…)
Meeting the challenges:
Prosody
A prosody gap in our knowledge of Irish linguistic structure


This is being tackled in a parallel project, mentioned above:
   Prosody of Irish Dialects, (funded by the Irish Research
   Council for Humanities & Social Sciences)


Linguistic analysis will serve as basis to build model for
   synthesis
Ag deighleáil le na deacrachtaí:
Prosóid
Bearna na prosóide inár dtuiscint ar structúr teangeolaíochta
   na Gaeilge


Táimid ag plé le seo i dtionscadal eile Prosóid Chanúintí na
   Gaeilge, (á maoiniú ag An Chomhairle um Thaighde sna
   Dána agus sna hEolaíochtaí Sóisialta)


Úsáidfear an anailís teangeolaíoch i mbunú aonaid prosóide
   don ríomhchaint
Meeting the challenges:
Code switching
For practical reasons, the Irish synthesiser should be able to
'speak' words and phrases of English that appear in texts.


For now: a parallel, Irish-English unit-selection synthesiser is
being built using the same speaker (using a compact corpus
designed to yield coverage of the sounds of English).


This voice 'speaks' Donegal Irish-English and will eventually
be integrated with the Irish voice.
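
One simple way such an integration could route text between the two voices is to tag each word by language and group consecutive words into spans. The sketch below is an assumption about how this might be done, not a description of the system: the word lists, the language codes `ga`/`en`, the prefer-Irish policy and the example sentence are all hypothetical.

```python
from itertools import groupby


def tag_language(token, irish_words, english_words):
    """Prefer Irish; use English only for words found solely in the English list."""
    t = token.lower()
    if t in irish_words:
        return "ga"
    if t in english_words:
        return "en"
    return "ga"  # unknown words default to the Irish letter-to-sound rules


def split_by_voice(tokens, irish_words, english_words):
    """Group consecutive tokens into (language, words) spans, one per voice."""
    tagged = [(tag_language(t, irish_words, english_words), t) for t in tokens]
    return [(lang, [t for _, t in group])
            for lang, group in groupby(tagged, key=lambda pair: pair[0])]


if __name__ == "__main__":
    irish = {"tá", "an", "ag", "obair"}
    english = {"computer", "an"}
    print(split_by_voice(["Tá", "an", "computer", "ag", "obair"], irish, english))
```

Because both voices are built from the same speaker, switching between spans in this way would not change the apparent speaker identity.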
Ag deighleáil le na deacrachtaí:
Códmhalartú
Go praiticiúil, ba cheart go mbeadh an ríomhchainteoir ábalta
focail Bhéarla a 'rá'.


Idir an dá linn, úsáidfear ríomhchainteoir Béarla bunaithe ar
an nguth céanna leis an ríomhchainteoir Gaeilge.


Labhraíonn an guth seo Béarla Dhún na nGall agus amach
anseo déanfar é a chur leis an ghuth Gaeilge.
3   Applications

    in education

    disability: access, education &
                 communication
3   Cur i bhfeidhm

    in oideachas

    inrochtaineacht, oideachas &
        cumarsáid
Applications:               Education
Development of teaching materials & aids
Integrated voice output in Irish for:

•   internet and DVD materials

•   teaching software programmes, interactive videos –
    useful for native speakers and learners of Irish

•   speaking books

•   online learning and exercise materials

•   spell-checkers, dictionaries

•   synthesis-based aids are particularly important for learners
    with reading or learning problems
Cur i bhfeidhm:                 Oideachas
Forbairt ábhar múinteoireachta
Aschur urlabhra imeasctha i nGaeilge le haghaidh:

•   ábhar idirlín & DVD

•   bogearraí oideachasúla & físeáin idirghníomhacha –
    úsáideach do chainteoirí dúchasacha agus foghlaimeoirí

•   ríomhleabhair

•   ábhar foghlama ar líne

•   litreoirí, foclóirí

•   ríomhchainteoirí an-tábhachtach le haghaidh foghlaimeoirí
    le deacrachtaí léamh nó foghlama
Applications:             Education
Development of teaching materials & aids

The other resources are also useful

Annotated corpora (with transcriptions, part of speech
information etc) can be exploited to
     • devise interactive teaching materials
     • for research on Irish (such quantitative data not
       hitherto available)

The pronunciation lexicon and letter-to-sound rules
can all be integrated into specific learner materials
Cur i bhfeidhm:                 Oideachas
Forbairt ábhar múinteoireachta

Tá na hacmhainní eile úsáideach freisin

Corpais Anótáilte (le trascríbhinní, eolas mar gheall ar
ranna cainte srl) a úsáid le
    • ábhar múinteoireachta idirghníomhach a chruthú
    • taighde a dhéanamh ar an Ghaeilge

Is féidir an foclóir fuaimnithe agus na rialacha litir-go-
fuaim a chur isteach in ábhar foghlama ar leith
Applications:             Education
Learners and teachers (individuals in school & University)

•   All computer texts can be spoken out, internet,
    newspapers, email: this aids development of reading,
    listening and pronunciation skills

•   When writing (on computer) listening to spoken output
    facilitates error detection

•   Parents: support for non-native speaker parents, who are often
    uncertain about pronunciation when helping with
    homework (and daunted by the complex orthography)

•   Native speaker models of the language in a classroom
    where even teachers may have non-native pronunciation
Cur i bhfeidhm:                   Oideachas
Foghlaimeoirí & múinteoirí (ar scoil & ar an Ollscoil)

•   Is féidir téacsanna leictreonacha a léamh amach, idirlíon,
    nuachtáin, r-phost: cabhródh sé seo le scileanna
    léitheoireachta, éisteachta agus fuaimnithe

•   Ag úsáid ríomhchainteoir le ceartúcháin a dhéanamh ar a
    bhfuil clóscríofa

•   Tuistí: ag cabhrú le tuistí nach bhfuil Gaeilge acu agus iad
    ag cabhrú lena bpáistí le hobair bhaile srl.

•   Fuaimniú dúchasach sa seomra ranga fiú muna bhfuil
    Gaeilge ó dhúchas ag an mhúinteoir
Applications: Disability
              access & education
The visually disabled

•   Synthesis-based screen readers are crucial in the
    education process. The option of pursuing an education
    through Irish depends on this provision.

•   Access to computer-based materials in Irish, e.g. on
    the Internet, is currently not available. Such access is
    particularly important in the lives of the visually disabled,
    who often pursue careers in computing.
Cur i bhfeidhm: Míchumas
                 rochtain & oideachas
Daoine faoi mhíchumas radhairc

•   Braitheann oideachas trí Ghaeilge do dhaoine faoi
    mhíchumas radhairc ar léitheoirí scáileáin Gaeilge a bheith
    ar fáil le ríomhchainteoir Gaeilge.

•   Níl inrochtaineacht ar fáil ar ríomhairí do Ghaeilgeoirí atá
    faoi mhíchumas radhairc.
Applications: Disability
              access & education


The vocally disabled

Synthesis is a fundamental communication requirement, vital
to the education process, and currently not available in Irish
Cur i bhfeidhm: Míchumas
                 rochtain & oideachas


Daoine faoi mhíchumas labhartha


Tá ríomhchaint riachtanach do dhaoine atá faoi mhíchumas
labhartha. Is gléas cumarsáide bunúsach é atá lárnach fosta
le haghaidh oideachais, agus níl sé seo ar fáil don Ghaeilge
go fóill
4   Implications: a long term
                  perspective
4   Impleachtaí: fís fhadtéarmach
Implications:       developing speech resources


Importance of forward planning

Define goals and eventual use, e.g.,

•   need for multi-dialect synthesis

•   need to cater for code switching (an English voice is easy
    to include now, but cannot be added in later)
Impleachtaí:       Ag forbairt acmhainní urlabhra


Tábhacht le réamhphleanáil

Déan cur síos ar chuspóirí agus úsáid, m.sh.,

•   gá le ríomhchainteoir ollchanúnach

•   gá deighleáil le códmhalartú (níos éasca é a chur isteach
    anois ná amach anseo)
Implications:          developing speech resources
Important to take the long view

Focus on infrastructure………rather than products

Because of the
      -limited funding available to lesser used languages,
      -the additional challenges and resource deficits

…it is vital that developmental work focuses
        -less on the rapid development of specific products
        -more on the provision of infrastructure and long term
         phased development
Impleachtaí:          ag forbairt acmhainní urlabhra
Tábhachtach plean fadtéarmach a chur i bhfeidhm

Fócas ar infreastruchtúr………in áit táirgí

Mar go bhfuil
      -easpa maoiniú ar fáil do theangacha
      neamhfhorleathana,
      -deacrachtaí eile & easpa acmhainní acu

…tá sé tábhachtach díriú
       -níos lú ar fhorbairt tapaidh táirgí
       -níos mó ar infreastruchtúr & ar fhorbairt fhadtéarmach
Broad Implications:           developing speech resources


Developing infrastructure and solutions to maximise:

   •   Adaptability to other dialects/languages and speakers

   •   Reusability of developed corpora and resources

   •   Flexibility: resources should be not too tied to specific
       technology (e.g., the dual corpus developed will
       enable diphone and unit selection synthesis)
Impleachtaí Leathana:Ag forbairt acmhainní urlabhra

Ag forbairt infreastruchtúr agus réiteach ar
fadhbanna le gur féidir:

   •   Iad a bheith úsáideach do chanúintí, do theangacha
       & do chainteoirí difriúla

   •   Corpais & acmhainní a athúsáid

   •   Solúbthacht: B'fhearr gan a bheith ag brath ar mhodh
       theicneolaíochta ar leith (unit selection, diphone agus
       modhanna eile amach anseo)
Broad Implications:           developing speech resources


The long-term view: there is often tension between


   long-term goals   and   short-term gain


Problem: funding is often
   too little
   limited to too short a time window
   so one can feel pressurised to show visible results
Impleachtaí Leathana:           ag forbairt acmhainní urlabhra


An plean fadtéarmach: teannas go minic idir


   Cuspóirí fadtéarmach   Torthaí gearrtéarmach


Fadhb: maoiniú
   ní bhíonn a dhóthain airgid
   ní bhíonn a dhóthain ama
   bíonn brú ort táirgí inláimhsithe agus torthaí so-fheictha a
   chur ar fáil go tapaidh
Implications:         developing speech resources


Infrastructure: needs to be embedded in research

   •   Speech processing (Corpora, Lexica…)

   •   Linguistic resources
       (analyses of prosody/segmentals….)

   •   Speech technology (synthesis technology...)
Impleachtaí:            ag forbairt acmhainní urlabhra


Infreastruchtúr: caithfidh sé a bheith bunaithe ar
                 thaighde eimpireach

    •   Próiseáil urlabhra (Corpais, Leicsis…)

    •   Acmhainní teangeolaíochta
        (Anailís ar phrosóid/teascáin….)

    •   Teicneolaíocht urlabhra (teicneolaíocht sintéise...)
Implications:            developing speech resources


Collaboration:      Níl neart go cur le chéile


Welsh-Irish collaborations

      key to success so far:
                   complementary skills in two teams
                   constant interaction led to rapid progress

      enables aspirations on a much larger scale
Impleachtaí:           ag forbairt acmhainní urlabhra


Comhoibriú:        Níl neart go cur le chéile


Comhoibriú Éire/An Bhreatain Bheag

      bunús an rath go dtí seo:
                   scileanna éagsúla ag an dá fhoireann
                   dul chun cinn tapaidh le idirghabháil minic

      spriocanna i bhfad níos mó dá bharr
Implications:         developing speech resources


Welsh-Irish collaboration: envisaged future work

   •   To extend the research programme to provide for
       further Irish/Welsh dialects

   •   To extend the development to other Celtic languages
Impleachtaí:          ag forbairt acmhainní urlabhra



Comhoibriú Éire/An Bhreatain Bheag: amach anseo

   •   Leathnú amach ar an chlár taighde le tuilleadh
       canúintí Gaeilge is Breatnaise a dhéanamh

   •   Leathnú amach na hoibre, le freastal ar theangacha
       Ceilteacha eile
Implications


Welsh-Irish collaboration:     Future Work



   Developing synthesis for Scottish Gaelic or Manx (close
      relatives of Irish) can be seen as an extension of the
      multi-dialect approach taken for Irish
Impleachtaí


Comhoibriú Gaeilge-Breatnais:       Amach anseo



   Is féidir ríomhchaint do Ghaeilge na hAlban agus don
       Mhanainnis (atá gaolta le Gaeilge) a fhorbairt mar
       leanúint agus mar leathnú amach ar an tsintéis
       ollchanúnach a bhéas againn don Ghaeilge
Implications


Welsh-Irish collaboration:     Future Work


   Likewise, synthesis of Breton and Cornish (close relatives
       of Welsh) can ultimately be undertaken as an
       extension of the Welsh development
Impleachtaí Leathana


Comhoibriú Gaeilge-Breatnais: Amach anseo


   Sa chaoi chéanna, is féidir ríomhchaint don Bhriotáinis
      agus Cornais (atá gaolta le Breatnais) a fhorbairt mar
      leathnú amach ar an obair atá déanta ar an
      Bhreatnais
Implications

Preservation of endangered languages/dialects


•   The synthesis and the resources that underpin it are in
    themselves key developments for language preservation


•   Apart from the direct role in education, they preserve
    natural models of the spoken language….
Impleachtaí

Caomhnú teangacha/canúintí i mbaol


•   Tá tábhacht faoi leith ag baint leis an ríomhchaint
    (agus le na hacmhainní urlabhra eile) maidir le caomhnú
    teanga

•   Cé is maoite den ról i múineadh na teanga, caomhnaíonn
    an ríomhchainteoir glór dúchasach….

•   … a leanfaidh ar aghaidh ag caint, fiú má tá an cainteoir
    dúchais deireanach ina thost
Sin é mo scéal….

agus má tá bréag ann,

                        bíodh !
Demo
Sample 1

Sample 2

Sample 3

Sample 4

Sample 5
Samplaí
Sampla a 1

Sampla a 2

Sampla a 3

Sampla a 4

Sampla a 5

				