					“Facilitating Agents for Multi-cultural Exchange”



              Project Presentation




                                                    Universität Karlsruhe (TH)
                                                    ILKD Prof. Waibel
         FAME Project – Barcelona, July 2004        http://www.frame-project.org
                                                    Coordinator: Florian Metze,
                                                    metze@ira.uka.de
                     Facilitating Agents for Multi-Cultural Exchange:
                              The “FAME” Multi-Modal Room

Goals
•   Information Butler
•   Support group members from different
    backgrounds to work on common problem
•   Facilitate use of technical equipment

On-line demonstrator functions:
•   Augmented table for multi-surface interaction
•   Multi-lingual information retrieval
•   Topic spotting using far-distance speech
•   Close-talking STT for host
•   Common dialogue manager
•   Mix of explicit and implicit interaction

Off-line functions:
•   Intelligent cameraman for lecture recording
•   Lecture transcription (speech-to-text)
•   Speech-to-speech translation (English, Catalan)

Picture captions:
•   People meet and discuss: scientific topics, tourism, etc.
•   Room detects topics and presents relevant information
•   Users can review information and add new documents …

       FAME Showcases




                      The FAME Showcases


Scenario 1 (Presentation)
• Use A/V Equipment
• Intelligent Cameraman
• Presentation Tracking
• Summarisation + Archiving
• Translation, Cross-lingual IR

Scenario 2 (Meeting)
• Augmented Reality
• Video-based Activity Tracking
• Topic Spotting
• Information Butler
                            The FAME Demonstrator
                              (Seminar at ACL in Barcelona, July 2004)




Picture captions:
• reception by FAME-guy
• meeting inside
• people mention topics
• room gives information about spotted topics
• record testimony
                           The FAME Demonstrator
                             (Seminar at ACL in Barcelona, July 2004)




Picture captions:
• multimodal interaction on multiple surfaces
• intelligent cameraman, presentation tracker
• the augmented table




     FAME Technologies




                Component Integration over OAA Middleware
Communication over OAA
•   OAA connects different components
     – Various languages
     – Various modalities
     – Different platforms

Central components
•   MySQL database as central data store
•   Database agent as abstraction and connector for DB
•   Dialog manager as input mediator with context model

Scenario Grouping
•   Separation into lecture recording and user interaction
•   Connection through database

Flexible Component Deployment
•   Download components from CVS
•   Start components on any machine
•   Automatic connection over OAA
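
The idea of components finding each other through a central broker can be illustrated with a minimal sketch. This is not the OAA API: the `Facilitator` class, its `register` and `solve` methods, the `db_lookup` capability and the sample data are all hypothetical, chosen only to show how an agent that declares a capability can be reached by callers that do not know where it runs.

```python
from typing import Callable, Dict, List


class Facilitator:
    """Toy broker (hypothetical, not OAA): agents register capabilities by name."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[Callable[..., object]]] = {}

    def register(self, capability: str, handler: Callable[..., object]) -> None:
        # An agent announces that it can handle requests for `capability`.
        self._providers.setdefault(capability, []).append(handler)

    def solve(self, capability: str, *args: object) -> List[object]:
        # Forward the request to every agent that registered the capability.
        handlers = self._providers.get(capability, [])
        if not handlers:
            raise LookupError(f"no agent offers {capability!r}")
        return [handler(*args) for handler in handlers]


# Hypothetical database agent wrapping the central data store.
store = {"topic:speech": ["lecture_01.mpg"]}


def db_lookup(key: str) -> list:
    return store.get(key, [])


facilitator = Facilitator()
facilitator.register("db_lookup", db_lookup)
results = facilitator.solve("db_lookup", "topic:speech")
print(results)
```

In this sketch the caller only names the capability, so the database agent could be started on any machine that can reach the broker, which is the property the "Flexible Component Deployment" bullets describe.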




                       Distant Speech Recognition in FAME:
                                  Topic Spotting

Topic Spotting Environment:
Pipeline (from diagram): Audio Stream → Segmenter → FAME Distant English Recogniser → Topic Detector
•   Omnidirectional microphone in table centre
•   Speaker-independent speech-to-text system,
    incremental adaptation to room, “always on”
•   Real-time performance, one-pass system
•   Tight integration of STT and IR (Topic spotting)
•   Topics detected by keywords:
      •    From seminar talks
      •    From extra documents

Distant English Recognizer:
•   Implemented using ISL’s Janus toolkit and Ibis decoder
•   Trained on BN + distant microphone “Meeting” data (16kHz, 16bit, up to 6 parallel channels)
      •    Robust pre-processing for distant speech
      •    Integration of speaker- and channel-adaptive training in feature space
      •    Significant amount of cross-talk
      •    Automatic segmentation
•   Generic (BN, SWB, Meeting) language model adapted to the scientific domain
•   Full system evaluated in NIST’s RT-04S “Meeting” evaluation
      •    Unlimited time condition
      •    Multi-pass setup with speaker clustering and adaptation
      •    WER of 49.9% on “Meeting” data (best system in “SDM” condition)
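
Keyword-based topic detection on a recognizer's word stream can be sketched as follows. This is only an illustration under stated assumptions: the keyword lists, the window size, the hit threshold and the function name `spot_topic` are hypothetical, and the slide does not say how the FAME topic detector scores keywords.

```python
from collections import Counter, deque
from typing import Deque, Dict, Iterable, Optional, Set

# Hypothetical keyword lists; the slide says keywords come from seminar talks
# and extra documents, but does not list them.
TOPIC_KEYWORDS: Dict[str, Set[str]] = {
    "speech recognition": {"decoder", "acoustic", "phoneme", "recognizer"},
    "tourism": {"hotel", "museum", "beach", "restaurant"},
}


def spot_topic(words: Iterable[str], window: int = 20, min_hits: int = 2) -> Optional[str]:
    """Return the topic with the most keyword hits among the last `window` words."""
    recent: Deque[str] = deque(maxlen=window)
    for word in words:
        recent.append(word.lower())
    hits: Counter = Counter()
    for topic, keywords in TOPIC_KEYWORDS.items():
        hits[topic] = sum(1 for w in recent if w in keywords)
    topic, count = hits.most_common(1)[0]
    # Report a topic only when enough keywords were seen (assumed threshold).
    return topic if count >= min_hits else None


hypothesis = "the decoder uses an acoustic model and a language model".split()
print(spot_topic(hypothesis))
```

Because only keyword hits matter, such a detector can tolerate a fairly high word error rate as long as the keywords themselves are recognized.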

                              Interacting with Augmented Surfaces

Challenges
•   The best of two worlds: everyday objects coupled with
    digital technology to enhance human activities
•   Interact as you come: no constraints imposed on users


Results
•   New Post-WIMP toolkit:
     – rotatable and zoomable user interfaces
     – multiple simultaneous pointers
•   Robust visual tracker
     – color-based tracking of multiple tokens (more than 20)
     – 81 ms average latency




                           Dialogue Management in the FAME system

Key Features of Dialogue
•   Dialogue management with multilingual input
•   Different input streams / modalities
•   Multimodal interaction
•   Backend application
•   Resolution with environmental context

Dialogue Knowledge Sources
•   Dynamic data from databases
•   Database information for grammar generation
•   Environment model accessible through DB
•   Shared dialogue model for different languages
•   NLU with context free grammars
•   English, Spanish, Catalan grammars
•   Information retrieval
•   Goal based dialogue description

System diagram (labels): Speech Spanish, Speech English, Topic Detection, Table interaction, Multilingual grammar, DLM, DB, Application, TTS, IR, Room Control

Dialogue Architecture (diagram labels): Gesture Recognizer, Speech Recognizer, NLU, Concepts, DLM, Services, Grammars, Domain Model, Task Model
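
The idea of language-specific grammars feeding one shared dialogue model can be illustrated with a small sketch. It deliberately swaps in regular-expression patterns for the context-free grammars named on the slide, and every pattern, concept name and device name below is hypothetical; the point is only that English, Spanish and Catalan input end up as the same language-independent concept.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical per-language patterns (stand-ins for real context-free grammars).
# Each pattern maps to a language-independent concept name.
GRAMMARS: Dict[str, List[Tuple[str, str]]] = {
    "en": [(r"\bturn on the (?P<device>light|projector)\b", "switch_on")],
    "es": [(r"\benciende (?:la|el) (?P<device>luz|proyector)\b", "switch_on")],
    "ca": [(r"\bencén (?:el|la) (?P<device>llum|projector)\b", "switch_on")],
}

# Shared vocabulary so the dialogue model sees one name per device.
DEVICE_NAMES = {
    "light": "light", "luz": "light", "llum": "light",
    "projector": "projector", "proyector": "projector",
}


def understand(utterance: str, language: str) -> Optional[Dict[str, str]]:
    """Map an utterance to a shared concept frame, or None if nothing matches."""
    for pattern, concept in GRAMMARS[language]:
        match = re.search(pattern, utterance.lower())
        if match:
            return {"concept": concept, "device": DEVICE_NAMES[match.group("device")]}
    return None


print(understand("Turn on the light", "en"))
print(understand("Enciende la luz", "es"))
```

Both calls yield the same frame, so a single dialogue model can act on it regardless of the input language.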
                                                                            Model


    Topic Detection & Information Retrieval in FAME

What are they talking about?
Topic Detection is concerned with characterizing a topic for later detection of occurrences of this topic in a stream of words.

What information is relevant for this talk?
Information Retrieval is concerned with finding, within a collection of multilingual, multimedia documents, those that are relevant to the user's information needs.

FameIR tool
•   Indexing: looking for relevant terms describing each document
•   Retrieval: looking for documents relevant to a given query
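
The two steps, indexing and retrieval, can be sketched with a small inverted index and TF-IDF scoring. This is a generic illustration, not the FameIR tool: the document collection, the function names `build_index` and `retrieve`, and the particular weighting formula are assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical miniature collection.
documents: Dict[str, str] = {
    "doc_en": "speech recognition for meeting rooms",
    "doc_es": "reconocimiento de voz en reuniones",
    "doc_talk": "topic spotting with distant speech",
}


def build_index(docs: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Indexing: map each term to the documents containing it and its count there."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


def retrieve(query: str, index: Dict[str, Dict[str, int]], n_docs: int) -> List[Tuple[str, float]]:
    """Retrieval: rank documents by summed tf * idf over the query terms."""
    scores: Dict[str, float] = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms found in fewer documents get a higher weight.
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


index = build_index(documents)
ranking = retrieve("distant speech", index, len(documents))
print(ranking)
```

The sketch matches terms literally, so it does not cover the cross-lingual case; a multilingual system would additionally need query or document translation.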




                         Context aware audio and video acquisition
                                       - the automatic cameraman -



Formal Context Specification
• situation graph composed of entities, roles, relations

Automatic program generation
• synchronized Petri Nets
• forward chaining rules (Jess)

Rule based Speech Activity Detection
Multiple modules:
• temporal energy analysis
• ANN for voiced segments
• sound classifier
Can process more than 100 channels in real time

Diagram labels: Speech Activity Detection, Visual Tracking, Situation recognition, Online MPEG encoding, MPEG file or MPEG stream
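
Of the speech activity detection modules, temporal energy analysis is the simplest to illustrate. The sketch below is a generic energy-threshold detector, not the FAME module: the frame length, the threshold and the function names are assumptions, and the ANN and sound classifier stages are not modelled.

```python
import math
from typing import List


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_activity(samples: List[float], frame_len: int = 160,
                    threshold: float = 0.01) -> List[bool]:
    """Mark a frame as active when its energy exceeds the (assumed) threshold."""
    return [energy > threshold for energy in frame_energies(samples, frame_len)]


# Synthetic signal: two silent frames, then two frames of a 440 Hz tone at 16 kHz.
silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
activity = detect_activity(silence + tone)
print(activity)
```

A per-frame rule this cheap is one reason an energy stage can be run on many channels at once before more expensive classifiers are applied.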




                           Automatic Transcription of non-native conference speech
The TED corpus
• Conference presentations held at EuroSpeech ’93.
• 31 speakers (8h) transcribed training set, 8 speakers (2h) test set.
• Challenges: spontaneous style, often not fluent, many accents, noises, lack of training data for Acoustic and Language models.

The ITC-irst transcription system
• Beam-search Viterbi decoder with context-dependent search in a context-independent static network.
• Cross-word tied-states triphone HMMs.
• Trigram Language Model.
• Two-step processing, with Maximum Likelihood Linear Regression Acoustic Model adaptation.
• Coarse grain parallelism at segment level with online load balancing.

Acoustic Model
• Trained on 150h of HUB4 American English BN.
• Cluster-based mean and variance normalization, with completely unsupervised clustering.
• Off-line supervised adaptation on the 8h adaptation set.

Language Model
• Topic-adapted LMs obtained by combination of a baseline LM and a topic-specific LM.
• Different methods of combining LMs: Mixture, Minimum Discrimination Information, Probabilistic Latent Semantic Analysis.
• Unsupervised LM adaptation: appealing, some benefit in perplexity but not in recognition rate. Supervision provided by the first recognition step.
• Supervised LM adaptation: effective both in perplexity and recognition, but requires some related text.

             Mixture            MDI               PLSA
          Unsup.   Sup.     Unsup.   Sup.     Unsup.   Sup.
 PP        13%     38%       9%      19%       4%      5%
 WER        0%     11%       0%       4%       0%      0%

Relative improvements in perplexity (PP) and error rate (WER) with different LM adaptation methods, both unsupervised and supervised.

• Best result with Mixture-based supervised LM adaptation on the paper associated with each lecture: 32.4% WER.
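
Mixture-based LM adaptation and the relative perplexity improvement reported for it can be illustrated with a toy calculation. The sketch uses unigram models for brevity, whereas the slide describes a trigram LM; the probabilities, the interpolation weight of 0.5 and the test sentence are all made up for illustration.

```python
import math
from typing import Dict, List

# Hypothetical unigram probabilities over a four-word vocabulary.
baseline_lm: Dict[str, float] = {"the": 0.5, "news": 0.3, "speech": 0.1, "decoder": 0.1}
topic_lm: Dict[str, float] = {"the": 0.4, "news": 0.1, "speech": 0.3, "decoder": 0.2}


def mixture(base: Dict[str, float], topic: Dict[str, float], weight: float) -> Dict[str, float]:
    """Linear interpolation: weight * base + (1 - weight) * topic."""
    return {w: weight * base[w] + (1.0 - weight) * topic[w] for w in base}


def perplexity(lm: Dict[str, float], words: List[str]) -> float:
    """Exponential of the negative mean log probability of the words."""
    log_prob = sum(math.log(lm[w]) for w in words)
    return math.exp(-log_prob / len(words))


test_words = "the speech decoder the speech".split()
adapted_lm = mixture(baseline_lm, topic_lm, 0.5)
pp_base = perplexity(baseline_lm, test_words)
pp_mix = perplexity(adapted_lm, test_words)
relative_gain = (pp_base - pp_mix) / pp_base
print(f"baseline PP={pp_base:.2f}, mixture PP={pp_mix:.2f}, relative gain={relative_gain:.0%}")
```

The relative gain computed here corresponds to the kind of figure shown in the PP row of the table; a lower perplexity does not guarantee a lower WER, as the unsupervised columns show.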

                     Speech-to-speech Machine Translation between
                            Catalan/Spanish and English (1)

Pipeline (from diagram): “I’d like to book a room” (speech) → Automatic Speech Recognition → “i’d like to book a room” (text) → Text-to-Text Translation → “Voldria reservar una habitació” (text) → Text-to-Speech Conversion → “Voldria reservar una habitació” (speech)

Automatic Speech Recognition (ATLAS)
 • For Catalan, Spanish and English
 • Speaker independent
 • Adapted to the hotel reservation domain used in machine translation
 • Vocabulary (hotel names, streets, people) adapted to the demo characteristics
 • Specific language models for agent and client

Rule-based Translation (UPC and CMU)
Interlingua approach
Conversion to/from a language-independent semantic representation (Interchange Format – IF)
Diagram: English, Catalan, Spanish, Other... each connected to IF
Example:
   English: I’d like to book a room
   IF:      give-information+disposition+reservation+room
            (disposition=(who=i, desire), room-spec=(identifiability=no, room))
   Catalan: Voldria reservar una habitació

Statistical Translation (UPC)
 • Conversion between two specific languages
 • A manually-translated parallel corpus is required
 • After automatic word alignment of the corpus, bilingual units and a statistical model are generated
 • Translating a sentence is finding the most probable sequence of bilingual units reproducing the sentence in the original language
Example:
   s: i’d like to book a room
   Bilingual units: hello / hola; please / si us plau; i’d like / voldria; I need / necessito; to book / reservar; to rent / llogar; a / un; a / una; ticket / bitllet; room / habitació; car / cotxe
   TRANSLATION: t: voldria reservar una habitació

System implementation (UPC)
Domain: pre-arrival hotel reservation.
Languages: Catalan, Spanish, and English
 • IF-based text-to-text translation system for all three languages
 • Statistical text-to-text translation systems: E → C,S and C,S → E
 • Automatic speech recognition system: JANUS (UKA)
 • Text-to-speech conversion:
     – ATLAS-UPC system for C,S
     – Festival for E.

Demo
 • An English-speaking client wishes to make a hotel reservation with a Spanish- or Catalan-speaking agent.
 • The client has certain constraints (dates, room type, hotel
 • The agent has a range of available hotels but needs to get the client's credit card information to confirm the reservation
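
Finding the most probable sequence of bilingual units, as the statistical translation bullets describe, can be sketched with a small dynamic-programming search. The units below are the ones shown in the slide's example; the bigram probabilities, the floor value and the function name `translate` are hypothetical, and a real system would estimate the model from a word-aligned parallel corpus.

```python
import math
from typing import Dict, List, Tuple

Unit = Tuple[str, str]  # (source phrase, target phrase)

# Bilingual units from the slide's example (lowercased).
UNITS: List[Unit] = [
    ("i'd like", "voldria"), ("i need", "necessito"),
    ("to book", "reservar"), ("to rent", "llogar"),
    ("a", "un"), ("a", "una"),
    ("room", "habitació"), ("ticket", "bitllet"), ("car", "cotxe"),
]

# Hypothetical bigram probabilities over units; unseen pairs get a small floor.
BIGRAMS: Dict[Tuple[Unit, Unit], float] = {
    (("a", "una"), ("room", "habitació")): 0.9,
    (("a", "un"), ("ticket", "bitllet")): 0.9,
    (("a", "un"), ("car", "cotxe")): 0.9,
}
FLOOR = 0.1


def translate(sentence: str) -> str:
    """Return the target side of the most probable unit sequence covering the sentence."""
    words = sentence.lower().split()
    # best[(i, unit)] = (log prob, target words) for the best path that covers
    # words[:i] and ends in `unit`; the start state uses None as the unit.
    best: Dict[Tuple[int, object], Tuple[float, List[str]]] = {(0, None): (0.0, [])}
    for i in range(len(words)):
        for (pos, prev), (score, target) in list(best.items()):
            if pos != i:
                continue
            for unit in UNITS:
                src = unit[0].split()
                if words[i:i + len(src)] != src:
                    continue
                step = math.log(BIGRAMS.get((prev, unit), FLOOR))
                key = (i + len(src), unit)
                if key not in best or score + step > best[key][0]:
                    best[key] = (score + step, target + [unit[1]])
    finals = [value for (pos, _), value in best.items() if pos == len(words)]
    if not finals:
        raise ValueError("sentence cannot be covered by the known units")
    return " ".join(max(finals, key=lambda value: value[0])[1])


print(translate("i'd like to book a room"))  # prints "voldria reservar una habitació"
```

The unit context is what selects "a / una" before "room / habitació" and "a / un" before "ticket / bitllet"; without it the two units for "a" would be indistinguishable.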




                                                                                                    Speech-to-speech Machine Translation between
                                                                                                           Catalan/Spanish and English (2)
Goal
Convert a speech utterance from the source language (SL) to the target language (TL) in a specific domain.

In the current implementation, we use:
- Domain: pre-arrival hotel reservation.
- Languages: Catalan, Spanish, and English. All three are used either as SL or TL.

A speech-to-speech translation system
SL speech → Automatic speech recognition → Text-to-text translation → Text-to-speech conversion → TL speech

Two alternative approaches are used for text translation:
- Interlingua: conversion to/from a language-independent semantic representation
- Statistical: conversion between two specific languages

System implementation (UPC)
Architecture
• Open agent architecture (used in the FAME EU project)
• IF-based translation module from the NESPOLE! EU project
Sub-systems
• Automatic speech recognition system: JANUS
• Text-to-speech conversion: ATLAS-UPC system for Catalan and Spanish; Festival for English
• IF-based analysis + generation for all three languages
• Statistical text-to-text translation systems: E → C,S and C,S → E

Evaluation
An English-speaking client wishes to make a hotel reservation with a Spanish- or Catalan-speaking agent.
The client has certain constraints (dates, room type, hotel category, budget).
Agent's utterance: We have the Hotel Guitart Almirante for 150€ per night. It has four stars.

IF-based Translation (UPC and CMU)
Interlingual approach
• For multilingual (Catalan, Spanish, English), multidirectional (C-E, E-C, S-E, E-S, C-S, S-C) machine translation
• Only one analysis component and one generation component per language
• Can be developed independently by groups in different places
Interchange format (IF)
• Represents discourse acts (request, inform, greet, etc.)
• Limited set of domain-related conceptual classes (hotel, room, bed, person, credit card, etc.)
• Each discourse act and each conceptual class corresponds to one or more expressions in each language.

Statistical Translation (UPC)
Statistical approach
• Automatically learnable for any pair of languages
• A manually-translated parallel corpus is required
• Can be implemented by means of a Finite State Transducer (FST)
• Robust to speech recognition errors
MT with Finite-State Transducers (FST)
• Estimation of a variable-length N-gram language model whose edges contain bilingual units (tuples)
• Translating a sentence is finding the most probable sequence of tuples reproducing the source language sentence (dynamic programming):

\[
\arg\max_{t} \Pr(s, t) = \arg\max_{t} \prod_{i=1}^{N} P\big( (s_i, t_i) \mid (s_{i-1}, t_{i-1}), \ldots, (s_{i-X+1}, t_{i-X+1}) \big)
\]

Here N is the number of tuples in the sequence and X is the order of the X-gram model.
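
The tuple N-gram search can be illustrated with a minimal Python sketch. It is not the UPC implementation: it assumes a bigram model (X = 2), the probabilities in BIGRAM are invented for illustration, FLOOR is a crude stand-in for real back-off smoothing, and the names TUPLES, BIGRAM, FLOOR and translate are hypothetical. Only the bilingual units themselves are taken from the poster's example figure.

```python
import math

# Bilingual units (tuples) from the poster's example figure: source words -> target options.
TUPLES = {
    ("hello",): ["hola"],
    ("i'd", "like"): ["voldria"],
    ("i", "need"): ["necessito"],
    ("to", "book"): ["reservar"],
    ("to", "rent"): ["llogar"],
    ("a",): ["un", "una"],
    ("room",): ["habitació"],
    ("car",): ["cotxe"],
    ("ticket",): ["bitllet"],
    ("please",): ["si us plau"],
}

START = ("<s>", "<s>")  # sentence-start pseudo-tuple

# Hypothetical bigram probabilities P(tuple | previous tuple), invented for this sketch.
BIGRAM = {
    (START, ("i'd like", "voldria")): 0.5,
    (("i'd like", "voldria"), ("to book", "reservar")): 0.5,
    (("to book", "reservar"), ("a", "una")): 0.6,
    (("to book", "reservar"), ("a", "un")): 0.4,
    (("a", "una"), ("room", "habitació")): 0.9,
    (("a", "un"), ("room", "habitació")): 0.05,
    (("a", "un"), ("car", "cotxe")): 0.9,
}
FLOOR = 1e-4  # assumed probability for unseen bigrams (stand-in for smoothing)


def translate(sentence):
    """Return the target side of the most probable tuple sequence, or None."""
    words = sentence.lower().split()
    n = len(words)
    # best[pos][last_tuple] = (log-probability, previous position, previous tuple)
    best = [dict() for _ in range(n + 1)]
    best[0][START] = (0.0, None, None)
    for start in range(n):
        for prev, (score, _, _) in best[start].items():
            for src, targets in TUPLES.items():
                end = start + len(src)
                if tuple(words[start:end]) != src:
                    continue  # this tuple does not reproduce the next source words
                for tgt in targets:
                    unit = (" ".join(src), tgt)
                    cand = score + math.log(BIGRAM.get((prev, unit), FLOOR))
                    if unit not in best[end] or cand > best[end][unit][0]:
                        best[end][unit] = (cand, start, prev)
    if not best[n]:
        return None  # no tuple sequence covers the whole source sentence
    # Pick the best final tuple, then follow back-pointers to the start.
    unit = max(best[n], key=lambda u: best[n][u][0])
    pos, out = n, []
    while unit != START:
        out.append(unit[1])
        _, pos, unit = best[pos][unit]
    return " ".join(reversed(out))


print(translate("i'd like to book a room"))
```

With these assumed probabilities the decoder prefers "una" before "habitació" and "un" before "cotxe", because the choice of each tuple is conditioned on the tuple before it.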
                       Interchange                                                                                                                                                                              s: i’d like           to book              a            room
                                                                                                                Map SL text to IF                          Text in                                                                                                                                         Type of room: double room
                       Format (IF)                                        Statistic                                                                        Source                                                                                                                                          Nº of people: 2 people
        Analysis                     Generation                          conversion                                                                       Languag                                                                                                                                          Type of bed: queen bed
                                                                                                                Input:
                                                                                                                I would like to reserve a hotel room
                                                                                                                                                             e               Pre-                 hello / hola                                                                                             Type of hotel: three star
                                                                                                                                                                          processing                                       I need / necessito                              ticket / bitllet                Budget: 90€-110€ per night
                                                                                                                for
                                                                                                                                                            Soup                                                                                                                                           Duration of the stay: arriving T hursday, August 5, for 4 nights,
                                                                                                                the fifth of August
                                                                                                                                                           Parser                                                                                    a / un
                                                                                                                                                                                                    please / si us                                                                                         leaving Monday, August 9
                                                                                                                Processing:                                                 Parsing                 plau                                                                                                   C lient name: David Jackson
                                                                                                                1. Apply preprocessing and                                 Grammar                                                                  a / una              room / habitació                  Mastercard Number: 3476897514708243
 Automatic Speech Recognition (ATLAS)                                                                              grammar rules to text
                                                                                                                                                           Conce                                                          to book /
                                                                                                                                                                                                                                                                                                           Expiration Date: 08-2006
                                                                                                                                                                                          i’d like / voldria
                                                                                                                2. Convert parse tree to IF                  pt                                                           reservar
                                                                                                                   representation                          Parse
  Acoustic models                                                                                                                                          Trees                                                   to rent / llogar                 a / un                   car / cotxe

                                                                                                                                                         Parser-IF
        •               Both for Catalan and Spanish                                                            Output:                                                     Mapping
                                                                                                                                                          Mapper                                                                                                                                                       The agent has a range of available hotels but needs to get
        •               Domain and speaker independent                                                          give-information+disposition+                                rules
                                                                                                                reservation+room                                                              TRANSLATION:                t: voldria              reservar una habitació                                               the clients credit card information to confirm the reservation
        •               With the Janus Speech Recognition Toolkit (UKA-CMU)
                                                                                                                (disposition=(w ho=i, desire),                                                                                                                                                             Client’s utterance:
        •               Databases used:                                                                          room-spec=(identifiability=no, room),       IF
                             –            Spanish (collected by ATLAS)                                           object-time=( md=5, month=8))                                                                                                                                                             I can spend between 120€ and 150€ per night
                                                 •            Office environment
                                                 •            16 kHz, 4 mics: headset, lavalier,
                                                              medium distance, far distance                                                                                                   Training from parallel corpora
                                                 •            Speakers: 300, balanced in age,                                                                                                                                                                                                              Opció 1


                                                 •
                                                              gender and dialect
                                                              Each speaker reads 15
                                                                                                                                        Generation                                                                                                                                                         Allotjament: Hotel Rívoli Rambla
                                                                                                                                                                                                                                                                                                           Tipus d’allotjament: 4 estrelles
                                                                                                                                                                                                                                                                                                           Adreça: La Rambla, 128
                                                              paragraphs of 60 words, reads all                                                                                                          Parallel                                                           Pre-process                    Situació: A la Rambla
                                                                                                                                                                                                                                                                                                           Preus: Habitació doble amb llit de matrimoni gran 155€ per nit
                                                              phones at least once                              Map IF to TL text                                                                                                                                                                                  Habitació doble amb llits separats 155€ per nit
                                                                                                                                                                                                         Corpus                                                                                                    Habitació individual 115€ per nit
                                                 •            Recruit 4500 paragraphs (about
                                                                                                                                                                                                                                                                                                                   No suites                                                Opció 2
                                                              25 hours of speech) phonetically                                                              IF                                                                                                                                                                                                              Allotjament: Hotel Prestige
                                                              balanced                                          Input:                                                                                                                                                                                                                                                      Tipus d’allotjament: 4 estrelles
                                                                                                                give-information+disposition+                                                                                                                                                                                                                               Adreça: Passeig de Gràcia, 62
                             –            Catalan (collected by the UPC for the                                                                                                                                                                                                                                                                                             Situació: Al Passeig de Gràcia
                                          Generalitat de Catalunya)                                             reservation+room                                                          Training data                                                                  Autom atic w ord                                                                                   Preus: Habitació doble amb llit de matrimoni gran 185€ per nit
                                                                                                                                                                                                                                                                           alignm ent                                                                                               Habitació doble amb llits separats 185€ per nit
                                                 •            Speech database for dictation                     (disposition=(w ho=i, desire),                             Mapping                    (Catalan/English/Spanis                                                                                                                                                       Habitació individual 150€ per nit
                                                              purpose (FreeSpeech training                                                                IF-Gen                                                                                                                                           Opció 3                                                                  Junior Suite 400€ per nit
                                                                                                                  room-spec=(identifiability=no,                            Rules
                                                              database) recorded at UPC
                                                              (1999): 50 men, 50 women, 25
                                                                                                                                                          Mapper                                      h)                                                            (IBM / HMM mode ls)
                                                                                                                                                                                                                                                                                                           Allotjament: Hotel Guitart Almirante
                                                                                                                                                                                                                                                                                                           Tipus d’allotjament: 4 estrelles
                                                                                                                room),                                                                                                                                                                                     Adreça: Via Laietana, 42
                                                              children.                                                                                                                   •                       ~ 35K sentences per                                                                      Situació: Al Barri Gòtic
   Language models                               •           Automatic (not human)
                                                             supervised.
                                                                                                                  object-time=( md=5, month=8))
                                                                                                                Processing:                                                                                       language
                                                                                                                                                                                                                                                                                                           Preus: Habitació doble amb llit de matrimoni gran 150€ per nit
                                                                                                                                                                                                                                                                                                                   Habitació doble amb llits separats 150€ per nit
                                                                                                                                                                                                                                                                                                                   Habitació individual 100€ per nit
                                                                                                                                                         Concep                                                                                                           Bilingual units
                                                 •           16 kHz, 2 mics: headset (100                       1.     Convert IF representation                          Generation      •                       ~ 400K w ords per language                                                                       Junior Suite 275€
                                                                                                                                                                                                                                                                                                                                                                            Opció 4
                                                             hours) and FreeSpeech mouse                                                                  t Gen                                                                                                             generation                                                                                      Allotjament: Hotel Paseo de Gracia
    •                  Training of language models   for Catalan, Spanish, and
                                                             (20 hours)                                               to generation tree                  Trees           Grammars        •                       vocabulary ~ 10-12K w ords                                                                                                                                Tipus d’allotjament: 3 estrelles
                                                                                                                                                                                                                                                                                                                                                                            Adreça: Passeig de Gràcia, 102
                       English:                                                                                 2.     Apply lexical,                                        and                                  per language                                      (pre se rving word orde r in                                                                            Situació: Al Passeig de Gràcia a prop de la Avinguda Diagonal
                                                                                                                                                                                                                                                                                                                                                                            Preus: Habitació doble amb llit de matrimoni gran 135€ per nit
                             –           Adapted to the hotel reservation domain                                     grammatical, and                                      Lexicon        •                       tourist domain (hotel                                   both languages)                                                                                            Habitació doble amb llits separats 125€ per nit
                                                                                                                                                                                                                                                                                                                                                                                     Habitació individual 110€ per nit
                                         used in machine translation                                                 morphological rules to               Genkit                                                  reservation, flight booking,                                                             Opció 5
                                                                                                                                                                                                                                                                                                           Allotjament: Hotel Duc de la Victoria
                             –           Vocabulary (hotel names, streets, people)                                    generate TL text                   Generato         Morphologic                             travel planning, touristic                                 Bilanguage
                                                                                                                                                                                                                                                                                                           Tipus d’allotjament: 3 estrelles
                                                                                                                                                                                                                                                                                                           Adreça: Duc de la Victoria, 15
                                                                                                                                                                               al                                 information dialogues,...)                                                               Situació: Al Barri Gòtic
                                         adapted to the demo characteristics                                    Output:                                     r                                                                                                                   m odel                     Preus: Habitació doble amb llit de matrimoni gran 130€ per nit
                                                                                                                                                                          Realization                                                                                                                              Habitació doble amb llits separats 120€ per nit
                                         (Barcelona)                                                            Me gustaría reservar una habitación                                                                                                                          estim ation                           Habitació individual 100€ per nit

                             –           Specific language models for agent and client                          para el cinco de agosto                                     Rules
                                                                                                                                                           Target                         Training statistics
                                                                                                                M'agradaria reservar una                                 Postprocessin                                                                                        (X-grams)
                                                                                                                                                          Languag                         •                       vocabulary of tuples: ~ 70K
    •                  Using textual corpus from the UPC:                                                       habitació                                                      g
                                                                                                                                                           e Text                                                 per pair of languages (due to
                                                                                                                pel cinc d'agost
                            –            Subset of the LC-STAR (EU project)                                                                                                                                       monotonicity constraint in
                                         spontaneous speech corpus corresponding                                                                                                                                  both languages)
                                         to the hotel reservation domain                                                                                                                  •                       transducer size: ~ 240K
                            –            Text from spontaneous speech dialogs                                                                                                                                     nodes, ~ 800K transitions
                                         intended for machine translation. Part of
                                         them translated from English C-STAR
                                         dialogs.                                                                                                                                                                                                                                                                                                            Universität Karlsruhe (TH)
                  Speech Processing for Automatic Speech Recognition
                              with Distant Microphones


Feature Extraction → Pattern Matching → Recognized speech

•   Multi-Microphone Processing: increase signal quality using multiple microphones
    Investigated techniques:
    –   Generalized Side-Lobe Canceller
    –   Wiener Filter Beam-Forming

•   Voice Activity Detection: classify signal as speech or non-speech
    Investigated techniques:
    –   Linear Discriminant Analysis of FF features
    –   Decision Tree Classifier

•   Robust Feature Extraction: extract relevant features from speech signal
    Investigated techniques:
    –   Different Sampling Frequencies
    –   Spectral Subtraction
    –   De-Reverberation
    –   Frequency Filtering (FF)
    –   Mean & Variance Normalization

•   Acoustic Models & Adaptation: reduce difference between current conditions and training conditions
    Investigated techniques:
    –   Multi-Condition Training
    –   FF-based Jacobian Adaptation
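
The slide only names Mean & Variance Normalization as one of the robust feature extraction techniques. As a rough illustration, a generic per-utterance version could look like the sketch below. The function name, the list-of-lists frame layout, and the `eps` guard are assumptions made for this example and are not taken from the FAME system.

```python
from math import sqrt


def mean_variance_normalize(frames, eps=1e-8):
    """Scale every feature dimension to zero mean and unit variance.

    `frames` is a list of feature vectors (one per time frame) from a single
    utterance. `eps` is a hypothetical guard against division by zero for
    dimensions that never change.
    """
    n = len(frames)
    dims = len(frames[0])
    # Per-dimension mean over all frames of the utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    # Per-dimension (population) standard deviation.
    stds = [
        sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
        for f in frames
    ]


# Two feature dimensions with very different scales.
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = mean_variance_normalize(frames)
```

After normalization both dimensions have the same statistics, which is the sense in which the technique reduces the mismatch between recording conditions.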



FAME-related demonstrations shown at the FORUM2004




                               Language Support for Tourists

Translates speech between                        Domains
  • Chinese                                        • hotel reservation
  • Spanish                                        • basic medical needs
  • English                                        • basic tourist needs




Spoken queries to maps:
 • address location
 • points of interest
 • route planning

Goal: Tourist-Assistance in the Olympic Games, Beijing 2008

Technologies
  • machine translation
  • speech recognition
  • speech synthesis
  • server based
  • PDA user interface
  • wireless communication

Joint cooperation between
  • CapInfo, China
  • NLPR, China
  • Universität Karlsruhe (TH), Germany
  • Carnegie Mellon University, USA
  • Mobile Technologies, USA

                                   Sign Translation
Automatic Sign Translation: Automatic Sign Detection + OCR + Language Translation

Challenges for a Tourist
  • Language barrier
    –   Spoken & written language
    –   Signs
  • Space or location
    –   Navigation
    –   Landmark identification
  • Social and cultural boundaries

Application
  • Assist international tourists to overcome the language barriers
  • Help the visually handicapped gain increased awareness of their environment
  • Help emergency personnel & others

Automatic Sign Detection
  • Initial character candidate detection by edge features
  • Refine character candidates by geometric constraints and color distributions
  • Combine layout analysis with affine transformation parameters
  • Multi-resolution approach for different character sizes

Sign Translation
  • Data-driven machine translation
  • Example-based machine translation (EBMT) and statistical machine translation (SMT)
  • Trained from a small bilingual corpus, plus a bilingual lexicon
  • Comparison study between EBMT and SMT
    –   EBMT has higher accuracy given a small corpus
    –   SMT is better at inferring unseen patterns
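
The slide describes the first sign detection step only as "initial character candidate detection by edge features". The sketch below shows one simple way such a step could work on a grayscale image: mark pixels with a strong intensity gradient, then take their bounding box as a candidate region. The central-difference gradient, the threshold value, and both function names are assumptions for illustration. The actual detector is not specified in the slides.

```python
def edge_map(image, threshold):
    """Mark pixels whose gradient magnitude exceeds `threshold`.

    `image` is a list of rows of grayscale values. The gradient is
    approximated with central differences, and |gx| + |gy| is used as a
    cheap magnitude. Border pixels are left unmarked.
    """
    height, width = len(image), len(image[0])
    edges = [[False] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            edges[y][x] = abs(gx) + abs(gy) > threshold
    return edges


def candidate_box(edges):
    """Return (top, left, bottom, right) around all edge pixels, or None."""
    points = [
        (y, x)
        for y, row in enumerate(edges)
        for x, is_edge in enumerate(row)
        if is_edge
    ]
    if not points:
        return None
    ys = [p[0] for p in points]
    xs = [p[1] for p in points]
    return (min(ys), min(xs), max(ys), max(xs))


# Synthetic 7x7 image: dark background with a bright 3x3 block,
# standing in for a character on a sign.
image = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        image[y][x] = 255

edges = edge_map(image, threshold=100)
box = candidate_box(edges)
```

A real system would then filter such candidates, for example with the geometric and color constraints the slide lists as the next step.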


                                Speech-to-Speech Translation

A non-fluent speaker faces enormous problems when it comes to talking about less familiar topics with many specific terms, as is the case in a medical interview.

Our new medical translation system enables doctors and patients to communicate with each other. The system runs stand-alone on a PDA, so doctors can easily carry the device from patient to patient.

System Features
  • Two-way speech-to-speech translation system on a mobile platform (PDA)
  • Two translation approaches: (1) a language-independent Interlingua representation, and (2) statistical machine translation
  • The Interlingua-based system provides feedback so that the speaker can check the translation
  • The system handles hundreds of sentences (with many variations)
  • Medical domain: English-speaking doctor, Arabic-speaking patient

System Flexibility
  • Easy extension to new languages and domains
  • Simultaneous translation from one language into multiple languages
  • Mobile and multiple platforms via wireless communication (currently includes Spanish, Chinese, and Thai in the medical and tourist domains)
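
To make the Interlingua idea on this slide more concrete, the toy sketch below analyzes a sentence into a language-independent frame, generates the target sentence from that frame, and generates a paraphrase back in the speaker's own language as feedback. Everything here is hypothetical: the frame format, the `report-pain` label, the tiny lexicon, and the English/Spanish templates are invented for illustration and do not reflect the project's actual Interlingua.

```python
import re

# Hypothetical per-language lexicon and sentence templates.
LEXICON = {
    "en": {"head": "head", "back": "back"},
    "es": {"head": "cabeza", "back": "espalda"},
}
TEMPLATES = {
    "en": {"report-pain": "My {body_part} hurts."},
    "es": {"report-pain": "Me duele la {body_part}."},
}


def analyze_en(sentence):
    """Map a very restricted English sentence to an interlingua frame."""
    match = re.fullmatch(r"my (\w+) hurts\.?", sentence.strip().lower())
    if match is None or match.group(1) not in LEXICON["en"]:
        raise ValueError("sentence not covered by this toy grammar")
    return {"act": "report-pain", "body_part": match.group(1)}


def generate(frame, lang):
    """Render an interlingua frame as a sentence in `lang`."""
    word = LEXICON[lang][frame["body_part"]]
    return TEMPLATES[lang][frame["act"]].format(body_part=word)


frame = analyze_en("My head hurts")
translation = generate(frame, "es")  # shown to the listener
feedback = generate(frame, "en")     # shown to the speaker for checking
```

Because both outputs come from the same frame, adding a language only needs a new lexicon and template set, which matches the "easy extension to new languages" point on the slide.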

				