Captioning for Deaf and Hard of Hearing People by Editing

					                                         Slide 1

    Captioning for Deaf and Hard of Hearing People by Editing Automatic
                     Speech Recognition in Real Time

                                        Mike Wald

                                         Slide 2


Lecture Access:
Traditional University responses

•   Notetaking by Intermediaries
     – Paid or Volunteer
•   Real-Time Transcription
•   Sign language interpreting
•   Assistive technology
     – Radio Microphones
     – Digital recorders
                                         Slide 3

ASR for Dictation: Basic principles
•   Designed to create written language
•   Statistical basis for recognition
•   SR learns
     – how you speak
     – how you write
     – new words

                                         Slide 4

Automatic Speech Recognition
Speech > Acoustic Model > Language Model > Text

                                         Slide 5


SR Dictation is Difficult
• People speak differently
•   Different words can sound the same
•   Non-verbal cues can’t be used
•   Andtherearenospacesbetweenwordswhenpeopletalksoitisunclearwherewordsbeginandend

                                         Slide 6

SR in classrooms is VERY DIFFICULT!
• Star Trek expectations
• Special vocabulary
• Spontaneous speech vs. writing
• Dialogue and interaction
                                         Slide 7


This is a demonstration of the problem of the readability of text created by
commercial speech recognition software used in lectures they were designed for
the speaker to dictate grammatically complete sentences using punctuation by
saying comma period new paragraph to provide phrase sentence and paragraph
markers when people speak spontaneously they do not speak in what would be
regarded as grammatically correct sentences as you can see you just see a
continuous stream of text with no obvious beginnings and ends of sentences
normal written text would break up this text by the use of punctuation such as
commas and periods or new lines by getting the software to insert breaks in the
text automatically by measuring the length of the silence between words we can
improve the readability greatly

                                         Slide 8


This is a demonstration of the problem of the readability of text created by commercial
speech recognition software used in lectures

they were designed for the speaker to dictate grammatically complete sentences using
punctuation by saying comma period new paragraph to provide phrase sentence and
paragraph markers

when people speak spontaneously they do not speak in what would be regarded as
grammatically correct sentences

as you can see you just see a continuous stream of text with no obvious beginnings and
ends of sentences

normal written text would break up this text by the use of punctuation such as commas
and periods or new lines

by getting the software to insert breaks in the text automatically by measuring the length
of the silence between words we can improve the readability greatly
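
The pause-based line breaking described above can be sketched roughly as follows. This is a minimal illustration, not ViaScribe's actual method: the word timing format, the function name insert_breaks and the 0.5 second threshold are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Assumed input format: (word, start_seconds, end_seconds) for each recognised word.
Word = Tuple[str, float, float]


def insert_breaks(words: List[Word], pause_threshold: float = 0.5) -> List[str]:
    """Group words into lines, starting a new line whenever the silence
    before a word is at least pause_threshold seconds (hypothetical value)."""
    lines: List[List[str]] = []
    previous_end: Optional[float] = None
    for text, start, end in words:
        # The first word always opens a line; later words open one after a long pause.
        if previous_end is None or start - previous_end >= pause_threshold:
            lines.append([])
        lines[-1].append(text)
        previous_end = end
    return [" ".join(line) for line in lines]


# Made-up timings: a 0.8 second silence separates "lectures" from "they".
words = [
    ("used", 0.0, 0.3), ("in", 0.35, 0.45), ("lectures", 0.5, 1.0),
    ("they", 1.8, 2.0), ("were", 2.05, 2.2), ("designed", 2.25, 2.8),
]
for line in insert_breaks(words):
    print(line)
```

A longer threshold gives fewer, longer lines; a shorter one breaks the text more often, which is presumably why the personal display lets each user choose the pause delay.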

                                         Slide 9

1998 LIBERATED LEARNING
Pilot Project
(Photo of Lecturer wearing a radio microphone and students in class looking at the text
displayed on a screen)

                                         Slide 10

Speech Recognition: real time access to spoken language
(Photo of Lecturer wearing a radio microphone and a student pointing at words displayed
on the screen)

                                         Slide 11


Using ASR in class…
(Photo of Lecturer wearing a radio microphone and students in class looking at the text
displayed on a screen)

                                         Slide 12


Initial Research Summary
• Helps students gain better access to lecture material

• Faculty felt it improved teaching

• Challenges:
            • Accuracy
            • Readability
            • Ease of use
                                         Slide 13

“SuperHuman” Speech Recognition

Goal: Surpass human ability to accurately transcribe speech

(graph showing improvement in accuracy of speech recognition)
                                          Slide 14


   •   Speaker Independent SR
   •   Portable / wearable devices

(photos of deaf person using portable computer and head-mounted display)

                                      Slide 15
(Diagram showing how text can be displayed on personal computer displays)


28 KB/s speech signal>
>ViaScribe
>320 b/s (240 words/minute)
> Personal Display Server
> Personal Display Client
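
The figures in this diagram can be checked with a little arithmetic. The sketch below assumes that "KB/s" means 1000 bytes per second and that "b/s" means bits per second; neither is stated on the slide.

```python
# Figures from the slide; the unit interpretations are assumptions.
speech_rate_bytes_per_s = 28_000   # "28 KB/s", assuming 1 KB = 1000 bytes
text_rate_bits_per_s = 320         # "320 b/s", assuming b = bits
words_per_minute = 240

words_per_second = words_per_minute / 60
bits_per_word = text_rate_bits_per_s / words_per_second
bytes_per_word = bits_per_word / 8

# How much smaller the text stream is than the speech signal.
reduction_factor = speech_rate_bytes_per_s * 8 / text_rate_bits_per_s

print(f"{words_per_second:.0f} words/s, {bytes_per_word:.0f} bytes per word")
print(f"text stream is about {reduction_factor:.0f} times smaller than the speech signal")
```

Under these assumptions each word takes about 10 bytes, so the text can be sent to personal displays over a far slower connection than the speech signal would need.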

                                          Slide 16

Personal Display Client Options/Font
The user can choose the pause delays….. and separators-----
And font and colour preferences

                                    Slide 17
Personal Display Client
(screenshot showing the window display of text)


                                          Slide 18

Multiple Speaker Displays
Window showing speaker 1
I think it would be a good idea to go ahead with the plan
Window showing speaker 2
I disagree as in this present financial situation it would be better to be careful


                                          Slide 19

Accuracy & Understanding
Stop >70%/ Proceed with Caution >80%/ GO >85%

                                          Slide 20
                                 Word Error Rate
                           underestimating understanding

said: The Liberated Learning Consortium has been in existence for nearly seven years.
displayed: A Liberated Learning Consortium has existed for seven years.
>40% word error rate


                                       Slide 21

Word Error Rate
overestimating understanding
said: They did win the battle
displayed: They did not win the battle
20% word error rate
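
The two word error rate figures can be reproduced with the usual definition: substitutions, deletions and insertions counted by a word-level edit distance, divided by the number of words actually said. The sketch below is one straightforward way to compute it; the function name and the punctuation handling are choices made for this example, not taken from the slides.

```python
def word_error_rate(said: str, displayed: str) -> float:
    """Word-level edit distance between the two texts, divided by the
    number of words in what was said."""
    ref = [w.strip(".,;:!?").lower() for w in said.split()]
    hyp = [w.strip(".,;:!?").lower() for w in displayed.split()]
    if not ref:
        raise ValueError("the reference text must contain at least one word")

    # dist[i][j] = fewest edits turning the first i said words
    # into the first j displayed words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete every said word
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert every displayed word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[len(ref)][len(hyp)] / len(ref)


underestimate = word_error_rate(
    "The Liberated Learning Consortium has been in existence for nearly seven years.",
    "A Liberated Learning Consortium has existed for seven years.",
)
overestimate = word_error_rate(
    "They did win the battle",
    "They did not win the battle",
)
print(f"{underestimate:.0%}")  # 5 errors over 12 words
print(f"{overestimate:.0%}")   # 1 error over 5 words
```

The first pair scores 5 errors in 12 words (over 40%) although the meaning survives; the second scores only 1 error in 5 words (20%) although the meaning is reversed.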



                                       Slide 22

                                    Key Errors
              Not easily guessed without prior knowledge of content
                    greatly improve understanding if corrected
said: They did win the battle
displayed: They did not win the battle
Key error: not

                                       Slide 23

                                        Study


                     speaker independent speech recognition

                                 22% word error rate

                                2715 word transcript

                                       Slide 24

                                   Key Errors
                               16% of the total errors

                     would require 5 corrections per minute to
                              understand everything
                                        but
                only improves Word Error Rate from 22% to 18%
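
The drop from 22% to about 18% follows from the study figures on the previous slide. A rough check, assuming the 16% of errors that are key errors are all corrected and nothing else changes:

```python
# Figures from the study slides.
transcript_words = 2715
initial_wer = 0.22        # 22% word error rate
key_error_share = 0.16    # key errors as a share of all errors

total_errors = transcript_words * initial_wer
key_errors = total_errors * key_error_share

# Word error rate once every key error has been corrected
# (assumes no other errors are fixed or introduced).
remaining_wer = (total_errors - key_errors) / transcript_words

print(f"total errors: about {total_errors:.0f}")
print(f"key errors: about {key_errors:.0f}")
print(f"word error rate after correcting key errors: {remaining_wer:.1%}")
```

This gives roughly 597 errors, of which about 96 are key errors, and a remaining word error rate of about 18.5%, in line with the 18% on the slide. Correcting a small share of the errors barely moves the word error rate even though it removes the errors that most affect understanding.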

                                    Slide 25
(Diagram showing how real time editor interfaces between lecturer and personal
display)

Real Time Editor
28 KB/s speech signal
320 b/s (240 words/minute)
> Corrects 15 errors/minute
> 320 b/s (240 words/minute)


                                     Slide 26




                              Real Time Editor
(recorded video/audio demonstration)

                                     Slide 27

Personal Display Real Time Editing
Original Text
This is lemon station of how the original ViaScribe taxi can be displayed
immediately in won window while the slightly delayed corrected text can me
displayed in separate window
Corrected Text
This is a demonstration of how the original ViaScribe text can be displayed
immediately in one window while the slightly delayed corrected text can be
displayed in a separate window

                                     Slide 28

                                   to come ….


                                     Slide 29

Personal Display Highlighting
Personalised Display Window
This is a demonstration of the ViaScribe text sent to the personalised display
window with certain important sections highlighted by the user

                                     Slide 30
Personal Display Notes
Personalised Display Window
This is a demonstration of the ViaScribe text sent to the personalised display window
Notes Window
These are the user’s notes time synchronised with the display text

                                          Slide 31

Multiple Speaker Single Window Display
Window showing speaker 1 & speaker 2
I think it would be a good idea to go ahead with the plan

I disagree as in this present financial situation it would be better to be careful

                                          Slide 32


Multiple Speaker Editing
Edited display of speaker 1 & speaker 2
Real Time Editor
Corrects 15 errors/minute

I think it would be a good idea to go ahead with the plan
I disagree as in this present financial situation it would be better to be careful



                                          Slide 33

Questions?

				