
									Voice Enabled Web
       Bryan Duggan MSc
    bryan.duggan@comp.dit.ie



Overview
   What is a speech system?
   Speech recognition
   Text-To-Speech (TTS)
   Voice Enabled Web
   Technology
   Human factors
   Business factors
   REVIEW
   RAF
   Conclusions
What is a speech system?
   “Speech as a human computer interface”
   “Computer systems with conversational interfaces”
   Two-way communication
        Speech recognition
        Text-to-speech
        Voice response mechanisms
        Voice verification




Why Speech?

   “Speech is the ultimate, ubiquitous
    interface. It is how we should be able
    to interact with computers...Speaking
    is the most natural and universal
    method of communication between
    people. The aim of speech systems
    is to extend that communication
    modality to interaction with
    machines”
              -Markowitz, 1996
Technical Challenge
   First speech system in the 1950s (AT&T, 1952:
    recognised spoken digits)
   Knowledge of word meanings required.
    Consider:
    “I want to write about this rite to Mrs.
    Wright, right away”
   Grammars




More types of speech
systems...
   Multi-media indexing
     Indexing media segments using the
      output of speech recognition
     Big research area, but no
      commercial products




Technical Challenge

   Lack of clearly marked boundaries
    What the user said: “The Pulitzer prize”
    What the computer understood: “The pullet
    surprise”

   Variability
   Algorithms
       Hidden Markov Models (HMMs)
       Neural Networks
       Limit the domain
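
The boundary problem above can be illustrated with a toy sketch. It assumes a hypothetical four-word lexicon and works on letters rather than phonemes, so it is only an analogy for what a recogniser faces: the same unbroken stream can be cut into valid words in more than one way.

```python
def segmentations(stream, lexicon):
    """Return every way to split `stream` into words found in `lexicon`."""
    if not stream:
        return [[]]  # one way to segment nothing: the empty segmentation
    results = []
    for end in range(1, len(stream) + 1):
        word = stream[:end]
        if word in lexicon:
            # Keep this word and segment whatever is left of the stream
            for rest in segmentations(stream[end:], lexicon):
                results.append([word] + rest)
    return results


# Hypothetical lexicon, chosen only to show the ambiguity
LEXICON = {"no", "now", "where", "here", "nowhere"}

for candidate in segmentations("nowhere", LEXICON):
    print(" ".join(candidate))
```

Limiting the domain shrinks the lexicon, which is one reason constrained vocabularies produce fewer competing segmentations.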

Real world knowledge


     “The City Council would not give
      the women permission to march
      because they feared violence.”

      “The City Council would not give
      the women permission to march
      because they advocated
      violence.”


Types of speech systems and
characteristics
     Dictation Systems
          Systems that support the use of speech
           instead of using keyboard
          Health care, law, journalism
          Accuracy of 95 percent or higher
          IBM ViaVoice, Dragon NaturallySpeaking
          Office XP/2003
     Command & Control
          Constrained set of words, sometimes with a
           very large vocabulary (> 5000 words)
          Call centres, Voice Portals, airline schedules
          State switching, trigger words
          Text-to-speech vs pre-recorded

Processing happens…
   Server side
       Telephony
       Pocket PC
       Command and control
       Indexing
   Client side
       Dictation
       Command and control
       Why?
   Conclusions
       The technical complexity of dictation systems
        limits their use
       Telephone-based systems, on the other hand,
        have been widely deployed

Text To Speech

   Wolfgang von Kempelen (1791)




Text To Speech

   Homer Dudley's VODER (1939)




Modern TTS Systems

   Articulatory synthesis
       Model of the speech production
        mechanism in humans
   Formant synthesis
       Models of vowel and non-vowel sounds
   Concatenative synthesis
       Variable length units of human speech
   Prosody, SSML

Demonstrations

   “While he became known for
    various additional feats, his
    main concern was the study of
    human speech production, with
    therapeutic applications in
    mind.”

    Voices: Audrey (AT&T Natural Voices), Tim Cooper (Nuance Vocalizer)
More information

   http://www.naturalvoices.att.com/demos/
   http://portal.acm.org/dl.cfm
   http://www.ling.su.se/staff/hartmut/kemplne.htm




Voice Enabled Web

   “Voice Browsers offer the
    promise of allowing everyone to
    access Web based services
    from any phone, making it
    practical to access the Web any
    time and any where, whether at
    home, on the move, or at work.”
               W3C, 2004



Voice Enabled Web

   The “voice enabled web” is the
    web, with voice access typically
    over a telephone. It is the
    marriage of speech recognition
    technology, text to speech
    technology and web technology.
          Author, 2003



The Voice Enabled Web


   Speech recognition & TTS
   XML-based markup language
   The Web
2 Competing Standards

   W3C Speech Framework
     Pipebeach, Nuance,
      SpeechWorks, IBM, Lucent,
      Motorola, Tellme Networks
     http://www.w3c.org/voice

     VoiceXML Version 2.0

     SSML

     CCXML


Ecommerce




            - Duggan, 2003

- Author, 2003
VoiceXML
   Server side technology
   Dialog
       Interaction between a user and the voice
        browser
       Input and output
       Grammar files
   Telephony
       Disconnect or transfer a call
       Call tromboning
   Platform
       Platform specific features
   Performance
       Pre-fetch
       Cache
       “dead air” elimination
<vxml version="1.0">
 <!-- Greeting: plays once, then moves on to the login form -->
 <form id="intro">
  <block name="block1">
    <prompt bargein="false">Hello. This is your computer.</prompt>
    <goto next="#student" />
  </block>
 </form>

 <!-- Login: collect the student number as spoken digits -->
 <form id="student">
  <field modal="true" bargein="false" hotword="false" name="StudentNumber" type="digits">
    <prompt bargein="true">Please say your student number to continue</prompt>
  </field>
  <block>
    <!-- submit (rather than goto) posts the collected field to the JSP page -->
    <submit next="process_phone_login.jsp" method="post" namelist="StudentNumber" />
  </block>
 </form>
</vxml>

<%@page contentType="text/vxml"%>
<%@page import="ie.nci.student.*"%>

<vxml version="1.0">
 <form id="student">
<%
  String studentNumber = request.getParameter("StudentNumber");
  StudentHelper helper = new StudentHelper();
  Student student = helper.getStudentByNumber(studentNumber);
  if (student==null) {
%>
  <block>
      <prompt bargein="false">I could not find student number <%=studentNumber%> in the database.</prompt>
  </block>
<%
  }
  else {
%>
  <block>
      <prompt bargein="false">Hello <%=student.getFirstName()%> <%=student.getSurName()%>.</prompt>
      <prompt bargein="false">Your project is <%=student.getProjectTitle()%></prompt>
  </block>
<%
  }
%>
 </form>
</vxml>
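
The JSP page above writes the caller's input and the database values straight into the prompt markup. A minimal sketch of the same page logic follows, written in Python only to keep it self-contained. `STUDENTS` and `render_login_response` are hypothetical stand-ins for `StudentHelper` and the JSP page, the sample record is invented, and every value is escaped before it reaches the markup.

```python
from xml.sax.saxutils import escape

# Hypothetical stand-in for the StudentHelper database lookup (invented record)
STUDENTS = {"12345": ("Ada", "Byrne", "Voice Enabled Web")}


def render_login_response(student_number):
    """Build the VoiceXML reply for a submitted student number."""
    student = STUDENTS.get(student_number)
    if student is None:
        prompts = [
            "I could not find student number "
            + escape(student_number) + " in the database."
        ]
    else:
        first_name, surname, project_title = student
        prompts = [
            "Hello " + escape(first_name) + " " + escape(surname) + ".",
            "Your project is " + escape(project_title),
        ]
    prompt_lines = "\n".join(
        '   <prompt bargein="false">' + text + "</prompt>" for text in prompts
    )
    return (
        '<vxml version="1.0">\n'
        ' <form id="student">\n'
        "  <block>\n"
        + prompt_lines + "\n"
        "  </block>\n"
        " </form>\n"
        "</vxml>"
    )


print(render_login_response("12345"))
```

Escaping matters here because a recognised or typed value containing `<` or `&` would otherwise produce a document the voice browser cannot parse.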
Companies

   Yahoo
   AOL
   Office Depot
   Charles Schwab
   eTrade
   1800 – Tell
   Hey Anita
   $3.5 billion by 2005 (IDC Corp,
    2001)

The Future

   Mixed initiative dialogs
   “Say Anything”, “Speak Freely”
   Better recognition
   Improved profiling
   VoiceXML Version 3.0 will
    incorporate:
          XHTML + Voice Profiles
               http://www.w3.org/Submission/2001/13/
          SALT
               http://www.saltforum.org/

SALT

   Speech Application Language
    Tags
   Developed by Cisco, Intel,
    ScanSoft (formerly
    SpeechWorks), Philips, Comverse
    and…
   …Microsoft


SALT Aims

   Multi-modal
     PC
     PDA

     Browser

   Telephony
       Works with existing hardware
   Client & Server technology
   Leverages existing standards
<HTML xmlns:salt="http://www.saltforum.org/2002/SALT">
<HEAD>
<!-- The SALT add-in to Internet Explorer object -->
<object id="SpeechTags" CLASSID="clsid:33cbfc53-a7de-491a-90f3-0e782a7e347a"
        VIEWASTEXT WIDTH=0 HEIGHT=0>
</object>
<!-- Importing the namespace from the implementation -->
<?import namespace="salt" implementation="#SpeechTags" />
</HEAD>
<BODY onLoad="RunAsk()">
<script>
// Play the prompt, then start listening for a city name
function RunAsk() {
    SayYourName.Start();
    Names.Start();
}
// Runs on a successful recognition: fill the form field and read the result back
function SayName() {
    MyForm.FirstName.value = Names.text;
    SayYourName.Start("You said " + Names.text);
}
</script>
<FORM name="MyForm">
    Name: <input type="text" name="FirstName"><br>
</FORM>
<salt:prompt id="SayYourName">Please say the name of a city.</salt:prompt>
<salt:listen id="Names" onreco="SayName()">
    <salt:grammar id="gram1" name="gram1">
        <grammar version="1.0" xml:lang="en-US" xmlns="http://www.w3.org/2001/06/grammar" root="cities">
            <rule id="cities">
                <one-of>
                    <item>Dublin</item>
                    <item>Cork</item>
                    <item>London</item>
                    <item>Paris</item>
                </one-of>
            </rule>
        </grammar>
    </salt:grammar>
</salt:listen>
</BODY>
</HTML>
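
The grammar inside the listen element restricts what the recogniser will accept. The sketch below shows that idea outside a speech platform: it reads the same cities grammar and checks whether a piece of text is one of the allowed items. `allowed_phrases` and `matches` are hypothetical helper names, and the code handles only a flat `one-of` list of plain items, which is a small part of what a grammar can express.

```python
import xml.etree.ElementTree as ET

# Namespace used by the grammar element in the example
SRGS_NS = {"g": "http://www.w3.org/2001/06/grammar"}

GRAMMAR = """<grammar version="1.0" xml:lang="en-US"
         xmlns="http://www.w3.org/2001/06/grammar" root="cities">
  <rule id="cities">
    <one-of>
      <item>Dublin</item>
      <item>Cork</item>
      <item>London</item>
      <item>Paris</item>
    </one-of>
  </rule>
</grammar>"""


def allowed_phrases(grammar_xml):
    """Collect the items listed under the grammar's root rule."""
    root = ET.fromstring(grammar_xml)
    root_rule_id = root.get("root")
    rule = root.find("g:rule[@id='" + root_rule_id + "']", SRGS_NS)
    items = rule.findall("g:one-of/g:item", SRGS_NS)
    return {item.text.strip().lower() for item in items}


def matches(utterance, grammar_xml=GRAMMAR):
    """True if the utterance text is one of the phrases the grammar allows."""
    return utterance.strip().lower() in allowed_phrases(grammar_xml)


print(matches("Dublin"), matches("Galway"))
```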
Competing or
Complementary?
                        VoiceXML                           SALT


  Scope                 Incorporates speech                Focuses on the speech
                        interface, data and control        interface only, leaving
                        flow.                              other functions to the host
                                                           environment.
  Programming           Built-in form filling algorithm.   Provides tools for
  Model                 Enables easy development           developers to write their
                        of finite state dialogues.         own form filling
                                                           algorithms.
  API Level             High level                         Low level.
  Licensing             Subject to royalty payments        SALT will be royalty free.
  Maturity              Mature standard (Version 2.0       Still in draft (July, 2002),
                        of July, 2002) with many           no commercial products
                        commercial product                 available.
                        implementations
  Target applications   Telephony, voice enabled           Multi-modal applications
                        web applications
  Developed By          VoiceXML Forum                     SALT Forum
  VoiceXML and SALT (Source: author based on Larson, 2003)


Technology Factors

   Standards
   Prototyping
   Text To Speech (TTS)
   Best of breed
   Mixed initiative
   Vocabulary
   Monitoring & enhancement

Standards
   Client devices
       Multi-modal devices, telephony, web browser, WAP browser
   Client technologies
       Speech & multi-modal technologies: SALT, SRGS, SSML, VoiceXML, JavaScript, CCXML
       Web technologies: HTML, XHTML, JavaScript, DHTML
       Wireless (WAP) technologies: JavaScript, WML, WAP, XHTML
   Server technologies
       Shared (foundation) technologies: J2EE, .Net, XML, SSL, HTTP

The Voice Enabled Web Technology Map (Source: author)
Standards
   Why is VoiceXML important?
       Encourages third party development
       Enables creation of new services
       Common development standard
       Leverages web resources
       Universal vendor acceptance
       Lowers development costs

Why is VoiceXML important? (Source: Strachman, 2001)
Monitoring & enhancement
   High values desirable
       Overall Automation Rate
       Caller Satisfaction Rate
       Call Completion Rate
       Task Completion Rate
       Prompt Usability Rate
       Play Rate
       Call Rate
   Low values desirable
       Call Duration Rate
       On-Hold Time
       Sentence Error Rate
       Word Error Rate
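
Of the metrics listed, word error rate is the easiest to pin down. A minimal sketch follows, assuming the usual definition (substitutions, deletions and insertions divided by the number of words in the reference) and using the “Pulitzer prize” example from the Technical Challenge slide. `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


wer = word_error_rate("The Pulitzer prize", "The pullet surprise")
print(f"{wer:.0%}")
```

Two of the three reference words are wrong in this example, so a low value is the goal, as the grouping above indicates.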
Human Factors

   Human Computer Interface
    (HCI)
   Caller centric
   Personnel
   WOZ Prototyping
   Testing


The “Star Trek” Model


Testing

“The accuracy of speech
  recognisers is 98%, because
  speech recognisers have an
  accuracy of 98%, tests must be
  arranged to prove it.”

                   - Hyde 1979


Business Factors

   Vcommerce
   Business benefits
   Calculating Return on Investment
    (ROI)
   Revenue models
   Risks
   Costing
   Outsourcing
   Branding
   The law
                            Scenario 1: Manual Call Costing
Number of calls                                                         15,000,000
Average length (minute)                                                          1.50
Cost per call minute (human agent)                                              $1.75
Cost per call (average call length x cost per minute)                          $2.625
Total Manual Cost                                                      $39,375,000
                          Scenario 2: Automated Call Costing
Number of calls                                                         15,000,000
Average length (minute)                                                          1.50
Cost per call minute (human agent)                                              $1.75
Cost per call minute (automatic)                                                $0.20
Average cost of an automated speech call                                        $0.30
(call length x cost per minute)
% of calls handled by the voice enabled web system                               20%
Cost of manual calls                                                   $31,500,000
Cost of speech enabled calls                                                $900,000
Total manual & automated cost                                          $32,400,000
               ROI Calculation – Comparison of scenario 1 with scenario 2
Project life                                                                36 months
Benefit $                                                               $6,975,000
(difference in cost between scenario 1 and scenario 2)
Cost of automated calls                                                     $900,000
Difference                                                              $6,075,000
Pay back in %                                                                   675%
Pay back in months                                                               4.65
               Scenario 2 - Automated call costing incorporating the cost of failed calls

Number of calls                                                                             15,000,000

% of calls handled by the voice enabled web system                                                20%


Number of automated calls                                                                    3,000,000

% of failed calls                                                                                 15%

Number of failed calls                                                                        450,000

Cost of a failed call                                                                            $0.75

Cost of all failed calls                                                                     $337,500

Cost of automated calls                                                                      $900,000

New total automation cost                                                                   $1,237,500

Cost of manual calls                                                                   $31,500,000

New total manual & automated cost                                                      $32,737,500

                                         New ROI Calculation

Project life                                                                                36 months

New Benefit                                                                                 $6,637,500

New cost of automated calls                                                                 $1,237,500

Pay back in %                                                                                   436%

Pay back in months                                                                                6.71
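
The costing tables above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures shown. The formulas are inferred from those figures rather than stated in the slides: pay back in % is taken as (benefit minus automation cost) divided by automation cost, and pay back in months as project life multiplied by automation cost divided by benefit. `call_costing` is a hypothetical helper name.

```python
def call_costing(calls, minutes, manual_rate, auto_rate, automated_share,
                 failed_share=0.0, failed_call_cost=0.0, project_months=36):
    """Recompute the slide figures; the formulas are inferred from the tables."""
    manual_only = calls * minutes * manual_rate  # Scenario 1: every call manual
    automated_calls = calls * automated_share
    automation_cost = (automated_calls * minutes * auto_rate
                       + automated_calls * failed_share * failed_call_cost)
    manual_cost = (calls - automated_calls) * minutes * manual_rate
    benefit = manual_only - (manual_cost + automation_cost)
    return {
        "automation_cost": automation_cost,
        "total_cost": manual_cost + automation_cost,
        "benefit": benefit,
        "payback_pct": 100 * (benefit - automation_cost) / automation_cost,
        "payback_months": project_months * automation_cost / benefit,
    }


base = call_costing(15_000_000, 1.5, 1.75, 0.20, 0.20)
with_failures = call_costing(15_000_000, 1.5, 1.75, 0.20, 0.20,
                             failed_share=0.15, failed_call_cost=0.75)

for label, result in (("Scenario 2", base), ("With failed calls", with_failures)):
    print(label,
          f"benefit=${result['benefit']:,.0f}",
          f"payback={result['payback_pct']:.0f}%",
          f"months={result['payback_months']:.2f}")
```

Adding the cost of failed calls raises the automation cost by $337,500, which is what moves the pay back from 675% to 436%.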
 Revenue Models
  Revenue Model                    Description                    Viability
Advertising              Revenue is generated from       Low
                         advertisers on voice portals
Fee based                Revenue is generated from       High
                         monthly or yearly fees
                         charged for a service
Premium rate             Revenue is generated from       Medium (suitable for
                         the amount charged for a user   specialist services or small
                         to make a call                  markets)
Referrals                Revenue is generated from       Low
                         companies referred by voice
                         portals
Selling over the phone   Revenue is generated from       High (dependent on
                         orders placed over the phone    integration with existing
                         and savings in customer         eCommerce systems and
                         service agent charges           voice verification systems)
Development &            Revenue is generated by         High
outsourcing              developing and hosting an
                         enterprise voice enabled
                         application for a third party
Content provision        Revenue is generated by         Medium (dependent on
                         selling on aggregated or        the continuing success of
                         specially prepared content to   the voice portal market)
                         voice portal providers

Risks
        2 million non-native English speakers




        8 million with high-pitched voices that
        recognition software can't understand




        10 million people with accents, speech
        impediments, or voices that cannot be
        understood for unknown reasons




        54 million children whose
        underdeveloped oral and nasal
        cavities produce sounds
        unrecognisable by software


        74 million people (27% of US
        residents) will experience major
        problems using voice recognition
        technologies

Outsourcing




Branding
 “an element or combination of elements that uniquely
 identify a product as being produced by one particular
 supplier and thereby distinguish it from competitors’
 products. These elements usually include a particular name,
 logo, symbol and/or design that the customer then
 associates with a particular supplier”

 “the unique identity the company occupies in the mind of the
 customer”
 Brand Communication Mechanisms
    Dialogue
        Grammar
        Prompts
        Speaker
            Ethnicity, Age, Sex, Tone
    Non-Dialogue
        Background music
        Earcons
        Tagline
 Roadmap for Enterprise
 VoIce Enabled Web
 (REVIEW)


    1. Requirements & Planning
    2. Prototype Development
    3. Prototype Use
    4. System Implementation
    5. Testing
    6. Deployment & Integration
    7. Monitoring & Maintenance

                     Duggan, 2003
Review Achievement
Framework (RAF)
   Assess adherence
   Based on best practice
   44 Questions
   7 phases
   2 levels of granularity
   Weighted
   For use during and after the project
   Strategic managers
   Project managers
Phase 1: Requirements and Planning
     Response options for each question: -2, -1, 0, 1, 2
      #  Question                                              Weight   Score
      1  Have the initial project requirements been defined?     .6     -0.6
      2  Has the project been fully costed?                      .8      1.6
      3  Has a suitable revenue model been identified for        1       1
         the system?
      4  Has consideration been given to the benefits and        .9      1.8
         risks of outsourcing the application?
      5  Is there a high motivation for users to call the        .5      0
         system?
      6  Have project risks been identified?                     .9     -1.8
      7  Will the project reuse existing developed and           .3      0.6
         tested web/eCommerce infrastructure?
      8  Has a target platform been identified for the           .7      1.4
         project?
      9  Is the target platform based on open standards?         .2      0.4
         (VoiceXML, SALT and the W3C Speech Framework)
     10  Will the vocabulary of the application be               .8      1.6
         constrained?
     11  Will the content delivered by the application be        .1     -0.1
         free from copyright and intellectual property
         concerns?
     12  Will the content delivered by the system be free        .1      0
         from material that could be considered offensive
         to certain users?
         Weighted Total:                                                 5.9
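
Each score in the Phase 1 table is consistent with weight multiplied by a response on the -2 to 2 scale. The sketch below recomputes the weighted total on that assumption. The responses are not legible in the source, so the values here are inferred as score divided by weight. `weighted_total` is a hypothetical helper name.

```python
# (weight, response) for questions 1 to 12.
# Responses are inferred as score / weight; they are not read from the slide.
PHASE_1 = [
    (0.6, -1), (0.8, 2), (1.0, 1), (0.9, 2), (0.5, 0), (0.9, -2),
    (0.3, 2), (0.7, 2), (0.2, 2), (0.8, 2), (0.1, -1), (0.1, 0),
]


def weighted_total(answers):
    """Sum of weight x response, with responses limited to -2..2."""
    for _, response in answers:
        if response not in (-2, -1, 0, 1, 2):
            raise ValueError("response must be one of -2, -1, 0, 1, 2")
    return sum(weight * response for weight, response in answers)


print(round(weighted_total(PHASE_1), 1))
```

A heavily weighted question answered negatively, such as question 6, pulls the phase total down more than several lightly weighted ones.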
Project Phase Totals
      Phase                        Weight        Score
  1   Requirements and Planning             .7       5.9
  2   Prototype Development                 .2       1.7
  3   Prototype Use                         .1       3.8
  4   System Implementation                 .8       5.8
  5   Testing                               .7       5.4
  6   Deployment and Integration            .6       -3.4
  7   Maintenance and Monitoring            .3       -0.4
      Grand Total:                                   18.8
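
A quick check of the table above: the grand total of 18.8 matches the plain sum of the seven phase scores. How the phase weights enter the calculation is not stated in the slide, so the weighted figure in the sketch is shown only for comparison and is an assumption about one possible use of them.

```python
# Phase name -> (weight, score), copied from the table
PHASES = {
    "Requirements and Planning": (0.7, 5.9),
    "Prototype Development": (0.2, 1.7),
    "Prototype Use": (0.1, 3.8),
    "System Implementation": (0.8, 5.8),
    "Testing": (0.7, 5.4),
    "Deployment and Integration": (0.6, -3.4),
    "Maintenance and Monitoring": (0.3, -0.4),
}

# Plain sum of the phase scores, which reproduces the grand total shown
grand_total = sum(score for _, score in PHASES.values())

# Assumption: one way the phase weights could be applied (not shown in the slide)
weighted_grand_total = sum(weight * score for weight, score in PHASES.values())

print(round(grand_total, 1), round(weighted_grand_total, 2))
```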




Case Study

   Quiz game
   Saffron Interactive
   Vox Pilot
   Outsourcing
   “Red Nose” day
   Premium rate telephone number
   Technically successful
   A commercial failure
Critical analysis &
Conclusions
   Standards
   Marketing
   Outsourcing
   ROI
   Prototyping
   HCI


Thank You for your interest!



