

ISSN 1932-8214                                                              September 2009

                          Speech analytics gets increasing attention
        Notes on conversations with Autonomy, ClickFox, Nexidia, Nice Systems, and Verint
   It’s become a cliché that companies should listen to their customers, and no company would debate the
importance of understanding what their customers want and how well their company is meeting those needs.
Customer service operations represent a key opportunity for understanding interactions with customers. But
perhaps it takes a difficult economic climate to push companies to put more resources behind that sentiment.
That could be one reason behind the heightened announcement and sales activity cited by most vendors of call
center speech analytics solutions. Speech analytics can also directly reduce costs by identifying the issues
that are taking agents’ time, so that the root cause can be corrected.
                                                                                      Continued on page 36

                  Avaya to offer option of Loquendo speech technology
         Agreement expands Loquendo US penetration with its speech recognition and TTS
   Avaya Inc., a large provider of business communications applications, systems, and services that recently
purchased Nortel’s enterprise business (SSN, August 2009, p. 1 and 28), announced an expansion of its
relationship with Loquendo (see articles, p. 6, 7, and 41). The expanded strategic relationship enables Avaya
to resell Loquendo speech recognition and text-to-speech licenses as a speech automation option for the Avaya
Voice Portal worldwide. The connection should in particular expand the use of Loquendo speech technology in
the US. Avaya also offers Nuance speech technology.
                                                                                      Continued on page 38

          Survey confirms “say what you want” model for smartphone users
                           Study sponsored by Tellme (Microsoft subsidiary)
   A recent study conducted by Sanderson Studios for Tellme Networks, a Microsoft subsidiary, confirms that
smartphone users are more inclined to buy a device that offers them the ability to push one button to say
what they want and get it. (Tellme, according to a panelist at SpeechTEK, processes 10 billion transactions
per year.)
                                                                                      Continued on page 39

       Volt Delta speech recognition innovation supports difficult applications
                    Hosted contact center solutions can support large call volumes
   Volt Delta Resources LLC, a wholly owned subsidiary of Volt Information Sciences, Inc., is a provider of
contact center technology and self-service solutions for large scale enterprises and telecommunication
providers around the globe. (Volt Information Sciences had sales of $447.1 million from continuing operations
for the second quarter ended May 3, 2009.) VoltDelta OnDemand provides contact center solutions including
speech recognition applications.
                                                                                      Continued on page 39

  Interviews with Darrell Knight, Message Technologies, p. 31; Joe Alwan, BBN Technologies, p. 33
                           VUI Visions from Matt Yuschik, Convergys, p. 34
Speech Strategy News                                          September 2009                                         2

                                                 Table of Contents
Speech analytics gets increasing attention  1
 Notes on conversations with Autonomy, ClickFox, Nexidia, Nice Systems, and Verint  1
Avaya to offer option of Loquendo speech technology  1
 Agreement expands Loquendo US penetration with its speech recognition and TTS  1
Survey confirms “say what you want” model for smartphone users  1
 Study sponsored by Tellme (Microsoft subsidiary)  1
Volt Delta speech recognition innovation supports difficult applications  1
 Hosted contact center solutions can support large call volumes  1
Editor’s Notes  5
Using – but not abusing – speech recognition technology  5
 Bill Meisel, Publisher & Editor  5
Microsoft’s 800-Bing-411: More than directory assistance  6
 Turn-by-turn driving directions, traffic conditions, weather, and movie showtimes  6
Loquendo releases new version of its VoiceXML/CCXML Platform  6
 Also adds new TTS language and features  6
Biomni uses Loquendo TTS in hospital appointment reminder system  7
 Text-to-speech reminds patients of upcoming appointments by phone  7
Verizon Business supports automated speech services in an IP IVR environment  8
 Expands support for speech technology beyond classical IVR platform  8
Convergys combines business intelligence with self-service and notification  8
 Integrates former Intervoice products to ease personalization  8
Companies use Ifbyphone’s Voice Broadcasting service to contact customers  10
 Lawn care company uses the outbound service for accounts receivable  10
Survey shows $5.6 billion loss from inability to meet customer expectations  10
 Indians prefer automated self service to agents  10
Voice control from ATX for BMW available in September with new models  10
 Control the telephone, climate control, navigation, and sound systems  10
TeleNav GPS Navigator to be available on T-Mobile myTouch 3G and Sidekick LX  11
 Spoken directions and speech recognition for address entry and business search  11
Ditech voice access chosen by for Developer Program  11
 Enterprises can activate voice applications during any call  11
partners to extend capabilities of its hosted call center services  12
 Novauris provides option for Name and Address Capture by speech  12
Syntellect releases VoiceXML Studio 7.2 and offers free trial of testing solution  12
 Also in the process of updating its full development environment  12
Pronexus enhances its “CT ADE Migration Program”  13
 VBVoice IVR development tool positioned as replacement for Syntellect offering  13
Empirix tests contact center solutions by simulating callers  14
 Stress testing at full load  14
Nexidia enhances its speech analytics to detect unexpected events  14
 Supplement to core product doesn’t require advanced set-up or analysis  14
CallMiner speech analytics incorporated into solutions by resellers  15
 Aspect Software helps healthcare organization with analytics solution  15
Nice Systems combines phonetic and large-vocabulary speech analytics  16
 A recent win includes a $55 million award from a government agency  16
CallCopy’s new speech analytics solution claimed to offer low cost of ownership  16
 Overly complex solutions may have slowed adoption  16
OnviSource integrates speech analytics with its workforce optimization suite  17
 Free pilot program allows companies to “try before they buy”  17
Utopy announces new version of its speech analytics product  17
 Combined with a new version of its contact center performance optimization solution  17
DSS adds speech analytics from Aurix to its multi-media recording solutions  18
 Designed for the 911 public safety market  18
Voxeo releases major upgrade of its “Unlocked Communications” platform  18
 Voxeo open-source initiative aims to increase options for customers  18
Nuance Communications speech attendant rated “Avaya Compliant”  20
 Nuance speech-enabled call router passes Avaya tests  20
New version of Nuance TTS for contact center and mobile  20
 Vocalizer 5.0 blends synthetic and recorded speech automatically  20
French transport firm uses Nuance speech technology in customer service  21
 12% of proof of delivery requests are fully automated  21
Apple’s new Snow Leopard OS has enhanced accessibility features  21
 VoiceOver speaks what one is pointing to on the screen  21
PerSay releases product for testing voice biometrics solutions  21
 Also releases product for real-time detection of fraudsters calling in  21
STC highlights media monitoring service and voice verification technology  22
 Voice-based access control for mobile devices one option  22
Auraya Systems develops new voice biometrics solution  23
 Allows thresholds to be set based on the business risk of a specific transaction  23
Nu Echo offers IVR application testing tool  24
 Voxeo will offer Nu Echo’s grammar development tool integrated with VoiceObjects  24
Openstream offers multimodal browser for mobile phones  26
 Used by Omnesys and Openstream in brokerage and trading application  26
Aisle411 service helps find items in a retail store  26
 Customer calls a toll-free number on any mobile phone  26
Automated language translation a growing area  27
 SpeechGear, AppTek, BBN, Language Weaver, and Next IT  27
Humanity Interactive offers animated online characters  29
 Red Shift provides supporting speech recognition technology  29
VoiceXML Forum celebrates 10-Year anniversary  29
 AVIOS and Forum cooperate to advance best practices in using speech technology  29
Vianix highlights opportunities in field service and mobile workforce automation  30
 Speech processing optimized for high speech recognition accuracy  30
3Play Media uses speech recognition with quality assurance  30
 The FeedRoom and 3Play Media offer online video captioning services  30
Interview with Darrell Knight, Message Technologies  31
 Dedication to hosted IVR solutions  31
Interview with Joe Alwan, BBN Technologies  33
 AVOKE Caller Experience Analytics is an analytics solution for call centers  33
VUI Visions  34
Multimodal Customer Service Transactions  34
 Matt Yuschik, Convergys Corporation  34
News Briefs  40
 Yucheng Partners with Convergys to expand the contact center product & services market in China  40
 Gold Systems’ new conferencing system uses Microsoft OCS 2007 R2  40
 Eckoh wins a £1.5 million contract to provide speech recognition services for a major UK transport organization  41
 Micromation integrates call center solution for Health Dialog using Loquendo speech technology  41
 VoiceVerified is now CSIdentity, offers multi-layer security model with speech authentication  41
 IntelePeer and Transera team to offer on-demand contact center solutions  41
 Home emergency insurance and repair service deploys contact center systems using Sabio  41
 Spanlink to resell Interactive Intelligence unified IP business communications solutions  41
 Voice Web Solutions’ Grammar Studio tool for visually creating and deploying speech grammars  42
 Genesys IVR platform available from LBi Software Engineering  42
 Customers are changing with new social norms, notes Paul Greenberg in SpeechTEK keynote  42
 Dimension Data speech self-service survey suggests that customers resent speech automation in part because, when it fails, they have to start over with an agent  42
 ClickFox and Business Systems partner to bring customer experience analytics to EMEA companies  42
 Servion offers Customer Interaction Systems supporting multiple channels, including voice  42
 Google Voice indexes using text tags and plays podcasts and web audio  43
 US consumers continue dropping landlines for mobile phones  43
 CNET’s choices for top BlackBerry apps favor speech
 SpinVox makes its API available in Italian and Portuguese  43
 Rogers Wireless expands use of SpinVox voicemail-to-text service  43
 GM Voices develops international voice personalities for Hewlett-Packard global operations  44
 Apple bans Google Voice from iPhone  44
 Navigon navigation software on Apple iPhone will add announcement of street names using SVOX text-to-speech  44
 Microsoft-Nokia alliance may compete with RIM by putting Office on your mobile phone!  44
 Nuance’s predictive text and embedded mobile speech software are named to VisionMobile’s 100 Million Club  44
 Esnatech & eOn Communications launch Mobile United Communications client software on BlackBerry App World  45
 Ultratec’s CapTel service allows phone conversations for those with hearing loss  45
 Verizon Wireless initiatives encourage developers, contest and app store planned  45
 PropertyMinder adds text-to-speech to real estate websites  45
 Australian welfare agency adds text-to-speech to web site to aid visually impaired, developed by VoiceCorp International  45
 DynaVox Mayer-Johnson announces hand-held speech solutions for augmentative and alternative communication (AAC)  46
 English-Italian dictionary CD-ROM to use Loquendo text-to-speech  46
 Students improve writing, reading skills with Pearson's WriteToLearn supported by SpeechStream from Texthelp  46
 LXE introduces MX9 ultra-rugged data collection computers for harsh environments  46
 SanDisk Cruzer Enterprise secure USB flash drives support text-to-speech screen readers for
 US grants $1.2 billion for electronic health records  47
 Wellmont Health System awards contract to MedQuist for enterprise-wide transcription and radiology speech recognition  47
 Updated candidate recommendation of Speech Synthesis Markup Language (SSML)  47
 International Telecommunications Union promotes implementation of United Nations Convention on the Rights of Persons with Disabilities in information and communications systems  47
 Fonix iSpeak voice dial by name or number app available for iPhone 3.0, and lip-sync software licensed for Microsoft’s South Park Xbox video game  47
 New draft of Media Resource Control Protocol Version 2 standard available  48
 A sobering thought  48
Financial Notes  48
 Nuance announces third quarter 2009 results: 11.2% increase in GAAP revenue to $241 million, non-GAAP net income of $73.3 million, and cash balance of $418.6 million  48
 Ditech Networks reports 34% revenue growth in fiscal quarter  49
 Convergys reports second quarter results, customer management operating income up 90%  49
 NICE Systems reports second quarter 2009 non-GAAP decline in revenue compared to Q2 2008  49
 Interactive Intelligence reports second quarter revenues increased to $32.9 million  50
 In-Q-Tel investment in Carnegie Speech to enhance spoken-language training software for the US intelligence community  50
 Call Genie reports second quarter 2009 financial results  50
 Wizzard Software announces 2009 Q2 21% decline in revenues and a loss  50
 Fonix outlines acquisition strategy  51
People  51
 Genesys Telecommunications Laboratories names Jason Stirling as the new senior vice president of Genesys Asia Pacific  51
 Richard R. Devenuti elected to Convergys Board of Directors  51
 Thomas Connelly appointed President/COO of OrderCatcher, Inc.  51
 Avaya’s Carol Giles Neslund recognized by Everything Channel’s CRN Magazine as One of the Top 100 Women in the Channel  51
For Further Information on Products Mentioned in this Issue  52
Voice Search 2010  56
 Third edition of breakthrough conference  56

                                                 Common Abbreviations
CCXML (Call Control eXtensible Markup Language)
CPE (Customer Premises Equipment)
CRM (Customer Relationship Management)
GAAP (Generally Accepted Accounting Principles)
EMEA (Europe, Middle East, and Africa)
IP (Internet Protocol)
IT (Information Technology)
IVR (Interactive Voice Response)
SaaS (Software as a Service)
SDK (Software Development Kit)
SIP (Session Initiation Protocol)
SOA (Service Oriented Architecture)
TTS (Text-to-Speech)
VoIP (Voice over Internet Protocol)
VoiceXML (Voice eXtensible Markup Language)

Editor’s Notes
                 Using – but not abusing – speech recognition technology
                                     Bill Meisel, Publisher & Editor

   Speech recognition has improved each year, and today it can handle difficult tasks such as business
directory assistance as accurately as agents when a request is an actual listing, according to some studies.
Speech technology isn’t artificial intelligence; it requires a constrained context to be accurate, and, within a
sufficiently constrained context, can often be very accurate.
   A good core technology, however, isn’t enough. An application can make the technology look bad by poor
design (e.g., by not fully understanding how a user may react to a prompt), or by creating overly high
expectations in the more difficult applications (such as voicemail-to-text or speaking a search term for a Web
search engine).
   We are all aware that even simple applications that don’t tax speech recognition technology can be badly
designed, giving the impression that speech recognition is failing. The most common problem in structured
contact center applications, for example, is a grammar that omits perfectly reasonable responses: alternate
ways of saying the same thing (e.g., “correct” instead of “yes”) or attempts to skip menu layers
(sometimes because the caller has used the system before and knows the options presented in the upcoming
prompt). Many errors in multi-layer dialogs could be avoided simply by including the second-level
grammars in the first-level grammars, so that a caller who speaks a second-level option shortens the navigation.
Unfortunately, even this simple artifice, easily supported by today’s speech recognition technology and
application development tools, is almost never used. One doesn’t have to think of every option during initial
design, but speech analytics or a simple review of failed transactions can make improvements obvious.
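   As a rough illustration of the grammar point above, the following Python sketch treats a grammar as a
simple lookup from accepted phrases to menu paths. It is not how any particular speech platform represents
grammars; the menu, the phrases, and the function names (build_first_level_grammar, route, confirm) are all
hypothetical, chosen only to show second-level options and alternate wordings being accepted at the first
prompt.

```python
# Hypothetical two-level menu; every name and phrase here is illustrative only.
MENU = {
    "billing": ["pay my bill", "check my balance"],
    "support": ["reset my password", "report an outage"],
}

# Alternate wordings a caller might reasonably use (assumed, not from any product).
ALTERNATES = {
    "payment": "pay my bill",
    "outage": "report an outage",
}

# Confirmation prompts should accept more than a bare "yes" or "no".
CONFIRMATIONS = {
    "yes": True, "correct": True, "that's right": True,
    "no": False, "wrong": False,
}


def build_first_level_grammar(menu, alternates):
    """Map every phrase accepted at the first prompt to the menu path it reaches."""
    grammar = {}
    for top, children in menu.items():
        grammar[top] = (top,)
        for child in children:
            # Second-level phrases are also accepted at the first prompt,
            # so an experienced caller can skip a menu layer.
            grammar[child] = (top, child)
    for alt, canonical in alternates.items():
        if canonical in grammar:
            grammar[alt] = grammar[canonical]
    return grammar


def route(utterance, grammar):
    """Return the menu path for a recognized utterance, or None if out of grammar."""
    return grammar.get(utterance.strip().lower())


def confirm(utterance):
    """Return True or False for a confirmation, or None if not understood."""
    return CONFIRMATIONS.get(utterance.strip().lower())
```

   With this sketch, a caller who says “pay my bill” at the first prompt is routed directly to the billing
payment step, and “correct” is treated the same as “yes”. Phrases that return None are the failed
transactions worth reviewing when deciding what to add to the grammar.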
   Beyond bad design, improvements in speech technology can be outpaced by increasingly ambitious
applications. Unconstrained speech-to-text is one of those applications, with an application such as
voicemail-to-text being one of the most ambitious. Dictation of extemporaneous speech over a telephone channel
is not equivalent to dictation on a PC, which can be quite accurate. PC speech recognition dictation software uses a
good directional microphone (usually in a relatively quiet office setting), trains to an individual’s voice and
vocabulary, and shows the result on the screen immediately (providing feedback to the user that often
improves dictation skills).
   Leaving a voicemail message doesn’t have the cited advantages of PC dictation. The difficulty of the task
suggests why some services back up the speech recognition with agents, although this is only financially
viable in the short term.
   Unconstrained voice search (entering text into a text-based search engine by voice) is also highly
ambitious. There is little context if a person just says a few unconnected key words or phrases, e.g., “smog
check Tarzana California.” However, search entries have a certain consistency in form, and vendors serving
this market may be able to exploit that, as well as using user-specific tuning to exploit repeated types of
searches and locations by a specific user. Nevertheless, this application is certainly one of the more
challenging speech recognition tasks. One way to make it more effective may be to use dialog to clarify
ambiguous requests.
   The very ambitious applications can lead to skepticism about the accuracy of speech recognition in general
if the vendors don’t create realistic expectations. For voicemail-to-text, word-for-word accuracy may not be
critically important if a user can quickly get the gist of a voicemail and have enough information to decide its
priority. Similarly, a search query can sometimes deliver useful results even if some terms are misrecognized.
   Perception is important to acceptance by users. Speech technology seems to face a higher hurdle than some
other user interface innovations, such as gestures on touch screens. The hurdle seems to be in part because of
comparison to human speech recognition. Negative attitudes may also be the result in part of a visceral
reaction to the computer trying to mimic a human. There have been extraordinary efforts to short-circuit
automated customer service, for example, including web sites that offer methods to get through to an agent
without giving automation a chance.
   Without care, there could be a reaction to speech recognition where users and investors pull back with the
oft-heard lament that the technology isn’t ready yet. Applications that work well and provide services that
users come to love can help avoid that outcome.

                Microsoft’s 800-Bing-411: More than directory assistance
          Turn-by-turn driving directions, traffic conditions, weather, and movie showtimes

   Free directory assistance numbers automated by speech recognition have proved popular with consumers. In a
survey of over 800 mobile phone users conducted by The PELORUS Group, findings indicate that only 42% of
directory listing searches by heavy users of directory assistance services (those calling a directory listing
service more than five times in a three-month period) are made to traditional 411 services offered by their
mobile phone carriers. This compares to 64% by light users (mobile users calling a directory listing service
fewer than two times in a three-month period). The survey also revealed that heavy directory service users
funnel 33% of their directory searches to free advertisement-based services such as Jingle Networks’ FREE-411,
and 13% to GOOG-411, Google’s free service. Jingle Networks provides both business and residential listings,
and Google provides business listings.
   Based on largely anecdotal evidence, it appears that Google’s 800-GOOG-411 is better known to consumers
than Microsoft’s similar 800-Bing-411 service. Microsoft had offered a directory assistance service through
its Tellme subsidiary (p. 1), but recently rebranded it as 800-Bing-411, along with naming its text-based
search engine Bing. Neither Google nor Microsoft has promoted the directory assistance services extensively,
perhaps being cautious about supporting large volumes of calls or waiting until they have good models for
making money from the free services.
   One of the least publicized aspects of Bing-411 is that it is more than directory assistance, and suggests
the evolution of the directory assistance lines into general “voice sites” over time. Bing-411 offers aspects
of the original 800-555-TELL information line from Tellme. It provides:
 • Audio turn-by-turn driving directions (or alternatively a text message with the directions);
 • Traffic reports, personalized to your route;
 • Movie showtimes and theater info; and
 • Current weather and forecasts.

           Loquendo releases new version of its VoiceXML/CCXML Platform
                                Also adds new TTS language and features

   On August 18, Loquendo released version 7.0.26             tracing, caller and called ID information via
of Loquendo VoxNauta, its platform for developing             ECMA.
Interactive Voice Response (IVR) and speech-                 Improved support of the IETF RFC 4240
enabled applications that use the VoiceXML 2.0 and            (Netann protocol): Supports the implementation
2.1 and CCXML 1.0 standards. VoxNauta is                      of network announcements and other basic
available standalone or with Loquendo speech                  functions. This allows VoxNauta to be used as a
recognition and/or text-to-speech (TTS) software              Media Resource Function (MRF) within IMS
integrated. The speech engines use a modular                  architecture.
approach that eases any future upgrades. Loquendo
                                                             The Loquendo VoxNauta platform is currently
also added Australian English and other features and
                                                          available in both the 7.0 and 7.1 series. The 7.0
voices to its TTS offerings; Loquendo TTS is
                                                          version ensures backwards-compatibility with the
available for telephony, multimedia, and embedded
                                                          installed base. Version 7.1 supports new features of
                                                          Loquendo speech technologies and provides
Loquendo VoxNauta                                         compatibility with the latest standards releases.
                                                             Loquendo VoxNauta is available for both
 The new release features:                                Windows and Linux. It supports traditional
 Increased Density: Up to 240 simultaneous               telephony or VoIP. The platform also supports major
  channels per server in IVR and IVR-plus-TTS             standards in speech: VoiceXML 2.0 & 2.1, CCXML
  configurations, and up to 120 simultaneous              1.0, MRCP v2, SRGS 1.0, SISR 1.0, SSML 1.0 (and,
  channels per server in full-dialogue (speech            in the near future, PLS 1.0); the platform has been
  recognition plus TTS) configuration.                    formally certified by the VoiceXML Forum.
 Enhanced VoiceXML service log: Error tracing,
  service closure tracing, speaker verification
TTS enhancements                                           rapid way of resolving pronunciation issues within
                                                           speech recognition grammars or lengthy TTS
   Loquendo launched Loquendo TTS in Australian
                                                           prompts that might, for example, have multiple
English, with the voice of Alan, and released the first
                                                           pronunciations, such as “bass” or “record.”
male Russian voice, Dmitri. After the release of
                                                              These new features are available to developers
Loquendo speech recognition in Russian earlier this
                                                           through a new release of Loquendo TTS Director, a
year, Dmitri and Olga (the female Russian voice)
                                                           prompt-designing tool. This includes an enhanced
now make the full array of Loquendo speech
                                                           Lexicon Manager, which now allows prompt
technologies available to developers in the Russian-
                                                           designers     to    automatically    generate   IPA
speaking world.
                                                           transcriptions, just as they can already do in the
   There is also a new version of the TTS engine.
                                                           SAMPA phonetic alphabet. IPA transcriptions are
Loquendo TTS 7.8 now offers support for the
                                                           easy to modify, and the pronunciation can be rapidly
International Phonetic Alphabet (IPA). Users can
                                                           verified by simply clicking ‘Play.’
insert transcriptions written in IPA directly into their
                                                              Loquendo also continues to expand the range of
prompts, and Loquendo TTS will read them
                                                           sound effects accessible via Loquendo TTS Director
correctly. Loquendo TTS now also supports the
                                                           – which now includes a choice of nine robotic
Pronunciation Lexicon Specification (PLS) 1.0, a
                                                           effects and a brand new ‘whispering’ effect. These
W3C Recommendation standard which allows
                                                           controls introduce elements of tone and texture to
prompt designers to specify pronunciation
                                                           Loquendo’s synthetic voices.
information within a VoiceXML document. This is a
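
As a rough illustration of the PLS 1.0 support described above, the sketch below builds a small pronunciation lexicon with IPA transcriptions. The element names, namespace, and required attributes follow the W3C PLS 1.0 Recommendation; the helper name `build_lexicon` and the sample entry are hypothetical, and nothing here reflects how Loquendo TTS or TTS Director stores or loads lexicons internally.

```python
import xml.etree.ElementTree as ET
from typing import Mapping

# Namespaces defined by the W3C (PLS 1.0 and the built-in XML namespace).
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(entries: Mapping[str, str], lang: str = "en-US") -> str:
    """Return a PLS 1.0 lexicon mapping each written form to one IPA transcription.

    Hypothetical helper for illustration only; not a Loquendo API.
    """
    # Serialize PLS elements in the default namespace (no prefix).
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, ipa in entries.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = ipa
    return ET.tostring(lexicon, encoding="unicode")


# Pin "bass" to the musical sense rather than the fish.
print(build_lexicon({"bass": "beɪs"}))
```

A prompt designer would pick one transcription per context; an ambiguous word such as "bass" could instead be pinned to the fish reading by supplying "bæs".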

        Biomni uses Loquendo TTS in hospital appointment reminder system
                  Text-to-speech reminds patients of upcoming appointments by phone

   Biomni Voice (using the tradename VoicExcel                Two-thirds of patients simply forget their
after acquiring that company in 2006) provides             appointments, and a quarter feel better and so choose
hosted (web-based) automated voice messaging               not to attend. However, since the application was
services, using speech technologies to automate            deployed, attendance rates have risen significantly
repetitive outbound telephone tasks. The company’s         and waiting lists reduced. Katharine Horner,
clients include businesses within the healthcare,          Outpatients and Booking Manager for the NHS, said,
finance, leisure, and corporate sectors. The company       “On average, one in seven outpatients fail to turn up
uses Loquendo speech technologies.                         to appointments across Eastbourne, Hastings and
   A new automated voice messaging service                 Bexhill Hospitals. This means that time is being
developed by Biomni for the UK's National Health           wasted for other patients who could be seen at those
Service (NHS) was recently launched. The service           times. It also adds up to a loss of around £2 million
now reminds patients of their hospital appointments        each year for the Trust.”
by phone for East Sussex Hospitals NHS Trust, a               Angus Gregory, CEO at Biomni Voice, said that
public sector British healthcare provider. The system      the service is far more effective and costs less to
contacts patients seven days before a new outpatient       operate than traditional written reminders and
appointment. The recipient of a call is asked to           provides immediate and actionable feedback on
confirm their identity. An automated call using a          patients’ intentions, something that written
voice from Loquendo’s text-to-speech engine                reminders cannot do. Rosanna Duce, VP
reminds patients of the date and time of their             international sales at Loquendo, noted that today’s
appointment, inviting them to confirm attendance. If       complex economy benefits from cost-effective
a patient no longer needs, or can't make, an               solutions such as Biomni’s interactive voice
appointment, the system alerts the appointments team       messaging services.
at the Trust, thus allowing the slot to be offered to
other patients.

 Verizon Business supports automated speech services in an IP IVR environment
                Expands support for speech technology beyond classical IVR platform

   Verizon Business has long provided fully              based contact centers. Verizon Business’ support for
managed, hosted services and professional services       hybrid solutions allows customers to migrate to IP at
for premises-based automated customer service (and       their own pace, while leveraging their existing
routing to agents, when required, in both cases)         telephony infrastructure. Alla Reznik, Director of
(SSN, July 2009, p. 7). The organization offered         Product Management, Global Advanced Voice
automated speech recognition and text-to-speech as       Services, Verizon Business, noted that companies,
part of the automation for their hosted classical IVR    particularly in the current economic environment,
system (e.g., based on standard telephone interface      don’t have the luxury of uprooting old technology,
cards using older telephone standards such as TDM).      but must typically evolve to an IP infrastructure.
In August, the company added support for speech             Individual companies often have difficulty
technology with its hosted Internet Protocol (IP) IVR    supporting speech recognition solutions because of
platform. The new capabilities use the latest version    the special skills involved in developing effective
of      speech      technology      from      Nuance     solutions. It is difficult for them to hire employees
Communications.                                          with those skills, particularly when the need may be
   Tom Smith, Senior Manager, IPCC/Speech,               sporadic; and developing the skills often requires
Global Advanced Voice Services, Verizon Business,        exposure to multiple cases, rather than experience
explained that many customers find advantages in an      with one customer service operation. Smith noted
IP-based solution, and having speech automation          that Verizon Business has deep experience in
available increases their options. Some of the           developing and supporting speech-based customer
advantages of IP-based IVR include speedier              service, with, for example, more than 77 staff-years
customer issue resolution and reduced hold times,        of experience in speech application development.
lower system costs due to using IP “plumbing” rather        Reznik indicated that Verizon Business, like
than telephone line switching equipment, easier          other hosted speech services, is seeing growth in this
management of bandwidth (for example, the number         economic environment. Companies can’t afford to
of calls handled simultaneously), and an easier          increase capital expenditures, but (1) keeping
structure for combining advanced applications and        customers is more important than ever; and (2) in
multiple sources of data (e.g., customer history         many cases, customer service call loads are
databases).                                              increasing, with more inquiries to financial firms and
   The company’s IP IVR service, together with           more government services such as those provided by
VoIP Inbound, enables customers to efficiently           the Federal National Mortgage Association (Fannie
manage and route inbound calls, with the flexibility     Mae).
to deliver customer calls to either traditional or IP-

    Convergys combines business intelligence with self-service and notification
                     Integrates former Intervoice products to ease personalization

   Convergys Corporation characterizes its core          as premise-based solutions. According to Watson, a
business as “relationship management.” Convergys         key benefit of the Software-as-a-Service on-demand
supplies contact center agents as well as automated      model is that new features and capabilities become
solutions. The company uses the best of these agents     available sooner to the enterprise client, without the
as a resource to study how automated systems can         disruption associated with so-called “forklift”
best interact with customers (see VUI Visions, p. 34).   upgrades to on-premise solutions, where either
Participating in a keynote panel on “SaaS in Speech”     hardware and/or software must be replaced with
at SpeechTek, Paul Watson, vice president and            newer versions. “A SaaS on-demand model makes a
general manager in the company’s Relationship            lot of sense,” says Watson, “when an enterprise
Technology Management (RTM) line of business,            client is tight on capital, doesn’t have or doesn’t
discussed the Software-as-a-Service (SaaS) on-           want to acquire specialized experience in technology
demand model, an option Convergys offers as well         and the related efforts to monitor and incorporate
new upgrades, and when a client does not want to          based on a cost of $3 per agent-handled call. Based
have to worry about end-of-life technology.”              on this success, the company is now using the
   The company also made two specific                     solution to stimulate revenue by automating
announcements in August: (1) Intelligent Self-            customers’ IRA and CD renewal processes.
Service: Merging the recently acquired Intervoice            In conjunction with its Intelligent Self-Service
Voice Portal (IVP) with Convergys’ Business               solution launch, Convergys is also introducing a new
Intelligence (BI) product, Convergys Dynamic              Developer Zone website designed to help developers
Decisioning Solution (DDS); and (2) Intelligent           learn    more      about    Convergys’    application
Notification: Adding DDS to the Intervoice                development tools, including its Eclipse-based
Advanced Notification Gateway. A key aspect of the        Interaction Composer (IC). Using this site,
integration is the ability to use databases to make the   developers can download IC, install it on their
customer interaction more personalized.                   workstations, and immediately begin building
                                                          intelligent voice and multimodal applications, giving
Intelligent Self-Service                                  them the opportunity to “test drive” the capabilities
   By combining Convergys DDS and IVP,                    of the Intelligent Self-Service solution.
Intelligent Self-Service gives contact centers the
power      to    deliver    personalized,    relevant,
                                                          Intelligent Notification
multichannel, customer interactions and can provide          The new Intelligent Notification solution
tangible business results, estimated by Convergys as      combines intelligent automation with advanced
up to 25% reduction in cost-to-serve, by                  notification capabilities to help contact centers
 Enabling centralized policy creation and                transform their service strategies from reactive to
    management, to ensure a consistent customer           proactive, reduce costs through call avoidance and
    experience across all channels;                       call deflection, and increase revenue through
 Personalizing the customer experience to                targeted outbound cross-sell and upsell initiatives.
    increase loyalty and retention while reducing call    Intelligent Notification combines the Convergys
    handle times by up to 20%;                            DDS and the Intervoice Advanced Notification
 Accelerating self-service adoption rates to reduce      Gateway. For example, in the financial services
    costs and increase call containment rates by up to    industry, a credit card customer can receive an
    20%;                                                  outbound IVR courtesy call or text message after an
 Leveraging cross-sell and upsell opportunities to       unusual flurry of activity on their card to ensure the
    increase revenue;                                     card has not been lost or stolen. Upon receiving this
 Integrating with current business systems to take       notification, the customer can choose to be
    advantage of existing customer information; and       seamlessly connected to inbound self-service or a
 Eliminating the risk and expense of a “forklift”        live agent.
    technology upgrade.                                      The Intelligent Notification solution provides a
                                                          tighter integration with current business systems,
   Convergys contrasts its approach to other solutions
                                                          including CRM, billing, and legacy systems. The
that use intelligence to simply route the customer to
                                                          information on customers’ interactions personalizes
the appropriate agent. The intelligence within
                                                          typical automation capabilities intended to drive
Convergys Intelligent Self-Service follows customers
                                                          increased IVR containment, more cross-sell/upsell
through their entire interaction experience from the
                                                          opportunities, and proactive communications.
time they initially make contact to the time their
                                                             A leading grocery, pharmacy, and convenience
issue is resolved.
                                                          retailer implemented Intelligent Notification to alert
   A leading national financial services company
                                                          its customers when prescriptions are ready. After a
deployed the Intelligent Self-Service solution as a
                                                          successful pilot program, the retailer rolled out
pilot program. Using Intelligent Self-Service
                                                          Intelligent Notification to all of its pharmacy
improved call containment in the company’s IVR to
                                                          locations, increasing its outbound prescription
almost 93 percent of the pilot group, an increase of
                                                          notifications from 50,000 to 330,000 per month and
almost four percentage points above historical rates.
                                                          significantly reducing its incoming call volume.
With such a high level of containment, the company
expects to see a savings of almost $500,000 annually,
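
The pilot figures quoted for the financial services company can be sanity-checked with a little arithmetic. The sketch below is a back-of-envelope estimate, not a Convergys calculation: it assumes the savings come only from avoided agent-handled calls and that the containment gain accounts for all of them, and it uses the article's rounded "almost" values as inputs. The function names are hypothetical.

```python
# Back-of-envelope check of the pilot figures quoted in the article.
# Inputs are the article's rounded "almost" values, so results are rough.

ANNUAL_SAVINGS_USD = 500_000   # "almost $500,000 annually"
COST_PER_AGENT_CALL_USD = 3    # "$3 per agent-handled call"
CONTAINMENT_GAIN = 0.04        # "almost four percentage points"


def avoided_agent_calls(savings_usd: float, cost_per_call_usd: float) -> float:
    """Agent-handled calls avoided per year (assumes savings come only from these)."""
    return savings_usd / cost_per_call_usd


def implied_call_volume(avoided_calls: float, containment_gain: float) -> float:
    """Annual IVR calls implied if the containment gain explains every avoided call."""
    return avoided_calls / containment_gain


avoided = avoided_agent_calls(ANNUAL_SAVINGS_USD, COST_PER_AGENT_CALL_USD)
volume = implied_call_volume(avoided, CONTAINMENT_GAIN)
print(f"Avoided agent calls per year: {avoided:,.0f}")
print(f"Implied pilot-group IVR calls per year: {volume:,.0f}")
```

Under these assumptions the savings correspond to roughly 167,000 fewer agent-handled calls a year, which in turn implies a pilot group on the order of 4 million IVR calls annually.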

   Companies use Ifbyphone’s Voice Broadcasting service to contact customers
                Lawn care company uses the outbound service for accounts receivable
   Ifbyphone’s “Voice Broadcasting” service lets         its entire accounts receivable list each week in under
companies automatically deliver interactive phone        two hours. In addition to the manpower savings,
calls, such as payment-reminder calls. Ifbyphone         accounts are more current with the system. “Our
provides businesses a suite of phone automation          savings is almost $1,000 per month,” said Brian
services as a hosted solution (SSN, August 2009, p.      Klenke, director of sales at Weed Man Madison. He
11).                                                     said the franchise is recommending the Ifbyphone
   In August, Ifbyphone gave an example of the use       service to other Weed Man franchisees.
of its Voice Broadcasting option. Two franchisees of        Weed Man Winnipeg in Manitoba, Canada, has
Weed Man, a network of locally owned and                 been using Ifbyphone’s Voice Broadcasting solution
operated lawn care professionals throughout North        for several years to remind clients about upcoming
America, are using the feature. Weed Man Madison         lawn treatment appointments. David Hinton, owner
chose Ifbyphone’s Voice Broadcasting service to          of Weed Man Winnipeg, explained, “Many
automate accounts receivable collection calls. Before    customers require notification to unlock their gate or
using Ifbyphone’s Voice Broadcast service, Weed          keep their pets inside. Ifbyphone quickly makes the
Man Madison had one employee who spent 20 hours          calls and gives us a record of the contact, saving
each week making accounts receivable calls. With         many valuable administration hours every day.”
Ifbyphone, the company now can automatically call

   Survey shows $5.6 billion loss from inability to meet customer expectations
                             Indians prefer automated self service to agents
   Survey concludes businesses in Australia, New         programs that don’t let them reach a human agent,
Zealand, and India lose a combined $5.6 billion in       working with agents who are not empowered to
revenue due to inability to meet customer                make decisions, and having to repeat information—
expectations—Indians prefer automated self service       such as name and account number—every time their
to agents                                                call is forwarded to another department. (Other
   A new survey on customer experience and               independent surveys have confirmed the need for
consumer behavior, conducted by Greenfield Online        repetition as one of the most prevalent complaints of
and sponsored by Genesys Telecommunications              consumers.)
Laboratories, suggests that businesses in Australia,        In Australia, consumers felt the most challenging
New Zealand, and India lose $5.6 billion every year      communication channel is automated self-service or
due to poor customer service over the web, in the        speech recognition (39%), followed by call centers
contact center, or via mobile devices as consumers       (22%), and paper mail (13%). In New Zealand,
abandon transactions or end relationships when           consumers felt the most challenging communication
companies do not meet their expectations. While in       channel is automated self-service or speech
most cases the individual will turn to a competitor      recognition (41%), followed by call centers (21%)
for the business, in a surprising number of              and paper mail (13%). In India, consumers felt the
instances—over 30%—the consumer simply decides           most challenging communication channel is live
not to spend any money. In all three countries, people   contact center agents (35%) then e-mail (14%),
prefer to deal with a company by phone.                  followed closely by automated self-service or speech
   Respondents said some of their key annoyances         recognition (12%) and SMS (11%).
are automated, difficult-to-navigate, self-service

     Voice control from ATX for BMW available in September with new models
                 Control the telephone, climate control, navigation, and sound systems
  ATX, a subsidiary of Cross Country Automotive          services for global automobile manufacturers. For
Services, provides customized, “connected” vehicle       vehicles with built-in wireless connections, ATX
provides safety, security, communication, navigation,         Depending on the BMW model, the system is able
and information services to vehicle owners, using         to recognize alternative wordings (e.g. “save
automated systems and agents. The services are            number” or “save”) and connected sequences of
typically branded by customers such as Toyota,            commands (e.g., the numerals of a telephone
Lexus, BMW, PSA Peugeot Citroen, Mercedes-                number). Each time you give a command, the system
Benz, Maybach, and Rolls-Royce.                           emits a tone to indicate it has been understood.
   A new BMW voice control system available in                One single voice command is sufficient to
September lets you control features such as the           completely transmit the driver’s destination into the
telephone, climate control, navigation, and sound         navigation system. The driver’s verbal statement
systems with spoken commands. The voice control           specifying the place, street, and number is processed
system is activated by pressing a key on the              immediately by the system, all data going directly
multifunction steering wheel. The system recognizes       into the navigation unit.
up to 500 preset terms, depending on the features             The speech recognition can find individual music
with which your BMW is equipped. The verbal               titles. The BMW system monitors and interprets the
commands are picked up by a special hands-free            user’s voice commands regarding the type of music,
microphone that filters out background noise.             the name of the artist, an album, or an individual

TeleNav GPS Navigator to be available on T-Mobile myTouch 3G and Sidekick LX
           Spoken directions and speech recognition for address entry and business search
   On August 4, TeleNav announced that TeleNav            of a business or the address and TeleNav GPS
GPS Navigator will be one of the first turn-by-turn       Navigator will provide directions.
GPS navigation services available to run on the T-           Subscribers can also preplan trips online by
Mobile myTouch 3G with Google. TeleNav GPS                accessing their account through My TeleNav.
Navigator was available for a free 30-day trial           TeleNav GPS Navigator includes listings of more
beginning August 5, when the device went on sale in       than 10 million businesses and services, including
retail stores and online. T-Mobile Sidekick LX '09        restaurants, hotels, shopping malls and movie
customers can also download TeleNav GPS                   theaters, providing users access to restaurant ratings
Navigator via the T-Mobile Sidekick LX '09                and reviews as well as phone numbers for business
Download Catalog. The service is available for $9.99      listings.
per month for unlimited use.                                 Once on the road, TeleNav GPS Navigator
   TeleNav GPS Navigator on the T-Mobile                  monitors each specific route and will proactively
myTouch 3G includes full-color 3D moving maps             search for known traffic congestion or incidents.
along with voice and on-screen turn-by-turn driving       Customers will be alerted to traffic problems, both
directions. TeleNav GPS Navigator also includes           audibly and on-screen, and can choose to find
speech recognition for both address entry and             another route to their location by just pressing one
business search. On the T-Mobile myTouch 3G,              button.
customers simply press one button and say the name

       Ditech voice access chosen by for Developer Program
                       Enterprises can activate voice applications during any call
   Ditech’s “toktok” service (pronounced “talk-talk”)     an address book, to create new tasks, and to take
allows a speech service to be available on any call       voice notes.
(SSN, April 2009, p. 1). When a call is placed               In August, Ditech Networks announced that
through the Ditech platform, a caller can say the         , a nationwide CLEC and business
keyword toktok during the call to bring up any voice      voice provider, has chosen Ditech as one of a select
service that uses the Ditech solution. Ditech has         group of telephony-focused developers for its newly
developed several initial toktok applications that use    launched Developer Sandbox Program. Ditech's
speech for end-to-end call control, to find contacts in   toktok    technology    is   being    piloted    on
’s VoIP network for developers                             CTO of, said, “Professionals are
and beta users.                                            now heavily dependent upon the usability and
   By integrating Ditech's toktok application server       Internet features of their mobile devices, and the
and mStage media processor into the          time has come for innovation to catch up with these
network, Ditech will enable developers participating       growing business needs. Integrating toktok’s Voice
in's Developer Sandbox Program to            Web solution into our Sandbox will help test our
take advantage of Ditech’s services such as keyword        vision of catalyzing the next generation of telephony
spotting and “whisper messages.” T.R. Missner,             applications.”

          partners to extend capabilities of its hosted call center services
                   Novauris provides option for Name and Address Capture by speech

  In August,, a provider of on-demand               Novauris uses its own speech recognition engine.
business voice solutions (SSN, June 2009, p. 8),           With just a single utterance, a user can select from
announced integration partnerships with Novauris           millions of items (addresses, POIs, music tracks,
Technologies, VoiceVault, and Plug'n Pay to                etc.). NovaSearch is efficient and can run entirely
extend the options for its hosted services.                locally on mobile devices such as smartphones or
’s Name and Address Capture Solution                       PNDs, or it can run on a server, handling many voice
incorporates     Novauris’      NovaSearch       speech    search requests simultaneously, as in the
recognition technology, which was specifically             case.
designed to address the recognition of long lists,            VoiceVault provides speaker verification solutions
typically from databases (SSN, May 2008, p. 9).            and services (SSN, January 2008, p. 16). Integrated
When applied to addresses, this technology lets            into’s service, it can provide a biometric
callers enter any basic U.S. street address in a single,   security option. VoiceVault provides an alternative
continuously spoken, utterance as they would               to PINs, passwords, or security tokens.
naturally provide it to a human operator, rather than         With Plug'n Pay’s technology, merchants can
having to adopt the awkward and time-consuming             accept and manage both credit card and electronic
process of entering each part of the address               check payments in a secure environment. These
separately as required by existing spoken address-         solutions simplify the real-time order management
capture systems.                                           aspect of payment processing, saving time for both
                                                           agents and customers.

 Syntellect releases VoiceXML Studio 7.2 and offers free trial of testing solution
                    Also in the process of updating its full development environment

   Syntellect announced the availability of Syntellect     video communication solutions. Additionally,
VoiceXML Studio 7.2, the latest version of its             Syntellect lowered prices to $189 per port for the
VoiceXML development environment, which is                 current CT ADE version and reaffirmed the
tightly     integrated     with     the     Syntellect     company’s relationship with Dialogic, part of that
Communications Portal. The Communications Portal           environment. Bruce Petillo, marketing manager at
resulted from the acquisition of Envox Worldwide in        Syntellect, explained that the early announcement
October of last year and is a descendant of the Envox      was to reassure customers and the development
Communications Development Platform 7.1.                   community as a whole of Syntellect’s short and
Syntellect has deemed the Communications Portal            long-term commitment to CT ADE, because of
“the future of Syntellect’s self-service, IP               concern the company would drop the product after
communications platform offerings.”                        the Envox acquisition. Christoph Mosing,
   Syntellect also announced the “start of                 Syntellect’s vice president of business development,
development” of CT Application Development                 said, “CT ADE is one of the most prominent IVR
Environment (CT ADE) version 11. CT ADE is a               development environments in the industry. We… are
widely used IVR development tool for voice and             committed to taking it to the next phase of maturity.”
Speech Strategy News                                     September 2009                                   13
  Syntellect also announced the availability of a free      Expanded speech recognition and text-to-speech
30-day trial version of the latest release of its            engine support through testing Syntellect’s
Voiyager application testing solution. Voiyager is           MRCP implementation against additional
designed to reduce the time and effort required to           engines to ensure compatibility, as well as
design,    deploy,    and   maintain     VoiceXML            examining routes to integration with speaker
applications.                                                verification engines;
                                                            Added build support for Visual Studio 2008;
VoiceXML Studio 7.2                                         Enhanced support for streaming video;
   VoiceXML Studio 7.2 accelerates the creation of          Improved density and performance with
VoiceXML-based voice solutions by 50% or more                Dialogic Host Media Processing Software; and
relative to the previous release, the company               An enhanced and improved user interface.
estimates. J.R. Sloan, VP of product management             As noted, Syntellect is introducing new lower
and marketing, Syntellect, said, “We believe that our    pricing of $189 per port for the current CT ADE
VoiceXML Studio solution, when combined with our         version, which includes full feature support for
Communications Portal Studio and Voiyager                Dialogic System Release 6.0, Dialogic HMP
automated testing technology [SSN, February 2009,        Software 3.0 and their associated media/interface
p. 12], offers the most powerful and efficient           boards, speech recognition, text to speech,
development solution set in the industry.” An            multimedia and video. Additionally, Syntellect is in
evaluation version is available.                         the initial phases of establishing a worldwide CT
   The VoiceXML Studio 7 enhancements include:           ADE developer community where users can share
 Support for the VoiceXML 2.1 and 2.0                   ideas, tips, applications, and best practices. The
    specifications;                                      initial launch of the CT ADE developer community
 A graphical programming environment that               is targeted for the fourth quarter of 2009.
    eliminates the need for manual coding but
    generates VoiceXML suitable for high-                Voiyager
    performance applications;
                                                            The Voiyager trial version includes software for a
 Java, XML and Web Services capabilities that           fully functional installation as well as a sample
    allow developers to integrate easily with existing   VoiceXML application to make it easy to try the
    software applications and leverage prior             software. With guidance from a simple installation
    investments in software and solution                 wizard and a “getting started” guide, companies can
                                                         have Voiyager running in as little as ten minutes,
 Advanced database controls that enable the             Syntellect estimates. Bruce Sherman, product
    automation of transactions and business              manager for Syntellect, said that a minimal time
                                                         investment of about an hour should allow an
 An integrated debugger that streamlines                organization to discover how the suite of time-saving
    development and testing time; and                    tools in Voiyager can help reduce costs across the
 Application server support for Apache Tomcat,          entire application lifecycle. He said, “Considering
    IBM WebSphere, and Oracle WebLogic.                  the cost savings Voiyager customers have
CT ADE 11                                                experienced ranging from the tens to hundreds of
                                                         thousands of dollars, that one hour is time well
  Syntellect CT ADE 11 is scheduled for release in       spent.”
mid-2010. Enhancements to Syntellect CT ADE 11
will include:

                    Pronexus enhances its “CT ADE Migration Program”
           VBVoice IVR development tool positioned as replacement for Syntellect offering

  Syntellect is upgrading its contact center             port (previous article). Pronexus is directly
solutions, including its CT Application Development      competing with its VBVoice Rapid Application
Environment (CT ADE), a widely used IVR                  Development Toolkit (for IVR applications within a
development tool, and has lowered prices to $189 per     standards-based Visual Studio .NET environment)
with enhancements to its “CT ADE Migration                for those developers who are uncertain about their
Program.” This program was originally introduced in       current IVR’s future or who want to convert to
2004 for CT ADE IVR developers looking to convert         VBVoice.
their existing CT ADE IVR application to VBVoice.            Gary Hannah, President and CEO of Pronexus,
  VBVoice is an integrated GUI-based development          said, “I am proud to say that during these difficult
solution. The Pronexus CT ADE Migration Program           economic times, Pronexus has remained steadfast
offers a migration price incentive of $99 a port, along   and has been experiencing some of the strongest
with VBVoice consulting and a CT ADE conversion           demand ever for VBVoice.”
manual. This program has been designed specifically

               Empirix tests contact center solutions by simulating callers
                                          Stress testing at full load

   Empirix provides what it calls “service quality          Partly because of many patents in this area, few
assurance solutions for new IP communications.” Its       other companies offer comparable services, notes
Hammer Test Engine can, for example, test an IVR          Tim Moynihan, vice president of marketing for
system by simulating a caller. The system tests the       contact center solutions at Empirix. Partners include
compliance of the system with a design specification,     Avaya and Genesys.
using speech recognition to detect audio prompts and        The Hammer can be purchased or used through a
responding with an audio recording (or text-to-           service. Empirix Testing as a Service tests an entire
speech) or touch-tone of a legitimate alternative. By     contact center infrastructure at full call capacity for
making enough calls, the system can test all the          any size environment with unlimited TDM and IP
paths of the IVR system to verify that it is working      call capacity. Capabilities include load and
as planned, as well as measuring other facets of          functional testing, proactive monitoring of
performance, such as latency. The company can also        applications, and custom professional services.
measure the performance of IP communications
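
The path-coverage idea described above, in which simulated callers exercise every route through an IVR, can be illustrated with a minimal sketch. It is not Empirix's method or the Hammer API: the call flow, the prompt names, and the `enumerate_paths` helper are all hypothetical, and the flow is assumed to contain no loops.

```python
from typing import Dict, List, Tuple

# Hypothetical call flow: each prompt maps a touch-tone key to the next prompt.
# A prompt with no outgoing keys is terminal (the call ends or is transferred).
CALL_FLOW: Dict[str, Dict[str, str]] = {
    "main_menu": {"1": "billing", "2": "support"},
    "billing": {"1": "balance", "2": "pay_bill"},
    "support": {"1": "agent"},
    "balance": {},
    "pay_bill": {},
    "agent": {},
}


def enumerate_paths(
    flow: Dict[str, Dict[str, str]], start: str
) -> List[Tuple[List[str], str]]:
    """Return every key sequence leading from `start` to a terminal prompt.

    Assumes the call flow is acyclic; a flow with loops would need a
    visited set or a depth limit.
    """
    paths: List[Tuple[List[str], str]] = []
    stack: List[Tuple[str, List[str]]] = [(start, [])]
    while stack:
        prompt, keys = stack.pop()
        choices = flow[prompt]
        if not choices:
            # Terminal prompt reached: record the keys that led here.
            paths.append((keys, prompt))
            continue
        for key, next_prompt in choices.items():
            stack.append((next_prompt, keys + [key]))
    return sorted(paths)


if __name__ == "__main__":
    for keys, final_prompt in enumerate_paths(CALL_FLOW, "main_menu"):
        print("press", " ".join(keys), "->", final_prompt)
```

Each returned key sequence corresponds to one simulated call. A real test would also check the audio prompt heard at each step against the design specification and record latency.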

          Nexidia enhances its speech analytics to detect unexpected events
                Supplement to core product doesn’t require advanced set-up or analysis

   Nexidia offers audio search and speech analytics       professionals can then drill more deeply into the
solutions based on a phonetics-based approach and         identified audio with Nexidia’s flagship product,
recently released a major upgrade of its core speech      Enterprise Speech Intelligence (ESI).
analytics software (SSN, August 2009, p. 8). In             Core capabilities in Nexidia ESP include:
August, the company added Nexidia ESP, designed            Automatic identification of relevant topics: The
to discover unexpected changes in caller behavior at          scanning process that identifies company-specific
call centers that might lead to increased call volume         information is performed automatically, and can
or handle times. The offering is available                    be run periodically to update the vocabulary as
immediately.                                                  new terminology develops.
   Jeff Schlueter, vice president, marketing, Nexidia,     Efficient speech analytics: Nexidia ESP
said that there is an initial setup in which company-         processes thousands of hours of audio per day
specific words are added to an existing vocabulary of         with just a single server, analyzing it against the
several hundred words and phrases that can detect             custom      vocabulary,     computing     baseline
specific types of problems or changes in the                  statistics, and detecting emerging trends. In
customer service interaction. The initial setup can           addition to the search results for spoken terms,
collect unique words and phrases from a company               Nexidia ESP incorporates metadata from the
web site or collateral material. The new software             recordings—including call duration, non-speech
then detects significant changes in frequency of the          detection, and other classifiers—to identify those
phrases appearing in calls over time. Customer care           topics that are not only the most frequent, but
    that also have the greatest impact on call center       Alerts: Criteria for alerts can be established so
    operations.                                              immediate notifications are sent when results
   Visual reporting & analysis: Nexidia ESP                 exceed defined parameters.
    provides visual reports to help customers
    interpret the results. Snapshots show at a glance       Nexidia also announced in August that the
    which topics are currently the most important.       Federal Trade Commission (FTC) has purchased
    Trend analysis shows which topics are either         multiple licenses of Nexidia’s AudioFinder audio
    increasing or decreasing in rank. In addition,       search software. The FTC will use AudioFinder to
    relationship charts (clustering) show which          review audio content produced in its investigations.
    topics are most closely aligned with each other.
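
The trend detection described above, which compares how often a phrase appears now with its historical baseline, can be illustrated with a simple z-score test. This is a sketch under stated assumptions and not Nexidia's algorithm: the `emerging_phrases` function, the per-1,000-calls rates, and the threshold of 3 standard deviations are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def emerging_phrases(
    history: Dict[str, List[float]],
    today: Dict[str, float],
    threshold: float = 3.0,
) -> List[str]:
    """Flag phrases whose rate today is far above their historical baseline.

    `history` maps a phrase to past daily rates (for example, mentions per
    1,000 calls); `today` maps a phrase to its current rate. A phrase is
    flagged when it exceeds the baseline mean by more than `threshold`
    population standard deviations.
    """
    flagged: List[str] = []
    for phrase, rates in history.items():
        baseline = mean(rates)
        spread = pstdev(rates)
        if spread == 0:
            # No variation in the baseline, so a z-score is undefined; skip.
            continue
        z_score = (today.get(phrase, 0.0) - baseline) / spread
        if z_score > threshold:
            flagged.append(phrase)
    return sorted(flagged)


if __name__ == "__main__":
    history = {
        "web page": [2.0, 3.0, 2.5, 2.5],
        "refund": [5.0, 5.5, 4.5, 5.0],
    }
    today = {"web page": 9.0, "refund": 5.2}
    print(emerging_phrases(history, today))
```

A production system would presumably weight the result by call duration and the other metadata the article mentions; the sketch looks at frequency alone.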

          CallMiner speech analytics incorporated into solutions by resellers
                 Aspect Software helps healthcare organization with analytics solution

   CallMiner, Inc. provides speech analytics             vocabulary by recognizing words pronounced
solutions directly to enterprises, as well as working    similarly, and the actual word is often obvious. In
through partners that integrate the speech analytics     addition, product names and other expected non-
from CallMiner into broader analytics solutions.         standard words can be added to the dictionary and
CallMiner’s own Eureka! enables organizations to         the statistical language model by providing a few
analyze speech captured in contact center                examples, e.g., from a company web site.
conversations. Examples of companies incorporating
CallMiner technology include Aspect Software and         Aspect
IS Solutions.                                               Aspect, a unified communications solutions
   CallMiner uses a speech-to-text approach              provider, provides all-in-one contact center solutions
supported by built-in classification based on            and unified communications, but also offers
keywords and phrases. The company contrasts its          technical workforce optimization capabilities with
approach to “phonetic-based search applications          dashboards, scorecards, and reporting. Aspect
where you must write a query to get an                   integrates speech analytics from CallMiner into its
answer…Eureka automatically provides you with an         solution.
answer.” In an interview, Jeff Gallino, Chief               NorthShore University HealthSystem, an
Technology Officer, said the company uses its own        integrated healthcare system comprising 68
speech recognition technology. He said the large-        medical offices and facilities, is an example of a
vocabulary approach supports “discovery,” while a        company using Aspect for analytics. Aspect uses the
search-based phonetic approach is unlikely to            recording capabilities of Aspect Unified IP to record
discover events that may represent new and               100% of NorthShore’s contact center interactions for
significant customer service problems.                   quality monitoring purposes. They are using the
   Scott Kendrick, Vice President Product,               information gathered in the calls for training
CallMiner, demonstrated the system to Speech             purposes and call optimization. NorthShore is
Strategy News to indicate a typical discovery case.      leveraging the tight integration between Aspect
He showed how the system shows statistics of words       Unified IP recording and its speech analytics
occurring in call center conversations, but screens      solution from CallMiner to automatically analyze
out words that are not information-bearing.              calls and gain a view of customer concerns and how
“Clustering” shows which words tend to occur             agents responded. Aspect Unified IP integrates with
together, so that “web page” and “doesn’t work”          NorthShore’s Epic electronic patient medical
occurring in a cluster, for example, might suggest the   records database.
call was caused by a web site problem. The audio of
individual calls with these keywords can then be         IS Solutions
reviewed to determine the detailed interaction. Of
course, standard tests for expected behavior can also       IS Solutions provides website design and
be monitored statistically.                              development services as well as products from
   Gallino noted that a large-vocabulary speech          leading industry vendors for customer experience
engine can detect words or phrases not in its            analytics, website analytics, content management,
online risk management, application security, web          Solutions believes that integrating speech analytics
conferencing, and e-learning. The company                  with existing business and website data completes
announced a reseller agreement to integrate                the customer experience analytics puzzle, enabling
CallMiner speech analytics into IS Solutions’ suite of     us to provide comprehensive user information across
products and solutions and sell it in Europe. IS           all channels and resulting in superior business
Solutions’ Sales Director, Peter Kear, said, “IS           intelligence.”
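
The “clustering” that CallMiner demonstrated, in which terms that tend to occur together in a call hint at a shared cause, can be approximated by counting co-occurring word pairs across transcripts. The sketch below is an illustration only and not CallMiner's implementation: the stop-word list, the `common_pairs` function, and the sample transcripts are hypothetical, and it works on single words rather than phrases such as “web page.”

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

# Hypothetical list of words screened out as not information-bearing.
STOP_WORDS = {"the", "a", "my", "is", "i", "to", "and", "it"}


def common_pairs(
    calls: Iterable[str], min_calls: int = 2
) -> List[Tuple[Tuple[str, str], int]]:
    """Count how many calls contain each pair of information-bearing words.

    Each transcript contributes at most once per pair. Pairs found in fewer
    than `min_calls` calls are dropped; the rest are returned most frequent
    first.
    """
    counts: Counter = Counter()
    for transcript in calls:
        terms = {w for w in transcript.lower().split() if w not in STOP_WORDS}
        # Sorting the terms gives each pair a canonical (alphabetical) order.
        counts.update(combinations(sorted(terms), 2))
    return [(pair, n) for pair, n in counts.most_common() if n >= min_calls]


if __name__ == "__main__":
    transcripts = [
        "the website is broken",
        "my website login broken",
        "i want to pay my bill",
    ]
    for pair, n in common_pairs(transcripts):
        print(pair, n)
```

An analyst would then pull the audio for the calls behind a frequent pair, as the article describes, to find out what actually happened.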

      Nice Systems combines phonetic and large-vocabulary speech analytics
                 A recent win includes a $55 million award from a government agency

   Nice Systems provides systems that include call         JP Morgan Chase, Bank of America, American
recording and speech analytics for the enterprise and      Airlines, Conseco, and Safeco.
security markets. The company uses its own speech             Nice analytics are also used for audio search in
technology, notes Barak Eilam, corporate vice              security applications, and the company announced in
president and general manager, Interaction Business        August that it had won a “mega security contract
Applications, at Nice. The public company reported         from a government agency.” The government
$140.5 million in revenues for the second quarter.         agency will be implementing Nice’s NiceTrack
   The company has both phonetic and large-                technology for advanced telecom interception. Nice
vocabulary technology for audio search and speech          has received an advance payment for the first phase
analytics, he said. A feature of the technology is tools   of the contract, which is expected to generate $55
for analysis of speech in noisy environments where         million in revenues starting in 2010 over a period of
some recognition errors are inevitable. The                two to three years.
company’s analysis software represents revealing              The NiceTrack solution enables interception of all
occurrences such as “common pairs,” content words          types     of    communications      and     generates
that occur together frequently, and call volume in         comprehensive intelligence. NiceTrack offers a
categories defined by specific terms.                      unified set of solutions for the collection and
   The solution can be delivered as on-premise             analysis of both telephony and Internet data for law
software or a service. Eilam said the company had          enforcement, intelligence, and internal security
many “high-end” companies as customers, including          organizations.

 CallCopy’s new speech analytics solution claimed to offer low cost of ownership
                           Overly complex solutions may have slowed adoption

   CallCopy provides call recording and quality            and improve operational efficiencies across the
management solutions (SSN, August 2008, p. 34). In         organization.
August, CallCopy announced the availability of its            However, notes Ray Bohac, CEO, CallCopy,
new speech analytics solution. The phonetics-based         “While the benefits of speech analytics solutions are
product is designed to deliver a lower total cost of       clear, the market to this point has seen relatively low
ownership. CallCopy said that contact centers of all       adoption rates. Several factors have contributed to
sizes can realize the benefits of speech analytics once    this, including high implementation costs, overly
available only to large enterprises.                       complex solutions that require dedicated analysts
   In today’s economic environment, getting by with        and added professional services costs, and a
good-enough isn’t good enough; it is critical that a       relatively low return on investment.”
company understand its customers’ needs and                   CallCopy’s new release is designed to lower
behavior. Speech analytics can extract essential           barriers to adoption and is fully integrated with
business intelligence from within contact-center           CallCopy’s “cc: Discover,” including its call
recordings. By leveraging speech analytics,                recording, quality management, and desktop screen
companies      can     proactively     identify    sales   capture modules. CallCopy’s speech analytics is also
opportunities, uncover customer satisfaction issues,       designed to avoid a lengthy setup process and to
reduce corporate risk, ensure regulatory compliance,
deliver a “more practical feature set,” according to          Stress detection – Beyond specific phrase
the company. Features include:                                 identification, the software can identify agent or
 Phonetics-based engine – Without the need for                customer emotional variances for further
    transcription to text, a phonetics-based search            analysis.
    can search for non-standard words such as                 Silence detection – Abnormal pauses or holds
    jargon, slang, and foreign words.                          can indicate potential workflow issues and areas
 Keyword/phrase spotting – The phonetics-based                for process improvement.
    engine can mine 100% of call recordings,                  Faster time-to-value – The CallCopy solution is
    identifying words or phrases that have been                designed to avoid the need for a large
    identified as business-critical. Calls identified as       professional services engagement or dedicated
    containing these words or phrases are flagged for          analyst staff.
    immediate follow-up.
                                                             CallCopy also announced that its cc: Discover 4.0
 Confidence scoring – Probability scores with            has tested compatible with Cisco Unified
    results allow scores below a user-definable           Communications Manager 7.0. Compatibility testing
    threshold to be automatically filtered.               was performed as part of CallCopy’s membership in
                                                          the Cisco Technology Developer Program.
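
The keyword-spotting and confidence-scoring features listed above can be pictured as a filter over search hits. The sketch below is not CallCopy's API: the `Hit` record, its fields, and the `calls_to_flag` function are hypothetical, and confidence is assumed to be a probability between 0 and 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """One occurrence of a watched phrase found in a recording (hypothetical)."""

    call_id: str
    phrase: str
    offset_seconds: float
    confidence: float  # assumed to lie between 0.0 and 1.0


def calls_to_flag(hits: List[Hit], threshold: float) -> List[str]:
    """Return the IDs of calls with at least one hit at or above `threshold`.

    Hits scoring below the user-defined threshold are filtered out, so a
    call is flagged for follow-up only when a match is reasonably certain.
    """
    kept = [hit for hit in hits if hit.confidence >= threshold]
    return sorted({hit.call_id for hit in kept})


if __name__ == "__main__":
    hits = [
        Hit("call-001", "cancel my account", 42.0, 0.91),
        Hit("call-002", "cancel my account", 10.5, 0.40),
        Hit("call-003", "supervisor", 5.0, 0.75),
    ]
    print(calls_to_flag(hits, threshold=0.7))
```

Raising the threshold reduces false alarms at the cost of missing some genuine mentions, which is presumably why the product leaves the value to the user.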

   OnviSource integrates speech analytics with its workforce optimization suite
                     Free pilot program allows companies to “try before they buy”

   OnviSource, a provider of call center workforce         an embedded part of every WFO solution. Well-
optimization and automation software solutions and         integrated speech analytics can transform traditional
award-winning business process outsourcing services        quality assurance into an enterprise-wide solution
(SSN, May 2009, p. 16), announced in August the            that offers benefits ranging from significantly
integration of its Explora speech analytics with its       improved quality assurance and productive
OnviCord Workforce Optimization (WFO) suite of             evaluation and training to the extraction of valuable
software solutions, including call recording, quality      market and business intelligence.”
assurance,    screen     capture,   and    workforce          Call centers and enterprises wanting to “try before
management applications. The core speech                   they buy” can use Explora to automatically analyze
recognition technology in Explora is from Aurix,           customer interaction call recordings. Pilot program
which provides a phonetics-based audio search              participants can verify their own applications and
capability.                                                test the product with complete support from
   With the embedded Explora solution, analysis is         OnviSource. Support includes assistance with system
performed automatically drawing on multi-media             setup and guidance for conducting queries and data
sources of text, scores, screens, and audio. Art Yri,      analysis to ensure that users realize the potential of
Chief Technology Officer for OnviSource, explained,        speech analytics in their environment. Participants
“We strongly believe that speech analytics should be       are not obligated to purchase Explora.

              Utopy announces new version of its speech analytics product
        Combined with a new version of its contact center performance optimization solution

   Utopy Inc. is a provider of speech analytics            allows speech-analytics-driven coaching and
solutions using basic speech recognition technology        performance management for agents.
from Nuance (SSN, August 2009, p. 17). Utopy                  Utopy indicates that the release is not
recently announced the availability of Utopy               evolutionary, but was designed “from the ground
SpeechMiner 6.0 and Utopy OutPerform 6.0—the               up.” It leverages actionable insights from the call
latest versions of its speech analytics product and        content and the interaction itself to drive contact
contact center performance optimization solution,          center performance management, quality monitoring,
respectively. The combination of both solutions            coaching, and workforce management processes
centered around contact center key performance          and proactive recommendations on who, what, and
indicators (KPIs). The analytics-driven workflows       why an action needs to be taken.
trigger alerts and recommendations for improving          SpeechMiner 6.0’s Web 2.0 platform simplifies
company processes and agent performance. The            deployment. It comes with dashboards that offer
subsequent actions taken are further tracked and the    multiple and preset views, permission controls, and a
results measured for closed-loop continuous             subscribe-and-publish     capability.   The      new
refinement.                                             architecture also comes with plug-and-play widgets
   The new OutPerform Coaching 6.0 application          and multimedia support for Flash, videos, and RSS
enables targeted, performance-based coaching driven     feeds. In addition, SpeechMiner 6.0 offers broader
by SpeechMiner’s Speech Analytics. OutPerform 6.0       call monitoring capabilities, search summaries,
also offers role-based dashboards, rule-based alerts,   dynamic and static call lists, permalinks, and a
                                                        messaging widget.

   DSS adds speech analytics from Aurix to its multi-media recording solutions
                               Designed for the 911 public safety market

   DSS Corporation, a national provider of 911          Safety market because typically the voice recordings
multi-media recording solutions using the brand         were only searchable by 3% of the information (e.g.,
Equature, announced the release of EQ Speech            time/date, phone number, ANI/ALI, position, and
Analytics for the Public Safety market. EQ Speech       duration). Multimedia support means that one can
utilizes a phonetic speech engine from Aurix (SSN,      search “211 in progress,” for example, for results in
November 2008, p. 14). The speech is processed into     voice calls, SMS messages, email, and other 911
a phonetic representation and can then be searched      data.
for terms that are not in a system dictionary,             “In today’s changing markets, it is even more
including slang, proper names, addresses, crime         important that municipalities manage all of their
codes, etc. UK-based Aurix was formerly 20/20           communications more efficiently and cost
Speech, which has its roots in defense research.        competitively to help improve citizen protection,”
   EQ Speech allows dispatch centers to search 100%     said Joe Mosed, general manager, DSS. “Our New
of the content of their voice recordings by words and   EQ Speech Analytics program for municipalities
phrases. This is a big step forward in the Public       does just that.”

    Voxeo releases major upgrade of its “Unlocked Communications” platform
                 Voxeo open-source initiative aims to increase options for customers

   On August 25, Voxeo Corporation announced the        Service (SaaS, hosted speech applications) at
early-access release of Prophecy 10, its flagship       SpeechTEK, a Voxeo panelist indicated that Voxeo
“Unlocked Communications” platform. Voxeo offers        had 60,000 telephone ports supporting speech
unified      communications      and     self-service   applications.
applications, including speech recognition and text-
to-speech technology, as both a hosted and on-          Unlocked Communications
premises solution. The company provides its own           Voxeo describes Unlocked Communications as
speech technology for those companies that want to      the “core differentiating strategy of Voxeo and its
use it. Voxeo also recently announced an open-          Prophecy software”:
source initiative to make options it uses in its         No vendor lock-in: Prophecy 10 is specifically
solutions more widely available independent of              designed to enable customers to easily switch to
Voxeo (SSN, August 2009, p. 14); the company                another platform at any time. (Some of the
elaborated on its motivation for doing so (see end of       referenced open-source efforts were targeted
this article). Voxeo also announced a partnership           toward this promise.) Dan York, Director of
with Nu Echo to use Nu Echo tools to speed IVR              Conversations at Voxeo, said Voxeo is
development (p. 24). In a panel on Software as a            obsessively focused on retaining its customers
    via product and support excellence, and never via       available to users of Voxeo’s VoiceObjects
    vendor lock-in. The company said that Voxeo is          platform. Fifteen basic analytics reports are built
    still the first and only vendor to be certified         into Prophecy 10, and customers seeking deeper
    100% compliant in VoiceXML Forum’s                      analysis can optionally purchase the full Voxeo
    compliance testing. Voxeo’s VoiceObjects                Infostore Analyzer, which provides over 45
    application design and delivery platform works          reports and integrates with popular Business
    with VoiceXML browsers from over 30 vendors,            Intelligence      solutions       from        SAP
    enabling customers to build their applications          (BusinessObjects),     IBM      (Cognos),      and
    once and deploy them on nearly any VoiceXML             MicroStrategy.
   • Open standards: Voxeo supports VoiceXML,              At SpeechTEK in August, Voxeo demonstrated
    CCXML, and SIP standards. Prophecy 10               Prophecy 10 running on a cluster of 20 Acer Aspire
    delivers a clean, scalable SIP foundation           netbook computers. The demonstration was intended
    compliant with over 15 Internet Engineering         to show how a 2,000-port speech, IM, and SMS-
    Task Force (IETF) and IMS (IP Multimedia            enabled Prophecy system can be deployed in less
    Subsystem) SIP standards and based on the Java      than half a day and with less than $8,000 in
    SIP Servlet (JSR-289) standard. This foundation
    bridges diverse SIP services and devices, eases     Voxeo’s open-source initiative
    integration with IP-PBX and call center
    investments, and lowers the cost of transferring       Dan York, Director of Conversations at Voxeo, in
    calls with direct IP to IP connectivity. Prophecy   response to an inquiry from this newsletter,
    works with SIP platforms and services from          elaborated on Voxeo’s motivation for making some
    AT&T, Avaya, Cisco, Global Crossing,                of its technology available as an open-source option
    Nortel, Verizon, and over 50 other companies.       for developers. He noted that, as the company
   • Integration of voice with other                    releases the source code under an open
    communications modes: Prophecy 10 enables           source license, it will include a Media Resource
    any VoiceXML application to interact with users     Control Protocol (MRCP) client, so that people
    not only via voice, but also via SMS, IM,           installing other copies of the Tropo code will be able
    Twitter, web chat, and the mobile web. For          to interact with any MRCP-compatible speech
    example, with Prophecy 10, a pre-existing           engine for speech recognition or text-to-speech. This
    customer self-service application written in        speech engine could of course be Voxeo's own
    VoiceXML can now be used to deliver self-           Prophecy product, or it could be any other speech
    service via SMS with little or no modification.     engine that supports the MRCP standard.
                                                           York explained: “In the world of XML-based
   Prophecy 10 also includes the following new          telephony, the market has evolved to where it is all
technical features and capabilities:                    about the open standards of VoiceXML and
 • Performance improvements: Prophecy 10 can             Vendors are now competing on who has
    now run up to 100 concurrent calls on low-power     the best platform and best support rather than who
    Intel Atom based servers, and up to 500 calls on    has the best proprietary XML language for voice
    a moderate Intel Xeon based server. Prophecy 10     apps. Voxeo thrives very well in that world of open
    is also a fully 64-bit-enabled VoiceXML and         standards. We don't need proprietary lock-in to
    speech recognition platform. When run on a 64-      succeed.”
    bit Windows or Linux OS, Prophecy 10 supports          He added:
    up to several thousand concurrent calls on a          “Today, though, the leading edge of web development
    high-powered Intel Xeon server.                        has moved on from XML into a world of lightweight
 • Reduced developer overhead: Prophecy 10                  APIs and programming languages such as ruby,
    includes a new “Developer Mode” that                   python, groovy, PHP and javascript. As API-based
    significantly reduces the CPU thread and               ‘cloud telephony’ platforms continue to emerge, our
    memory overhead of a Prophecy installation.            concern is that we are seeing the same kind of
 • Prophecy Infostore Analytics: Prophecy 10                proprietary solutions emerging as we saw in the
    bundles and leverages analytics capabilities from      early days of XML telephony. We would like to see
    VoiceObjects Analyzer, enabling any existing           the same open playing field for vendors. We want to
    VoiceXML application to use Infostore analytics.       move the market to competing on who has the best
    Previously these analytics capabilities were only      platform and best customer service…not on who has
Speech Strategy News                                        September 2009                                     20
   the best way to lock you in to their proprietary cloud      providers out there and customers have a choice.
   telephony platform.                                         We'll compete in that space—and we want
  “By making Tropo available in open source, our aim           customers to choose Voxeo and stay with us because
   is to have it adopted so that there are multiple            they want to, not because they are forced to.”

         Nuance Communications speech attendant rated “Avaya Compliant”
                            Nuance speech-enabled call router passes Avaya tests

   Nuance Communications announced that its                 communications, contact centers, and related
Open Speech Attendant (OSA) is compliant with key           services directly and through channel partners.
contact center solutions from Avaya. OSA enables            Nuance is a member of the Avaya DevConnect
employees and customers to reach anyone in an               program—an initiative to develop, market and sell
organization by saying their name or, in some cases,        third-party products that interoperate with Avaya
a department. Speech recognition avoids the “spell          technology. As a Platinum member of the program,
by keypad” approach and other clumsy alternatives           Nuance is eligible to submit products for
to helping callers reach people at a company. These         compatibility testing by the Avaya Solution
alternatives become even less attractive as more            Interoperability and Test Lab in Lincroft, N.J., where
people use mobile phones as their primary or only           a team of Avaya engineers develops a
telephone (p. 43).                                          comprehensive test plan for each application to
   Avaya is a global leader in enterprise                   verify whether it is Avaya compliant. As noted, OSA
communications       systems,    providing   unified        has been deemed Avaya Compliant.

                 New version of Nuance TTS for contact center and mobile
                     Vocalizer 5.0 blends synthetic and recorded speech automatically

   Nuance’s text-to-speech solutions are used, for             Vocalizer   5     also    simplifies  application
example, to provide a computer-generated voice to           development with tools for tuning and voice
deliver dynamic customer service information by             sculpting. Vocalizer 5 manages the static prompts,
companies such as Amtrak and United Airlines;               carrier prompts, and computer-generated speech
voice reading in the Amazon Kindle, and turn-by-            through one unified interface.
turn directions in many navigation devices. Nuance’s           Dan Faulkner, vice president of product
new release of its text-to-speech engine for contact        management and marketing, Enterprise Division,
centers and other network-based applications,               Nuance, said, “The new solution represents a
Vocalizer 5.0, was characterized as a “complete             breakthrough because it can naturally and accurately
spoken output solution” by a Nuance spokesperson.           speak information that would previously have
Vocalizer 5 blends text-to-speech with pre-recorded         required an agent to read aloud. The new voice
audio. All requests for audio output can be directed        quality improves the customer experience, and
to Vocalizer—including static prompts, concatenated         provides a greater opportunity to automate—
prompts, dynamic TTS, or any combination of the             especially with name and address read out—
three—instead of requiring special application              reducing contact center costs.”
treatment.                                                     Nuance Vocalizer will support more than 41
   The new software has been shown through                  languages and dialects, including U.S., U.K. and
independent tests, Nuance indicated, to approach            Australian English, Arabic, Brazilian Portuguese,
recorded-speech quality for name and address                Canadian French, Dutch, French, German,
playback. In addition to the core software solution,        Hungarian, Indonesian (Bahasa), Mandarin Chinese,
Nuance also offers Vocalizer 5 Basic, a cost-               Russian, Swedish, Romanian, and Latin American
effective solution for enterprise applications that         and Mexican Spanish.
require only limited, small-set vocabularies, such as
currency, date and time, and telephone numbers.
Speech Strategy News                                     September 2009                                     21

    French transport firm uses Nuance speech technology in customer service
                          12% of proof of delivery requests are fully automated

   TNT, an express parcel delivery service, is now       it launched the new project to update customer
using speech recognition technology from Nuance          service through speech in 2007. TNT is now
Communications in its customer service operations.       working on an upgrade that can be rolled out
TNT’s French subsidiary now uses speech to handle        internationally.
all incoming calls to its customer service center           Laurent Husquin, Project Manager at TNT, said,
(around 8,000 calls a day). The French subsidiary        “Our aim was to extend the opening hours of our
now completes 1,500 customer queries a day without       customer service, speed up response times and the
agent interaction, one third of which are for parcel     quality of the information provided to our customers.
tracking.                                                Another objective, the backdrop to this project, was
   The speech recognition application is capable of      to improve satisfaction with the service we provide.”
supplying precise details of the status of a delivery,      Ian Turner, general manager Northern Europe at
or even issuing a new delivery order. Using the self-    Nuance, summarized, “The technology has raced
service functions, 12% of proof of delivery requests     ahead, in sophistication and maturity. There’s plenty
are fully automated, freeing up TNT agents to            of evidence to show that speech technology is now
concentrate on higher-value calls.                       the key ingredient in bringing the call center into the
   TNT in France delivers 350,000 parcels every day      21st Century, and making this journey as quick and
to its 50,000 customers. Having deployed an IVR          effortless as possible.”
server based on Nuance speech technologies in 2003,

         Apple’s new Snow Leopard OS has enhanced accessibility features
                         VoiceOver speaks what one is pointing to on the screen

    The screen-reading technology using text-to-         For example, after setting the rotor to “Word” or
speech in Apple’s operating systems, VoiceOver,          “Character,” each time you flick, VoiceOver moves
now offers a new capability with the just-released       through the text one word at a time or one character
version of the operating system, Snow Leopard. A         at a time, speaking each—particularly useful in
user can control the computer using gestures on a        proofreading or editing text.
multi-touch trackpad without seeing the screen. The           Sometimes items in applications are not well
trackpad surface maps to the active window on the        labeled, so VoiceOver can describe them only with
computer. Users can touch to hear the item under         vague terms like “blank,” “empty,” or “button.” If
their finger, drag to hear items continuously as they    the user knows what the item is or has sighted
move their finger, and flick with one finger to move     assistance, a custom label can be assigned. The
to the next or previous item. VoiceOver will begin       next time the item is visited, VoiceOver will
reading an entire web page automatically after it        describe it using the custom label.
loads, and users can employ key commands or                   Users can change the way VoiceOver speaks
gestures to control VoiceOver as it’s talking.           punctuation, identifies changes in text attributes,
    VoiceOver offers a virtual control called a rotor.   announces links, and more. They can choose one of
When you turn it by rotating two fingers on the          three standard verbosity levels — high, medium, and
trackpad as if you were turning a dial, VoiceOver        low — or customize them by adjusting 30 separate
moves through text based on a setting you choose.        settings.

             PerSay releases product for testing voice biometrics solutions
                  Also releases product for real-time detection of fraudsters calling in

   On August 24, PerSay introduced PerSay                other    vendors’    voice   biometric    (speaker
Evaluation Studio, a product that addresses the need     authentication) systems and technology. (For an
to professionally plan, test, and analyze PerSay’s and   overview of speaker authentication approaches, see
Speech Strategy News                                        September 2009                                     22
the guest article by Judith Markowitz, SSN, August,         additional environments are planned in the near
p. 33.) The company also announced that it has              future. The solution supports TDM, VoIP, and
successfully completed the field testing of a new           hybrid contact centers.
“Fraudsters Detection” solution that enables contact           The product depends on the usual behavior in
centers to detect known fraudsters in real-time, as         attempted telephone fraud; identity thieves call in
they call in.                                               repeatedly, seeking a way through the call center’s
                                                            defensive procedures. Using pretexts, widely
Evaluation tool                                             available customer information, and other social
  PerSay notes that the performance of voice                engineering techniques, they occasionally succeed.
biometrics engines and systems is affected by many          Once successful, fraudsters may break into multiple
factors, and measuring accuracy and optimizing these        customer accounts. While contact centers today have
systems to deliver best performance can require             the ability to record fraudsters’ voices on tape, they
expert advice. The new tool uses a “wizard-like”            lack a real ability to use these recordings and
process that reduces the need for expert help,              proactively fight and reduce fraud. Using PerSay’s
executing tests, analyzing results, and producing           text-independent speaker identification, contact
reports.                                                    centers can generate automatic alarms when
  Ariel Freidenberg, EVP Global Sales and Business          fraudsters call.
Development at PerSay, cited the company’s                     The Fraudsters Detection Solution taps into every
“extensive lab and field experience in deploying            incoming call and examines it against a set of
voice biometrics technology, as well as in supporting       previously created known fraudsters’ voiceprints, as
customers and prospects throughout the selection            well as the claimed customer’s voiceprint if it exists.
process, pilots, and initial rollouts” that has gone into   This test is performed at various intervals during the
the product’s development. Elyashiv Kellerman,              call. The result is a fraud detection score that is
PerSay’s VP Professional Services added, “We                mapped to a set of alert levels. A client application
specifically added an interface enabling testing of         running on the agent’s desktop pops up and directs
other voice biometrics products for the benefit of our      the agent to take the appropriate actions according to
prospects.”                                                 the alert level received.
                                                               Almog Aley-Raz, PerSay’s CEO, said, “We are
Fraudster detection                                         extremely satisfied with the results our Fraudsters
  PerSay’s Fraudsters Detection Solution is based on        Detection Solution obtained in a real-life contact
the company’s FreeSpeech speaker identification             center environment. The broad and extensive testing
platform that analyzes a caller’s voice during a            that was performed enabled us to explore the factors
natural conversation with an agent, without any             affecting the detection accuracy and false alarm rate,
specific vocabulary or prompting required. The              leading to an optimized system and our customer’s
Fraudsters Detection Solution is currently available
for Genesys contact centers, and integration with
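
The scoring step described for the Fraudsters Detection Solution (compare the caller against known fraudsters' voiceprints and, if one exists, the claimed customer's voiceprint, then map the resulting score to an alert level) can be sketched as follows. This is an illustrative sketch only, not PerSay's implementation: the function names, the 0-to-1 similarity scale, the rule by which a match to the claimed customer discounts the score, and the alert boundaries are all assumptions made for the example.

```python
from bisect import bisect_right
from typing import Optional, Sequence

# Hypothetical alert boundaries and labels; a real deployment would tune these.
ALERT_BOUNDS = (0.5, 0.8)
ALERT_LEVELS = ("none", "caution", "high")


def fraud_score(watchlist_similarities: Sequence[float],
                claimed_similarity: Optional[float] = None) -> float:
    """Combine voiceprint comparisons into one fraud score in [0, 1].

    watchlist_similarities: similarity of the caller's voice to each known
        fraudster voiceprint (assumed to lie in [0, 1]).
    claimed_similarity: similarity to the claimed customer's voiceprint,
        or None when no such voiceprint exists.
    """
    best_match = max(watchlist_similarities, default=0.0)
    if claimed_similarity is None:
        return best_match
    # Assumed rule: a strong match to the claimed customer discounts the score.
    return best_match * (1.0 - claimed_similarity)


def alert_level(score: float) -> str:
    """Map a fraud score to an alert level using the boundaries above."""
    return ALERT_LEVELS[bisect_right(ALERT_BOUNDS, score)]


if __name__ == "__main__":
    # The test is repeated at intervals during the call, so the agent's
    # desktop client would re-evaluate the level as new audio arrives.
    for watchlist, claimed in [([0.2, 0.3], 0.9), ([0.6], None), ([0.2, 0.9], 0.1)]:
        score = fraud_score(watchlist, claimed)
        print(f"score={score:.2f} alert={alert_level(score)}")
```

Keeping the score-to-level mapping in a small table separate from the scoring rule means the alert boundaries can be adjusted without touching the comparison logic.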

     STC highlights media monitoring service and voice verification technology
                         Voice-based access control for mobile devices one option

  Speech Technology Center (STC) is a developer             has over 100 installations in private and government
of speech-related products, including digital               forensic audio labs in the US. At SpeechTEK in
recording; forensic labs; noise cancellation software       August, the company discussed a media monitoring
and hardware; biometric voice verification and              service   and     voice     identification/verification
identification; and Russian speech recognition and          technologies.
TTS solutions (see interview, SSN, June 2008, p.               The company’s Voice Grid is a new product line
20). The company’s head office is located in St.            for speaker search and identification. Voice Grid
Petersburg, Russia, with branches in Moscow and             products work with different media (phone lines,
Saarbrucken, Germany. STC’s technologies are                microphones, personal recorders, etc.). Voice files
supplied through a dealer network in about 60               containing a specific voice can be identified, allowing
countries. The company’s Sound Cleaner Premium              media to be monitored for specific speakers.
Speech Strategy News                                       September 2009                                    23
  VoicePin is a voice-based access control solution        Windows mobile device and integrated with any of
for mobile devices that combines reliability of            its programs or applications (SMS, e-mails, files).
biometric verification with the convenience of a           VoicePin can ensure the security of corporate or
voice interface. VoicePin can be installed on any          personal data on mobile devices.

                 Auraya Systems develops new voice biometrics solution
             Allows thresholds to be set based on the business risk of a specific transaction

   Australia-based Auraya Systems spent four years         has established the National Centre for Biometric
providing professional services to assist banks,           Studies, which is focused on evaluation, research
financial institutions, and Australian                     and education in biometrics. Summerfield has a
government departments with the choice                     background in speech recognition as founder and
and use of voice biometric technologies                    CEO of Syrinx Speech Systems, an Australian
(speaker authentication). The company                      technology business that during the 1990s deployed
decided that voice                                         speech recognition in contact centers.
biometrics developed for call centers were largely an         Summerfield said that the company has now
efficiency tool and that stronger methods were             developed a new generation of voice biometrics
required for more demanding security applications.         technologies (called SAVVy) that offer Equal Error
   Auraya observed that, as Internet security has          Rate (EER) performance 2 to 5 times more accurate
improved, criminal syndicates have increasingly targeted   than currently available systems. (Speaker
the telephone channel as the “weak link in the             authentication systems have variable parameters that
delivery of on-line services,” and have in recent          can be set to trade off false acceptances against false
years compiled lists of stolen personal identity           rejections; EER, as the name suggests, is the error
information (some of it from call centers that they        rate that makes the two types of errors equal.) The
have infiltrated). In one case in the US, the identities   company indicated that Auraya’s technologies and
of some 30,000 individuals were stolen by a call           solutions are the result of many years of consulting
center employee and used fraudulently. The judge in        work with major government departments (including
that case chided the company for the pain it caused        social services, child support and national security
its customers due to its lax security.                     agencies), telecommunications carriers, and major
   Voice biometrics don’t require customers to repeat      banking and financial services organizations in
sensitive information such as mother’s maiden name         Australia and the US. Summerfield noted that
or social security numbers; the authentication can         speaker authentication has often been seen as “a
just be a spoken challenge word or an account              quirky offspring of speech recognition,” with the
number. The agent thus doesn’t necessarily have            result that it is viewed by potential buyers as an
access to critical personal information if the             error-prone process; he notes that it deserves to be
authentication is biometric.                               considered as a separate technology with its own
   Auraya works largely with the security                  demands and differences.
departments in the IT department of companies, as             Summerfield said that Auraya’s business rules
opposed to directly with call center managers. Dr. Clive   engine can be adapted to existing systems that use
Summerfield, founder and CTO of Auraya, noted              other speaker authentication technology, but will
that the primary concern of security departments           give improved results if run with Auraya technology.
isn’t efficiency, but protection of                        He indicated that Auraya can tune the engine and the
sensitive information. Auraya usually positions its        business rules to deliver solutions that meet clients’
solutions as a professional services engagement, with      specific security and business risk needs.
the voice biometrics sold as licenses for a maximum           Auraya’s technology is available for integration,
number of identities stored, and a renewal of a            trial, and demonstration with IVR systems
subscription to the licenses on a periodic basis. The      worldwide          using       its       web-services
company has its own speaker authentication                 interface. VoiceXML and Java Applet software for
software, but can also put its technology on top of an     IVR and IP application integration is also available
existing, installed speaker authentication technology.     from Auraya.
   Dr. Summerfield is also Adjunct Professor of               Auraya’s delivery model does not integrate its
Computing at the University of Canberra where he           technologies on the IVR in the front office, as do
Speech Strategy News                                      September 2009                                     24
most call-center-oriented speaker verification               Another issue with voiceprints is poor enrollment.
technologies. SAVVy is a back-office solution that        Enrolling in a noisy environment can create a
sits    alongside      identity   management       and    problem. Others include poor telecommunication
authentication systems/frameworks. It provides an         environments, network interference, and clipping.
“identity authentication assurance” measure that is       Human behavior can also be a problem. Callers may
used to approve a caller’s access to a secure service     say fewer than the required number of samples, or
or allow a transaction.                                   they may not repeat the desired phrase accurately.
   A key issue for security experts is how to set the     SAVVy approaches this problem with a speech
thresholds that determine acceptance errors versus        quality assurance process, “Speech QA”, that is
rejection errors. One can, of course, set the             embedded in the SAVVy system to alert the
thresholds so high that many legitimate users can’t       applications to compromised or poor quality speech
use the system, or so low that there is little security   samples that are likely to lead to a less than reliable
added. Auraya developed “Impostor Maps,” a                enrollment (or verification) and a potentially
technology that maps authentication results to            vulnerable voiceprint.
business risk. Impostor Maps is the part of SAVVy            As an additional security feature, SAVVy’s
that enables the security performance of a voice          “Black-List” process automatically compiles lists of
authentication solution (whatever its configuration or    potential fraudsters and tracks their behavior.
underlying technology) to be continuously calibrated,     Connected to the business rules engine, the process
enabling the business risk associated with individual     automatically detects fraudulent speakers and checks
authentication results to be determined. The result is    their voice samples against a black-list of fraudsters
a built-in process that automatically sets thresholds     compiled and maintained by the system. The process
based on a required business risk for a specific          allows security managers to detect and understand
transaction. Summerfield gave the example of a            potential fraudulent attacks in telephone services in
balance inquiry versus a money transfer as involving      real-time and respond accordingly. Groups of
different levels of risk.                                 customers (such as the banks) can potentially
   A key point that Auraya feels most voice               collaborate, as they currently do in response to
authentication systems don’t take into account is that    Internet service attacks, and share black-list
not all “voiceprints” are equally secure. Some are        databases with each other and with law enforcement
difficult to break, and some aren’t, the company has      authorities.
observed. SAVVy can calibrate the security                   Beyond security, speaker authentication can also
performance       of    each    speaker’s    voiceprint   create efficiencies in call center operations. A caller
individually and set thresholds specifically for each     can simply say an account number for verification
speaker and for each type of speech they use to           that can be used for the dual purpose of finding the
authenticate, a feature that Auraya says other systems    account and verifying the caller’s identity through a
don’t appear to support. Instead of using a fixed         voiceprint. If the particular transaction requires a
global threshold, which assumes that all voiceprints      higher level of security, then the company can add
have equal performance, SAVVy sets thresholds for         further authentication layers; Auraya says it can
each voiceprint based on its individual security          compute just how much additional identity assurance
performance. To further satisfy both the security and     those extra layers provide. A further layer could be a
customer service stakeholders, SAVVy can set two          callback to the account holder at a registered phone
thresholds, an upper threshold computed to satisfy        number, for example; this solution would have the
security requirements and a lower threshold               additional advantage of a cleaner line if line noise
computed to ease customer service interactions.           was the problem, Summerfield noted.
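
As a worked illustration of the Equal Error Rate defined above and of the two-threshold idea, the following sketch computes false acceptance and false rejection rates from a handful of made-up scores, finds the threshold at which the two rates meet, and then applies an upper and a lower threshold to a single score. It is not Auraya's SAVVy algorithm: the score values, the function names, and the "step_up" outcome for scores falling between the two thresholds are assumptions made for the example.

```python
from typing import Sequence, Tuple


def error_rates(genuine: Sequence[float], impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false_rejection_rate, false_acceptance_rate).

    A score at or above the threshold counts as an acceptance.
    """
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return frr, far


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> Tuple[float, float]:
    """Scan the observed scores as candidate thresholds and return the
    (threshold, error_rate) pair where the two error rates are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr, far = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]


def decide(score: float, lower: float, upper: float) -> str:
    """Two-threshold decision (the middle outcome is an assumption):
    accept at or above `upper`, reject below `lower`, otherwise ask for a
    further authentication layer such as a callback."""
    if score >= upper:
        return "accept"
    if score >= lower:
        return "step_up"
    return "reject"


if __name__ == "__main__":
    # Made-up verification scores for true speakers and for impostors.
    genuine = [0.9, 0.8, 0.7, 0.6, 0.4]
    impostor = [0.1, 0.2, 0.3, 0.5, 0.65]
    threshold, eer = equal_error_rate(genuine, impostor)
    print(f"EER threshold={threshold}, EER={eer:.0%}")
    print(decide(0.7, lower=0.5, upper=0.8))
```

Per-voiceprint thresholds of the kind the article describes would amount to running this calculation on each speaker's own score distributions rather than on one pooled set.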

                           Nu Echo offers IVR application testing tool
         Voxeo will offer Nu Echo’s grammar development tool integrated with VoiceObjects
  Nu Echo provides professional services in speech        professional version of its NuGram IDE (Integrated
recognition application development for contact           Development Environment), based on Eclipse (SSN,
centers, including grammar development and tuning,        July 2009, p. 20). In August, Nu Echo and Voxeo
speech application testing, and consulting. The           announced that Voxeo will offer Nu Echo’s NuGram
company also licenses to outside companies tools          grammar development tool in an integrated package
that it uses internally. In June, the company added a     with Voxeo’s VoiceObjects Service Creation
Speech Strategy News                                         September 2009                                    25
Environment. Nu Echo also released an additional             become generally available as either a SaaS solution
tool, a Beta version of its NuBot IVR Application            or as an on-premise solution.
Testing Platform, available for free trial for a limited        The NuBot features include:
time.                                                        • Fully integrated testing environment – From one
   Yves Normandin, Nu Echo CEO, summarized the                   single interface, users can create new test
company’s evolution:                                             scenarios, launch and monitor tests, and retrieve
“Nu Echo was founded seven years ago with a simple               and analyze test results.
 mission: To make and enable the best speech                 • Modularity – Complex applications can be
 applications. Our previous experience had told us that          described using simpler modules, each of which
 great applications just don’t happen by accident, that          can be independently tested. Furthermore, test
 effective tools are absolutely required. Therefore, while
                                                                 scenarios designed for the simpler modules can
 other vendors were developing tools to implement call
 flows and dialogs, we have specifically focused on              also be re-used when testing the complex
 building tools that directly impact user experience, task       application end-to-end.
 success rate, and application robustness. These tools,       Inbound and outbound support – The NuBot call
 which truly are the foundation of Nu Echo’s speech              processor can both simulate a person initiating a
 practice, cover key aspects of speech application               call or receiving a call.
 development, including grammar development, tuning,
 automated testing, and pronunciation management. What       Voxeo integrates Nu Echo grammar tool
 has changed over the past year is that we have now            Voxeo provides both hosted and premise-based
 turned some of them into products, in particular the        speech and IVR application support (p. 18). Voxeo’s
 NuGram Grammar Platform and the NuBot Testing
                                                             VoiceObjects Service Creation Environment
NuBot IVR Application Testing Platform                        VoiceObjects Analyzer for analytics;
                                                              Multiple phone channel support for voice, video,
   Testing of IVR applications during development
                                                                 text, and mobile Web using markup languages
and after any changes is a standard practice (or
                                                                 and standards such as VoiceXML, HTML and
should be). But manual testing can be costly and
                                                                 USSD; and
incomplete, leaving problems that show up in
                                                              A “Desktop for Eclipse” development
customer frustration and lower automation rates. An
                                                                 environment for individuals and teams.
automated testing solution can be set up to do
complete testing and to repeat that testing exactly             Voxeo is now offering Nu Echo’s NuGram
when later corrections or changes are made.                  grammar development tool in an integrated package
   Nu Echo has spent the last four years developing          with     the     VoiceObjects     Service     Creation
the NuBot Automated IVR Application Testing                  Environment. The integration allows developers to
Platform. The NuBot Platform software architecture           create either static or dynamic grammars that can be
integrates:                                                  used with VoiceObjects technology to efficiently
 The NuBot Integrated Testing Environment                   build multichannel self-service applications.
    (ITE), an Eclipse plug-in used to develop test              NuGram IDE Basic Edition enables developers to
    scenarios, manage tests, and analyze results;            author grammars using one single concise and
 The Robot Server, a middleware component that              legible format, regardless of the target speech
    centralizes all call processing functions; and           recognition engine; includes a grammar editor with
 The Asterisk open source telephony platform,               several advanced features; provides powerful
    which interacts with the application through a           grammar analysis, visualization, and debugging
    public or private network.                               tools; and provides tools to test grammar coverage
                                                             and semantic interpretation correctness.
   The first NuBot beta was offered in May 2009 to a
                                                                Michael Codini, Managing Director at Voxeo
restricted number of users. Nu Echo is now releasing
                                                             Germany, notes that the combination allows the
the second beta version of the NuBot Platform to a
                                                             same application to be deployed as “a speech-
wider audience. Qualified beta 2 users will be
                                                             enabled IVR application, an IMbot, and even an
provided with a free copy of NuBot ITE, and they
                                                             HTML interface for mobile devices, without limiting
will be able to execute tests using the Robot Server
                                                             the dialog to using simple, fixed choices.” He adds,
and Asterisk platform offered as a SaaS solution
                                                             “Outbound applications will even be able to reach
through the NuBot Hosting Service (offered free
                                                             users based on their preferred channel or their
from September 1st to November 30th, 2009).
Following the beta 2 phase, the NuBot Platform will

               Openstream offers multimodal browser for mobile phones
               Used by Omnesys and Openstream in brokerage and trading application
  Openstream Inc. offers a multi-modal solution,          paradigm of interaction for mobile users that
the Smart Messaging Platform, which enables               combines voice, touch, and key-presses. For
enterprises to deploy mobile offerings in a scalable      example, a “what can I say?” help button brings up a
and secure manner. According to the company, the          context-sensitive list of phrases expected by the
platform is based on open standards and is capable of     application, and the user can use the touch-screen or
delivering mobile services from any data source,          speech to make a choice.
over any network, and to any device in any mode.            Openstream and Omnesys Technologies, a
Openstream provides mobile solutions in mobile            brokerage and trading platform provider in India,
force automation for enterprise field and sales forces,   announced a use of the Cue-Me multimodal browser
mobile banking and brokerage, mobile digital media        for brokerage and trading solutions. The multimodal
delivery and monetization, and healthcare.                interface is intended to encourage use of the solution by
  In August, Openstream launched its multimodal           brokerage houses. Features include access to NSE,
browser Cue-Me for Windows Mobile, Symbian, and           BSE and NCDEX markets in India; local financial
Blackberry phones, with a version for the iPhone on       news; real-time quotes; currencies, interest rates, and
the way. Raj Tumuluri, president and CEO of               global indices; and the ability to place orders.
Openstream, said the browser provides a new

                      Aisle411 service helps find items in a retail store
                         Customer calls a toll-free number on any mobile phone
   Call 877-AISLE-411, say the store that you are in      message. Ace expects cooperative offers from its
and what product you want, and AISLE-411 tells            suppliers to help subsidize the service.
you precisely where it is. This service is provided by       A survey conducted by Bryles Research for
Aisle411 Inc. using any mobile phone and network-         Aisle411 found that nearly 84% of shoppers have
based speech recognition. AISLE-411 also supports         some difficulty finding products on store shelves—
retailers and manufacturers by offering them              especially at hardware, sporting goods, and big-box
opportunities to increase productivity by helping         stores. More than 67% of shoppers say they would
their customers enhance their shopping experience         use this free service, especially women aged 31-40.
through relevant promotions, mobile coupons, and             Nathan Pettyjohn, Aisle411’s Chairman & CEO,
product information—via audio or text messages—           said, “More than 22% of shoppers give up while
within seconds of a purchasing decision.                  looking for hard-to-find items in a store. Another
   Ace Hardware is the first to try the service,          56.6% ask an associate for help, creating a huge
initially in a suburb of St. Louis, MO. Signs in the      demand on staff time and attention.” Aisle411 will
store and on the shopping carts encourage customers       also be able to provide retailers and consumer
to try the service to find what they want and to get      product manufacturers with information about their
special offers. Offer “coupons” are delivered by text     customers’ buying preferences, including which
                                                          products are searched for most often and when.

                                How Aisle411 works from any mobile phone

                       Automated language translation a growing area
                        SpeechGear, AppTek, BBN, Language Weaver, and Next IT

   Human language is amazing in its subtlety and its        Interact has properly transcribed a spoken sentence
complexity. “Time flies” as you read this, but you          in the language of origin by having it spoken back
can also “time flies” with a stop watch to see how          using text-to-speech before translating, editing the
fast they move. In the first case, “time” is a noun and     sentence in the originating language if necessary
“flies” is a verb; the opposite is true in the second       (e.g., by saying an incorrect word and repeating the
case. Translation from one language to another when         correct replacement), generating a translation,
one is working with grammatically and lexically             having the translation repeated, and other features.
accurate text is difficult enough, but when the origin      All of these features are now accessible in a hands-
is speech that may be not fully grammatical and             free and eyes-free manner.
where speech recognition can produce lexical errors,           The speech recognition technology is licensed, but
the problem is compounded. Despite its difficulties,        requires significant adaptation by SpeechGear,
the commercial, governmental, and military                  Palmquist noted. The text-to-speech is also licensed.
advantages of being able to communicate in two                 Interact uses a general translation approach and,
different languages motivate continued development          while customers can add words and pronunciations
in the area. This article discusses some recent             to the system dictionary, the intent is to avoid over-
developments from SpeechGear, AppTek, BBN                   tuning to a particular customer environment. For
Technologies, and a cooperative effort of Next IT           example, Palmquist noted that, while one might
and Language Weaver.                                        think a bank teller’s interaction would be narrow,
                                                            customers sometimes begin or interject more general
SpeechGear speech-to-speech translation                     conversation in order to be friendly, or ask questions
   now allows hands-free operation                          about bank services beyond those one would expect
   SpeechGear has a range of translation products,          at a teller’s window.
capable of translating text-to-text, but the company’s         Palmquist said that the company’s translation
focus is on translation where the input and/or output       software emphasizes accurate conveyance of the
is speech (SSN, September 2009, p. 15). Robert              desired message. For example, while “This food not
Palmquist, SpeechGear’s President and CEO, said             safe to eat” is not fully grammatical due to an
that the company’s two basic product lines are              omitted word, it is certainly better than the one-word
Interact and Interpreter, both software products.           error, “This food is safe to eat.”
Interact is intended as a general translator, and runs         Interpreter works on smaller devices and is more
on Windows PCs, with a single-license price of              limited. SpeechGear uses a phrase builder
$995. Interpreter is a more limited product that can        technology, which eases the translation of commonly
run on smaller devices, such as Windows mobile              used phrases. With Interpreter technology, the user
phones, and is in effect a phrase translator with some      can add “keywords” to a pre-built phrase. For
special features, including the ability to create carrier   example, I will come back in ...? becomes I will
phrases and application-specific lists of items such as     come back in five days or I will come back in two
a store’s products; it has a much lower price point         hours by choosing the phrases and, in essence, filling
per copy.                                                   in the blanks. A user can also merge any word in the
   The company recently announced that                      dictionary with a phrase. For example, I would like
Compadre:Interact now includes a complete hands-            to buy a ... becomes I would like to buy a magazine
free and eyes-free interface. Palmquist said that a         or I would like to buy a ticket. Once a phrase is built,
range of customers wanted to use the device while           the translation is immediately displayed and
maintaining surveillance of a person or persons with        spoken. The user can customize the vocabulary to
whom they are interacting, e.g., police officers,           support specific needs.
military users, and even in a retail store when the            Palmquist said that the recession had, perhaps
salesperson is demonstrating a product.                     surprisingly, helped generate interest in the
   While it may seem that speech-to-speech                  company’s products. The company is still based on
translation would be hands-free in general, not all         private investment. Recently a medical institution
functions were previously voice-enabled. Examples           invested $200,000.
include turning Interact's microphone on or off,
selecting the language being used, verifying that
AppTek launches a free machine translation                    According to the company, during the first three
   service on its website                                 years of the GALE program, BBN met or exceeded
                                                          the accuracy goals for automatic translation of
   AppTek, a specialist in human language                 Arabic newswire text and broadcast news into
technology (HLT), announced the availability of a         English. Under this latest contract award, BBN will
free machine translation (MT) service, “Quick             continue to work in Arabic from both speech and
Translate,” for more than 23 language pairs on its        text sources to meet increasingly difficult accuracy
website. The service covers               goals. BBN continues to work in Chinese under a
English and a variety of other languages                  separate award.
(bidirectional): Arabic, Korean, Japanese, Chinese,
Turkish, Persian/Dari, Urdu, Pashto-English, Bahasa            The BBN GALE team includes BBN speech and
Indonesian, Tagalog, French, German, Italian,             language scientists as well as researchers from other
Portuguese, Polish, Russian, Spanish, Ukrainian,          institutions in the U.S. and abroad. The BBN team’s
Hebrew, and Dutch.                                        approach combines the output transcriptions and
   The service features AppTek’s hybrid approach to       translations from multiple systems to obtain a
machine translation. A prime motivation for a hybrid      translation that is better than any of the component
MT (HMT) system is to take advantage of the               system translations. Tad Elmer, president and CEO,
strengths of both rule-based and statistical              BBN Technologies, said, “DARPA’s ambitious
approaches, while mitigating their weaknesses.            GALE program has already made significant
AppTek’s HMT solution is an integration of both           advances in automatic language processing. We are
MT methodologies; the company emphasizes that the         achieving a level of accuracy that, before this
approach is more than simply adding rules to the          program, was regarded as impossible.” For more
statistical system or a minor statistical module to the   discussion of BBN efforts in speech and language
rule-based engine. AppTek attempts to balance the         processing, particularly in speech analytics, see the
three key translation quality parameters of MT            interview on p. 33.
systems—fluency, informativeness, and adequacy.
   AppTek’s product offerings include MT and              Next IT and Language Weaver team to
speech recognition for a growing list of more than           provide a self-service solution for
23 languages; multilingual information retrieval             multiple languages
with query and topic search capabilities; name-
finding applications; and integrated suites providing        Next IT, which creates “virtual experts,” and
speech recognition and machine translation in media       Language Weaver, which offers statistically-based
monitoring of broadcast and telephony speech, as          automated language translation, announced a
well as handheld and wearable speech-to-speech            strategic partnership to deliver Language Weaver’s
translation devices.                                      translation software tightly integrated with Next IT’s
                                                          ActiveAgent software. The combination will answer
BBN Technologies gets $14 million from                    users’ natural-language text questions across
  DoD for rapid foreign language                          multiple languages.
  processing                                                 Next IT Chief Technology Officer, Dr. Charles
                                                          Wooters, said, “This is not merely a ‘bolting
    BBN Technologies has been awarded $14                 together’ of two powerful technologies…The result
million in funding by the U.S. Defense Advanced           of this collaboration is that our Virtual Experts will
Research Projects Agency (DARPA) in the fourth            be able to understand multiple languages and their
year of the five-year Global Autonomous Language          language comprehension skills will continue to
Exploitation (GALE) program. The goal of GALE is          improve as they engage with users.”
to develop and apply software technologies to                Next IT’s technology engages the user through a
transcribe, translate, and distill large volumes of       conversational medium; it uncovers the user’s intent
speech and text in multiple languages with more than      to resolve the user’s needs. In one solution,
90% accuracy by the end of the program. Such a            ActiveAgent leverages an organization’s complete
capability would help analysts recognize critical         asset and resource portfolio through a multiple-
information in foreign languages quickly.                 service application including web, call center,
                                                          intranet, mobile, and other user touch points.

                 Humanity Interactive offers animated online characters
                      Red Shift provides supporting speech recognition technology

   Humanity Interactive offers online animated               Red Shift Company was formed to create and
“virtual personalities.” The company’s Synthetic          commercialize a new line of voice products built on
Animation Engine can interface with any text-to-          bio-mimicry principles that model human hearing
speech (TTS) engine and can also be used to animate       and speech. Red Shift says it is addressing speech
recorded .wav files. Lonnie Benson, CEO, Humanity         recognition, text-to-speech, hearing aids, noise
Interactive, calls Red Shift a “sister company” that      cancellation (reduction), voice identification, speaker
has developed its own speech recognition technology       and microphone technologies, and all human-
that can support interaction with the virtual             machine speech interfaces.
personalities. The Animation Engine can use outside          Red Shift said its speech recognition is based on
text-to-speech technologies.                              biologically salient components of speech, extracting
   Humanity Interactive wants its technologies to be      traditional features for these engines using methods
used to deliver interactive interfaces for online,        closer to the way the ear works rather than more
offline, and gaming applications. The company in          standard transformations to the frequency domain.
2003 deployed several third-party online two-             Red Shift claims that, by using biologically inspired
dimensional characters for the automotive industry.       features, its speech recognition has shown over 60%
A few dealer organizations started using these            improvement over Cambridge University’s Hidden
cartoon characters on their websites, and customers       Markov Model ToolKit and 50% improvement over
began interacting with them more than expected.           Carnegie Mellon University's Sphinx 3 decoder
This led to research on “personifying technology”         (for both continuous and semi-continuous models).
and to the company’s current product line.

                      VoiceXML Forum celebrates 10-year anniversary
          AVIOS and Forum cooperate to advance best practices in using speech technology

   The VoiceXML Forum, a global industry                  AVIOS will pursue a range of educational and
organization focused on accelerating the adoption of      marketing activities to promote speech technology,
VoiceXML, speech recognition, and related                 applications, services, and related standards. More
technologies and standards, celebrated its tenth          details on specific initiatives under this partnership
anniversary by announcing that Avaya and                  will be announced over the next few months.
Convergys have joined as sponsor members of the              VoiceXML Forum Chairman Rob Marchand said,
VoiceXML Forum and have been named to its board           “AVIOS has played an integral role in growing the
of directors. Avaya and Convergys join other              speech industry and providing excellent educational
Sponsor Members Cisco Systems, Genesys                    opportunities for speech practitioners. We look
Telecommunications Laboratories, Holly                    forward to working with them on educational and
Connects, Loquendo, Nuance Communications,                other initiatives.” Bill Scholz, president of AVIOS,
Verizon, and West on the board of directors in            noted, “Our two organizations have complementary
providing strategic direction and leadership to the       goals, and this exciting industry development will
forum.                                                    allow us to leverage the strengths of both entities for
   In addition, the VoiceXML Forum announced it is        the good of the industry.”
broadening its scope of activities and opportunities         The VoiceXML standard has been an unqualified
for member involvement with the start of a                success. According to senior industry analyst Dan
collaborative initiative with the Applied Voice           Hong of Datamonitor, “2008 was the inflection point
Input/Output Society (AVIOS), a non-profit                where VoiceXML ports shipped eclipsed that of
foundation dedicated to education about the practical     traditional IVR. The question is no longer if an
applications of advanced speech technology and            enterprise will make the move to VoiceXML, but
sponsor of the Voice Search Conference (p. 56).           when.”
Working together, the VoiceXML Forum and

Vianix highlights opportunities in field service and mobile workforce automation
                   Speech processing optimized for high speech recognition accuracy

   Vianix, a subsidiary of iMPAQ Corporation,             and/or transmitted with less bandwidth, but retain the
offers a speech compression algorithm suite, SPART        quality necessary for accurate speech recognition.
(Speech Processing for Automatic Recognition                 Ramaswamy observes that voice is the mode of
Technology), which the company says is specifically       input most effective when both hands are needed for
tuned and optimized for high quality voice and            other duties such as operating machinery, holding
speech recognition accuracy (SSN, May 2009, p. 22).       onto objects or examining patients. In the field,
Veeru Ramaswamy, CTO, Vianix, said that the               insurance agents or claim adjusters at the scene of
company has been responding to multiple requests          an accident can compile their findings more rapidly
from customers in field service and mobile                using voice input as a primary modality.
workforce automation, inquiring how Vianix                   For companies involved in developing services
technology plays into such an environment. The            using voice recording, Vianix offers a SPART
company feels it can serve markets in these areas in      partner program, which includes Windows and
verticals such as healthcare, insurance claims,           Windows CE ACM codecs, a Software Development
factory and warehouse monitoring and data access,         Kit for the SPART codec and an application suite,
real-estate management, and in general, mobile sales      which includes API level access to library classes,
force management using voice to access data from          functions, and procedures to allow a seamless
back-end servers. The advantage that Vianix would         integration with partner products. The tools include a
offer is that the compressed speech could be stored       means to convert from the SPART proprietary
with less impact on memory on mobile devices              format to a generic format, and vice versa.

              3Play Media uses speech recognition with quality assurance
                The FeedRoom and 3Play Media offer online video captioning services

   3Play Media provides cost-effective transcription      formats. Johnson added, “We have found that our
services, specializing in time synchronization, video     quality meets and exceeds expectations of even
captioning, archive indexing, and search. Four MIT        traditional transcription; and we have passed
graduate students founded the company in 2007,            rigorous quality tests with existing customers.”
inspired by efforts to find affordable accessibility         Johnson said that every file is run through speech
solutions. Its customers include universities, film       recognition for the first pass, even if the audio
production companies and major corporations.              quality is poor. The software is designed to be
   The company indicates that its rates are often         compatible with most recognition engines. Johnson
lower than off-shored solutions despite using a           said that the company’s primary supplier to date has
domestic labor force, achieved through “patent-           been Wizzard Software (p. 50), which resells IBM
pending” technology to streamline a human’s ability       ViaVoice speech-to-text software. The company also
to transcribe audio files. CJ Johnson, a co-founder,      has a search solution that can find content in a video
said that the company uses speech recognition in part     library, for example.
as a pre-processing step for transcriptionists to            In August, the company announced it was
reduce costs. He indicated that the company has           teaming with The FeedRoom to provide captioning
developed an environment that allows humans to            services. The FeedRoom helps companies with their
interact with the speech recognition output, to correct   online video communications through a hosted live
errors, and polish the transcripts to the level you       video service and a web-based digital asset
would expect from a traditional transcription firm,       management solution. Customers are as diverse as
including speaker labeling, proper paragraphing,          Barnes & Noble, Boeing, MetLife, and The
formatting, and time-stamping. For time-                  Pentagon. The two companies will help Fortune
synchronized projects, the back-end software can          1000 enterprises, media organizations, and
automatically create text, word, and HTML-                government agencies to accelerate the process of
formatted transcripts; time-synchronized HTML and         producing affordable, high-quality video captions in
XML transcripts; and over a dozen closed captioning
compliance with Section 508 of the Rehabilitation         turnaround options. Once transcribed using the pay-
Act.                                                      as-you-go service, the transcription can be edited, if
  The FeedRoom’s flexible, lightweight software           necessary, from within the console prior to
video players provide support for caption display.        downloading standard .dfxp (distribution format
The display can be configured by users on the fly         exchange profile) or .html files for synchronized
using a Player Composer module within the                 delivery in FeedRoom players or linked to specific
FeedRoom Studio video publishing application.             videos.
  Users can upload video directly to the 3Play
console, choosing from next-day or three-day

                    Interview with Darrell Knight, Message Technologies
                                    Dedication to hosted IVR solutions

   Darrell Knight, President, Message Technologies, Inc., was interviewed by Bill Meisel in late August. A
seasoned leader and long-time entrepreneur, Knight brings over 24 years of experience in management,
operations, and leadership, with a record of year-over-year revenue increases and client base expansion.
Prior to joining MTI, Darrell successfully established, built, and sold three separate technology companies.
Most recently he co-founded Arizan Corporation (Sept 2000), a wireless software start-up based in Atlanta,
which was acquired by Research in Motion in July 2002. Darrell began his career in semiconductors with
Texas Instruments, where he served as Product Manager for Speech Synthesis and Speech Recognition
Please provide a brief outline of Message Technologies’ focus and services.
   Message Technologies (MTI) was founded in 1982, and for the past 18 years we have been a leading
provider of hosted IVR solutions to both specialized SMB customers as well as the Fortune 500. Our goal is
to offer open, affordable, and reliable voice solutions that deliver maximum customer satisfaction. We take
two primary approaches to the IVR market. First, we have a partner model where companies that develop
speech applications for a particular market or vertical can bring their own applications to us for hosting on our
platform. The partner model allows our customers to leverage a fully redundant infrastructure and our best-of-
breed platform with minimal capital expenditure. Our second model allows MTI to turnkey self-service
applications from beginning to end. We employ senior speech professionals to build standards based
VoiceXML applications featuring full lifecycle design, development, and implementation as well as ongoing
tuning and maintenance. We’re headquartered in Atlanta, GA and we operate multiple carrier-grade data
center facilities in both Atlanta and Dallas, TX.
Do you see a growing adoption of hosted solutions in our current economic situation?
   Certainly the tough economy brings to light the numerous benefits that a hosted IVR solution brings to
enterprises. As a focused provider of hosted IVR solutions only, we are actively investing in the future of
hosted IVR. While the speech industry on the whole has remained relatively flat over the past several years,
we’ve seen a definitive upward trend in the number of enterprises that are moving away from premise-based
deployments and outsourcing their solutions. As more IVR equipment comes to end-of-life, enterprises are
increasingly deciding that they don’t want to repurchase new equipment and continue to manage it. We are
also seeing an uptick from international opportunities as the cost of call transport flattens through the use of
VoIP, and we have been able to spread our footprint into markets where our open approach to hosting is not
   In today’s market, the cost benefits of our hosted services have proven to be significant factors that have
driven many enterprises to MTI. Over the past year we have seen increased adoption of both inbound and
outbound IVR solutions. In the past, the amount of time and money that was necessary to build out a reliable
and scalable infrastructure proved to be a significant barrier to entry. When the barriers to entry are
eliminated, it’s an easy decision to opt for a solution with a company like MTI whose core competency is
hosted IVR. Because of our solid hosting platform, flexibility, and customer base, we’re well positioned to
benefit from this growing trend toward outsourcing.
What do you consider the particular strengths of Message Technologies?
   MTI understands every aspect of a complete hosted IVR business. After all, we have been developing and
hosting IVR solutions for over 18 years. We are unique in that our primary business and singular focus is
providing hosted speech IVR solutions. Our competitors—such as carriers or speech technology providers—
often have other primary revenue sources like network transport, proprietary platforms, product licenses, or
professional services that drive their decision-making and investment. Thus the speech IVR hosting capability
represents a smaller, adjunct business unit to their core business.
   In contrast, all of our revenue, investment, and expertise are related directly to delivering industry-leading
hosted IVR solutions. In terms of advantages, our focus on hosted IVR allows us to be more nimble, more
responsive, and more flexible in all areas of our business to meet the unique requirements of our customers.
Our flexibility is demonstrated in our willingness to allow customers to develop their own applications,
provide their own network transport, and utilize whichever CTI and back end data integration methods they
prefer. Our singular focus on delivering hosted speech IVR solutions has driven us to become an industry
leader. Our hosting platform is comprised of standards based technology from our partners Genesys
(VoiceXML media server) and Nuance (Speech/TTS)—both market share leaders—ensuring that we always
have the most reliable, feature-rich platform in the industry on which to develop and deploy solutions.
   Also, because we have grown organically since 1982 and managed our cost structure accordingly, we are
not beholden to venture capital or other external factors that would restrict our ability to be extremely cost
competitive while still providing extremely reliable IVR solutions with excellent contracted service level
Please give examples of some of the specific applications you deliver for customers.
  We currently support applications either directly or through partners for companies across nearly every
vertical including healthcare, retail, travel/hospitality, and financial services markets. Our partner network is
made up of companies with expertise in specific industries and they develop IVR applications within their
own specialty or vertical market. The proven capabilities of our hosting service deliver the reliability and
feature set that our partners demand, while still delivering the cost-efficiency businesses need.
I presume the underlying speech technologies are licensed. What can you say about this?
   Yes, as described earlier our infrastructure is made up of commercially available hardware and licensed
software. MTI has always believed that purchasing the best available technology in the IVR/Telephony space
is a differentiator and has allowed us to offer first-class service and support for our direct customers and our
partners. We add value by making these technologies easily, reliably, and cost-effectively available to our
Your “Natural Chat” option has been described as allowing any new or existing IVR application
to incorporate a “truly intelligent natural language option” (SSN, April 2009, p. 21). What are
the strengths and limitations of this option? Can you say more about the underlying technology?
   Natural Chat is one of several new offerings that we are excited about. The IVR application captures the
spoken audio from the caller, which is then streamed in real-time via web service to a proprietary platform
that converts the audio to text in near real-time. The system then leverages the unique capabilities of an
interpretive engine that analyzes the text string using advanced artificial intelligence (AI) algorithms. The AI
component is critical to the solution because it has the ability to intelligently analyze the text output even if
the spoken input was not translated with 100% accuracy. Once the text has been analyzed, the result is
delivered to the caller using pre-recorded prompts or TTS (text-to-speech) capabilities resulting in a truly
interactive experience without the need to work through deep menu structures to deliver the same result.
   Because natural language understanding has long been the holy grail of the IVR world, we realized when
we introduced Natural Chat some degree of skepticism would likely occur. Fortunately, we have partnered
with several technology companies that are already market leaders in providing speech-to-text service and AI
decision making capabilities. When used together, speech-to-text and AI offer a smarter approach to
interactive speech. MTI’s goal is to offer this capability in an easily accessible on-demand service format.
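The flow described in this answer (speech converted to text, text interpreted, result played back) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the intents, keywords, prompts, and the fuzzy-matching approach are hypothetical and are not taken from Natural Chat or its partners' technology. The point it shows is how an interpretation step can still find the caller's intent when the transcript contains recognition errors.

```python
from difflib import get_close_matches

# Hypothetical intents and keywords, invented for this sketch.
INTENT_KEYWORDS = {
    "check_balance": ["balance", "account", "owe"],
    "make_payment": ["pay", "payment", "bill"],
    "store_hours": ["hours", "open", "close"],
}

# Hypothetical prompts; a real system would use recorded audio or TTS.
PROMPTS = {
    "check_balance": "Let me look up your balance.",
    "make_payment": "I can help you make a payment.",
    "store_hours": "Here are our hours.",
}


def interpret(transcript, cutoff=0.8):
    """Return the intent whose keywords best match the transcript, or None.

    Each word is compared loosely to the keywords, so a misrecognized
    word such as "paymant" can still count toward "payment".
    """
    words = transcript.lower().split()
    scores = {}
    for intent, keywords in INTENT_KEYWORDS.items():
        scores[intent] = sum(
            1 for word in words
            if get_close_matches(word, keywords, n=1, cutoff=cutoff)
        )
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def respond(intent):
    """Pick the prompt for an intent, or a fallback when none was found."""
    return PROMPTS.get(intent, "Sorry, I did not catch that. What would you like to do?")


# The transcript below stands in for imperfect speech-to-text output.
print(respond(interpret("i want to make a paymant on my bil")))
```

A production system would replace the keyword scoring with a far richer interpretive engine; the sketch only makes the division of labor between transcription and interpretation concrete.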

                          Interview with Joe Alwan, BBN Technologies
             AVOKE Caller Experience Analytics is an analytics solution for call centers

  Joe Alwan, Vice President and General Manager of AVOKE Caller Experience Analytics, BBN
Technologies, was interviewed by Bill Meisel in late August. Joe was previously VP/GM of call center
solutions at Empirix, and has 25 years of experience turning technology innovation into practical business
BBN Technologies has a long history in speech technology. Can you summarize the company’s
pioneering and current efforts in this area? [BBN was just awarded a contract for the next
phase of an automated translation program, the Global Autonomous Language Exploitation—
see p. 27. See also SSN, July 2009, p. 10, for another example of BBN’s broad efforts in speech
and language technology.]
   For nearly four decades, BBN has been a leader in speech and language technologies. Since the early
1970s, we’ve been performing pioneering research in automatic speech recognition. Over the years, BBN has
had many firsts, including the first demonstration, in the early 1990s, of real-time, large-vocabulary, speaker-
independent continuous speech recognition on commercial, off-the-shelf hardware.
   Byblos, our primary speech recognition system, is an automatically trainable system that utilizes
probabilistic hidden Markov models, and it continues to represent the state of the art in large-vocabulary,
speaker-independent speech recognition.
   The Byblos engine forms the core of our speech application portfolio that includes two-way translation,
AVOKE Caller Experience Analytics, and our MultiMedia Monitoring System. Current research programs
continue to deliver significant improvements in recognition accuracy in different environments, including
telephony and broadcast news, and in multiple languages, including English, Arabic, Mandarin, and Spanish.
   Our natural language processing technologies can locate, identify, and organize information from a variety
of sources and in multiple languages. OnTopic and Unsupervised Topic Discovery, for example, can take
output from Byblos to discover and locate topics in audio content.
BBN has both government research contracts and commercial offerings. BBN’s speech
analytics solution is an example of a commercial offering. Please describe its functionality.
   We don’t view speech analytics as a solution by itself. It’s really a technology that can address a variety of
business needs. Our solution for call centers – AVOKE Caller Experience Analytics – was built to address the
unique needs of large complex operations. These organizations have discovered that they can’t achieve
customer satisfaction and budget goals by just managing agent performance.
   Our customers need a solution that covers three blind spots. They have poor reporting and no direct
visibility of the caller’s experience: (a) in the IVR, (b) through transfers, or (c) with partners. They have good
quality management for their own agents, but they have no holistic view of the customer’s interaction from
dialing all the way to hang-up. They can manage individual agent behavior, but they can’t optimize how they
use both agents and automation together to deliver streamlined end-to-end interactions.
   For many companies, these blind spots account for 40+% of telephone brand experiences and 15+% of
customer interaction costs. In this day and age, who can afford to ignore half of all telephone brand
experience? Not to mention the operational and strategic business intelligence trapped in customer calls, and
the opportunity to reduce customer contact costs by 10-30%?
   To address these needs, BBN developed an analysis methodology, and technology to support it. The
AVOKE Methodology defines metrics for contact effectiveness from the customer’s perspective and
measurements for avoidable call volume and wasted agent talk time.
   The AVOKE Call Browser system captures data for these new metrics and supports collaborative and
iterative analysis. It has six major areas of functionality:
 • True whole call recording: One continuous recording starting in the IVR including all transfers to agents,
     sites and partners.
 • IVR Analytics: Automated analysis of caller’s path through the IVR navigation.
 • Speech Analytics: Searchable full text transcript of the entire call.
 • Automatic Categorization: Using IVR and speech analytics information.
 • Multi-Dimensional Database: For ad hoc charting and analysis.
 • Web Collaboration: To easily engage all stakeholders.
  The AVOKE Call Browser provides all this functionality in the “Software-As-A-Service” (SAAS) or
“Cloud Computing” model. No new hardware, software or customized integration links need to be installed at
any center site or partner location. The AVOKE Call Browser system is transparent to both the customer and
call center operations.
How is AVOKE marketed and sold?
   The AVOKE solution has been selected by every major telecom and cable company, by the top personal
computer companies, by nine utilities, and by leaders in financial services, healthcare, and travel. The
AVOKE solution is sold and marketed directly to end-user customers by BBN, and through partners such as
Nuance and Verizon.
   The SAAS model also enables innovative new pricing and services. The solution is offered in a
subscription model – eliminating the large upfront cost typical of enterprise software. We also provide an
affordable pilot program and a range of training and services options. Professional services from BBN or
partners can address specific issues, such as reducing transfers or increasing self-service. Customers can also
select services to augment or provide a dedicated caller experience analysis team.
Is speech analytics a mature technology or do you see continuing advances?
   Speech analytics is still very young. There will be continuing advances in the underlying technology, the
architecture of commercial solutions, and the applications of speech analytics to address specific business
needs. And BBN will continue to be at the forefront of new developments. In fact, BBN has one of the largest
speech and language research and development teams in North America – with $44 million in awarded
research funding already in 2009.
Any final comments?
   Thank you for the opportunity to share what we’re doing. And, we look forward to continuing to work
closely with both the research and commercial communities to develop the full potential of speech and
language technologies.

VUI Visions
                          Multimodal Customer Service Transactions
                                  Matt Yuschik, Convergys Corporation

   In this guest column, we ask designers skilled in creating Voice User Interfaces to highlight a particular
aspect of VUI design inspired by actual deployments. In this issue, Matt Yuschik, Ph.D., Human Factors
Specialist, Multichannel Self Care Solutions, Relationship Technology Management, Convergys Corporation
(p. 8), discusses how multimodal interactions can extend the voice interface into a different form of “natural
language” interaction, where skilled agents can test the interaction before expecting the customer to do it.
Dr. Yuschik designs and evaluates multimodal user interfaces for call center agents and for customer self-
care devices. Matt previously designed and brought a voice-activated voice mail product to the market in
Europe and the US. His designs are intuitive and easy-to-use, based on task analysis and turn-taking
principles of human behavior. Matt has numerous patents and publications in the field of speech technology,
voice activation, and multimodal interfaces.
     There is a compelling need to identify and develop viable multimodal self-service transactions for
customers who call for service. One way is to migrate them from agent-facilitated call center transactions to
easy-to-use end-user applications. First, we must address the behavior of humans in interactions with
customer service centers, and track how it has changed. Then, it’s important to observe that call center agents,
subject matter experts in their own right, narrow a customer’s problem to a specific solution by using a
set of rich tools. Convergys values our agents as problem-solving “solutioners,” and draws upon their
experience to help design and test multimodal services for customers. The goal is to enable customers to
complete their own transactions with their own multimodal devices and be satisfied with the result.

User Interface Versions
     To get an appreciation of where multimodal transactions fit in the spectrum of voice-enabled services, here
is a narrow perspective on the history and evolution of Human Interactions related to call centers:
 • Version 0: All calls go to live agents. This is the original call center procedure when a company handled
    every call personally. The agents had well-rehearsed scripts which they followed.
 • Version 1: DTMF, in the form “Press 1 for X”, gives callers a way to take action into their own hands to
    resolve common and simple issues. Menus started from agent scripts, and choices were grouped in a tree
    structure to navigate using the buttons of the telephone keypad. These transactions were simple for
    programmers to implement. Agents were still available to handle difficult or uncommon issues.
 • Version 2: “Voicify” the DTMF prompts to the form “Press or say 1 for X”. This is the first foray of speech
    into the dialog, but was a voice overlay onto Version 1. It strongly coupled the transaction to the DTMF
    menu with the hope that a correct option would be heard.
 • Version 3: Initiate a Directed Dialog, e.g., “Please say listen, send, or mailbox options.” This removes the
    requirement for mental mapping of numbers to choices, and leverages a VUI designer’s role to present
    logical choices that support a flexible problem-solving process. Only the most common use cases are
    supported, with the agents handling other cases.
 • Version 4: Use Natural Language to answer an open-ended question, like, “What would you like to do?”
    This drives the transaction by the user’s view versus the view imposed by the computer. Often, the dialog
    returns to Version 3 to provide some options and guidance to sustain the dialog. The risk is that the caller’s
    intention is not understood or able to be resolved. The agent, again, comes to the rescue.
 • Version 5: A Multimodal approach that visually displays data and options, and verbally asks, “What next?”
    Generally, voice is used for input, and text / graphics for output. This user has complete control, though
    the system provides a number of ways for the user to specify their issue, even starting with vague terms
    and then leading to a refinement of their specific issue.
    The Versions show an evolution from the agent’s view to the user’s view, from a highly structured
approach to an open-ended one—which can fall back to a somewhat limited structure that helpfully narrows
the focus. A balance lets the caller have flexibility to start the conversation in any way, and lets the computer
suggest options when the caller hesitates, or data is needed to fill a form to isolate the concern. A back-and-
forth dialog is maintained until all required info is obtained and closure of the issue can be achieved. This
approach follows a transaction flow driven by the caller yet guided by the agent. The interaction is more
dynamic (more turn-taking) and less static (menu-driven).
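The fallback path described above (an open-ended question, then a directed dialog, then an agent) can be sketched as a small Python function. The option list comes from the Version 3 example; everything else, including the function names, the substring matching, and the limit of two directed attempts, is an assumption made for illustration and does not describe any Convergys system.

```python
# Choices from the Version 3 example prompt.
OPTIONS = ["listen", "send", "mailbox options"]


def understand_open(utterance):
    """Hypothetical stand-in for a natural-language recognizer.

    Returns the first option mentioned anywhere in the utterance, or None.
    """
    text = utterance.lower()
    for option in OPTIONS:
        if option in text:
            return option
    return None


def handle_call(utterances, directed_tries=2):
    """Route a call through the open prompt, directed prompts, then an agent.

    `utterances` is the sequence of caller replies, in order.
    Returns (stage, choice), where stage names the step that resolved the call.
    """
    turns = iter(utterances)

    # Version 4: "What would you like to do?"
    choice = understand_open(next(turns, ""))
    if choice:
        return ("natural_language", choice)

    # Version 3: "Please say listen, send, or mailbox options."
    for _ in range(directed_tries):
        reply = next(turns, "").strip().lower()
        if reply in OPTIONS:
            return ("directed_dialog", reply)

    # Neither approach worked, so the agent comes to the rescue.
    return ("agent", None)


print(handle_call(["I'd like to send a message"]))
print(handle_call(["um, my voicemail thing", "listen"]))
print(handle_call(["what?", "huh", "help"]))
```

The design choice worth noting is that each stage narrows the caller's freedom only after the more open stage has failed, which mirrors the balance between flexibility and guidance described in the column.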

Call Center Transactions
    Convergys has about 85 domestic and international call centers that provide a rich opportunity to monitor
agents’ and callers’ interactions. These observations enable defining a model of agent behavior: a
“solutioning” stage that isolates the key issue; an information gathering part with “data entry” and navigation
through multiple agent screens; and, a resolution and closure part to agree upon the solution. Some dialog
may occur that is not relevant to the issue, but consists of social conversation to increase caller comfort,
requests for optional background information, or even apologies to defuse an angry caller.

The Transaction Flow
    Convergys strives to improve self-service and increase customer satisfaction. It does this by observing
agents handling their callers, where flow and pace of the dialog can be clearly distinguished. It is also
possible to aid this interaction by placing a multimodal workstation into this environment; agents then have
the flexibility to use the flow and pace of their voice to navigate and populate a graphic interface. The agent
must still converse to extract sufficient information to complete certain screen-based tasks. The flow is a
valuable method for agents to leverage with a multimodal UI. Certain tasks work better in one modality than
another—speech is excellent for navigation, graphics for presenting data from searches. Verbal shortcuts
during the solutioning step enable the agent to say a key phrase that jumps to the screens where data is
required. Additionally, in gathering information, data may be obtained early in the conversation and stored in
a speech-enabled short-term memory until the information is placed in the necessary location in the GUI.
     Transaction-specific flows can be developed which emulate the steps taken by the agents, with the goal
that eventually the callers can perform the transaction by themselves on hand-held devices. The flows are
tested by the agents on their multimodal workstations, which includes support for backup and error-handling
should the solution veer off-track and require further redirection. Only then is the transaction considered
robust enough for a smart handset.

Safety Net
    Once an automated multimodal version of a service is constructed, it is deployed in a limited pilot test
with friendly users (generally, agents who use the device yet can fall-back to their existing workstation), who
are tasked to resolve a rich set of use cases. Usage patterns and results are monitored to identify and address
any unanticipated pain points. While the overall goal is to contain all calls, not all problems can be covered
by automation, so effective intervention by an agent must be provided. A hidden agent procedure occurs
when the caller starts an issue, but an implicit monitoring mechanism (driven by business decision rules)
infers that the caller is having difficulty. An agent is bridged onto the call without the knowledge of the caller.
The agent has transaction history and context, and can listen to the caller’s speech to move the transaction
forward “behind the scenes.” The agent intervenes only when needed, which may be for simple situations
where the caller is difficult to understand, or conditions beyond the capability of the automated solution, or
when the caller’s emotional state must be defused before a solution is attempted.
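The implicit monitoring mechanism behind the hidden agent procedure can be pictured as a set of business decision rules evaluated against the state of the call. The Python sketch below is a guess at what such rules might look like: the signals (recognition failures, repeated steps, time on a step) and the thresholds are hypothetical and are not drawn from the column.

```python
from dataclasses import dataclass


@dataclass
class CallState:
    """Signals a monitor might track during an automated transaction."""
    no_match_count: int = 0        # times the recognizer failed to understand the caller
    repeated_steps: int = 0        # times the caller returned to a step already visited
    seconds_in_step: float = 0.0   # time spent on the current step


# Hypothetical decision rules; a business would set its own signals and thresholds.
RULES = [
    ("caller is hard to understand", lambda s: s.no_match_count >= 3),
    ("caller is going in circles", lambda s: s.repeated_steps >= 2),
    ("caller is stuck on a step", lambda s: s.seconds_in_step > 90),
]


def reasons_to_bridge_agent(state):
    """Return the names of all rules that fire for this call state."""
    return [name for name, rule in RULES if rule(state)]


def should_bridge_agent(state):
    """An agent is bridged onto the call as soon as any rule fires."""
    return bool(reasons_to_bridge_agent(state))


smooth_call = CallState(no_match_count=1, seconds_in_step=20)
troubled_call = CallState(no_match_count=3, repeated_steps=2, seconds_in_step=40)
print(should_bridge_agent(smooth_call))
print(reasons_to_bridge_agent(troubled_call))
```

Returning the reasons, and not only a yes/no decision, would let the bridged agent see why the call was flagged along with the transaction history and context.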
    One potential limitation of multimodal services is the ability of current handheld devices to support the
services. This is a valid concern, but the advent of more 3G phones and open API software enables
multimodal applications to be more pervasive. Current technology encourages the use of the devices for more
complex tasks, especially multimodal services.

Future Work
    Ongoing and future work at Convergys focuses on business sectors and use cases which are amenable to a
rich multimodal environment with accommodating users. In the Telecom sector, calls are received from
customers experiencing service-impacting conditions. Whether it be the need to troubleshoot a set-top box or
to download ringtones, the ability to show a video clip which coaches the caller through steps until problem
resolution has a large advantage over talking the caller through a sequence of potentially confusing steps. In
the Sales sector, a retailer can display a visual of the products (clothes, rental car models) thereby putting
Internet capabilities on a mobile device. Designing, prototyping, and trialing these applications is a rich
opportunity to identify those tasks and transactions suitable for migration to the ever increasing number of 3G
intelligent telephones. The ability of Convergys to use call center agents familiar with these transactions and
willing to test alternative multimodal environments to solve caller problems is a very rewarding opportunity.
Convergys is in a position to provide multimodal applications with cutting edge technology to meet the
behavioral habits of an increasingly technology-driven culture.

             Speech analytics (cont.)
                              Continued from page 1

   In this issue, there is specific coverage of speech analytics announcements by CallMiner (with Aspect),
p. 15; BBN interview, p. 33; CallCopy, p. 16; DSS, p. 18; Nexidia, p. 14; Nice Systems, p. 16; OnviSource
(using Aurix speech technology), p. 17; Utopy, p. 17; VoiceVault, p. 12, and Voxeo, p. 18. In addition,
recent interviews at SpeechTEK are the source of the notes on the companies in the remainder of this
article; see notes on Autonomy, Nuance, Verint, and West Interactive in this article. ComputerTel and Aurix
recently announced a partnership, whereby ComputerTel will offer Aurix phonetic audio search engine
technology (SSN, May 2009, p. 16) in a product for contact centers to record, monitor, and perform speech
analytics on call recordings.
   Speech analytics is often combined with other analytic tools that measure contact center performance,
e.g., statistics on how many calls are automated versus how many fail automation and result in a transfer
to an agent. Beyond call center sources, one can also gain insights into customer behavior through tools
such as surveys. Companies with the broadest focus are in what DMG Consulting considers the “Contact
Center Surveying/Feedback and Analytics Market”; only Verint of the companies
mentioned so far in this article was included in DMG’s recent report on that market. The report’s abstract
briefly defined the companies evaluated as “vendors that offer full contact center surveying/feedback and
analytics solutions”; RightNow Technologies is an example of a company included in DMG’s market share
evaluations, but not in this issue of SSN. DMG concluded that the surveying/feedback market grew by 18%
between 2008 and 2009. The report indicates growth in this market for a reason that is applicable to speech
analytics: The current recession has actually spurred growth because enterprises need to understand their
customers in order to retain and continue to sell to them, and addressing customer issues can reduce the
number of calls to customer service numbers, reducing costs.
   Similarly, ClickFox doesn’t offer speech analytics, but Anna Convery, the company’s Chief Marketing
Officer, notes that a broader view of analytics should track the customer across mediums, e.g., from a Web
site that causes the customer to phone customer service. ClickFox specializes in providing this overall view
of a customer experience, and can extend the utility of speech analytics. More generally, a number of the
vendors cited don’t view speech analytics as an end in itself, and try to provide a more holistic view.

Autonomy
   In a keynote speech at SpeechTEK on August 24, Steven Graff, vice president of technology and chief
architect, Autonomy, gave a talk on “Bringing Meaning and Value to Enterprise Search.” He noted that
harnessing voice information across the enterprise can impact customer service, gather intelligence,
determine business strategy, and minimize risk using meaning and context. Autonomy offers solutions to
companies that are concerned about managing the exploding amount of information that is generated by
email, instant messaging, reports, presentations, videos, contact center recordings, and all the ways that
companies document themselves internally and externally. (See company update, SSN, January 2009, p. 1.)
   At the heart of Autonomy’s infrastructure software lies the Intelligent Data Operating Layer (IDOL)
Server. The IDOL Server collects indexed data from connectors and stores it in its proprietary structure,
optimized for fast processing and retrieval of data.
   Autonomy’s Virage SoftSound was founded in 1995 and is backed by over ten years of research from
Cambridge University. In May 2000 Virage SoftSound received substantial investment from Autonomy. Virage
SoftSound delivers audio processing applications to enable live or recorded speech to be manipulated,
edited, searched, and hyperlinked as easily as text. This is achieved with a wide range of speech
processing technologies from audio segmentation and identification to speech recognition and understanding.

Nuance Communications
   Nuance doesn’t currently offer a speech analytics product, although a company spokesperson said a future
product is a possibility. The spokesperson indicated that the company’s speech-to-text technology is used by
some companies in their speech analytics offerings. Nuance partners with Nexidia to offer speech analytics
in their professional services “Nuance Care Analytics” offering. Nuance Care Analytics goes beyond speech
analytics and provides customers with data-driven recommendations to address their primary business
problems including self service automation, caller experience, revenue generation, and misdirected calls.
   Nuance’s Lauren Hodgson gave an overview of possible types of consulting services and some estimates of
costs in a company blog. A snapshot of contact center data (typically a week or two’s worth of data) can be
analyzed in 6-12 weeks for about $75K-$150K, including tools and consulting services. A “single-cycle” does
everything that a snapshot approach does, but also includes another analysis after you make improvements.
The cost is typically $150K-$200K. A “continuous improvement” assignment includes a series of single-cycle
improvements, typically a one-year contract to deliver about a cycle a quarter, with a ballpark price of
$400K-$1M.

West Interactive
   West Interactive provides hosted speech solutions (interview, SSN, June 2009, p. 29). Mike Moore,
Analytics Manager, West Interactive, indicated that the company’s professional services organization can
perform speech analytics (and broader analysis) on contact center interactions.

Verint
   Verint Systems Inc. offers Impact 360 Speech Analytics solutions, software from the company’s Verint
Witness Actionable Solutions division (SSN, June 2009, p. 14). Diego Lomanto, product marketing manager,
Verint Witness Actionable Solutions, said the company incorporates licensed
large-vocabulary speech recognition from an outside vendor in its solution. The Verint software can
track changes over time. Lomanto gave the example of the phrase “new fees” suddenly appearing,
suggesting that something about new fees is driving calls to the contact center. As with most speech
analytics, specific calls with this phrase can be reviewed to understand the problem and hopefully
correct it.
   In a real case, Lomanto said that a wireless company with high call volumes saved millions of
dollars by detecting the confluence of “new phone” and “late” in a clustering analysis. The company
realized that agents were promising delivery of a new phone model that was ordered with an
unrealistic delivery estimate, creating the calls. The problem was remedied by providing a more
conservative delivery estimate.
   Sometimes the length of the call is revealing, Lomanto said. In one case, a bank discovered that
customers with home mortgages were complaining about an error in reporting amounts paid, leading to
long calls that required agents (and a long interaction while the agent corrected the problem). The
error occurred when the customers made the payment to a teller, caused by the mortgage payment form
having entries that didn’t match the teller’s computer screen, a discrepancy that was corrected.
   Lomanto also indicated that Verint’s broader technology was capable of detecting that a customer
had been on the web just prior to a customer service call. In one case, the company estimated that it
expended $228,000 in processing calls due to a difficult password reset task on the Web that the IT
department was able to correct.
   Verint sells an Impact 360 Speech Analytics Essentials version that is affordable to smaller call
centers. An advanced version can handle centers with large numbers of agents and adds some other
features.
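
   The phrase tracking Lomanto describes can be illustrated with a minimal Python sketch. It
assumes the calls have already been transcribed to text by a speech recognizer. The function names,
thresholds, and sample transcripts are hypothetical and are not drawn from Verint's product; the
sketch uses simple substring counts rather than the clustering analysis mentioned in the text.

```python
def phrase_rate(transcripts, phrase):
    """Fraction of call transcripts that contain the (lowercase) phrase."""
    if not transcripts:
        return 0.0
    hits = sum(1 for text in transcripts if phrase in text.lower())
    return hits / len(transcripts)


def emerging_phrases(previous, current, phrases, min_rate=0.05, ratio=3.0):
    """Return phrases whose share of calls jumped between two periods.

    min_rate and ratio are illustrative thresholds (assumptions), and a
    small floor keeps phrases that were absent before from dividing by zero.
    """
    floor = 0.01
    flagged = []
    for phrase in phrases:
        before = phrase_rate(previous, phrase)
        now = phrase_rate(current, phrase)
        if now >= min_rate and now >= ratio * max(before, floor):
            flagged.append(phrase)
    return flagged


def calls_with_both(transcripts, first, second):
    """Calls mentioning both phrases, e.g. "new phone" and "late", for review."""
    return [t for t in transcripts if first in t.lower() and second in t.lower()]


# Hypothetical transcripts for two consecutive weeks.
last_week = ["i want to check my balance", "please reset my password"] * 5
this_week = ["why are there new fees on my bill"] * 3 + ["i want to check my balance"] * 7
print(emerging_phrases(last_week, this_week, ["new fees", "balance"]))  # ['new fees']
```

   In this toy data, “new fees” goes from no calls to 30% of calls and is flagged, while “balance”
rises only modestly and is not. The flagged calls would then be pulled for human review, as the
article describes.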

          Avaya and Loquendo (cont.)

                              Continued from page 1
    Loquendo is already a participant in the Avaya DevConnect program. As a DevConnect member, the
Loquendo MRCP Server is certified to work with the Avaya Voice Portal, the company's flagship speech
self-service solution for contact centers, as well as with Avaya Interactive Response. Loquendo MRCP
Server is a server-based solution for large-scale deployments of speech technologies in telephony
environments, including contact centers. (Avaya recently added Swampfox Technologies to its
DevConnect program; see end of this article.)
   Avaya sells premise-based solutions, serving customers that want a hosted solution through
partners. Michael Perry, director, product management, Avaya, said in an August interview that, while
the company’s shipments of contact center solutions are increasing despite the recession, the portion
shipped with speech recognition is declining substantially. Apparently companies see the need for
automation to reduce agent costs, but don’t see the additional cost of speech technologies justifying
the return over touch-tone solutions. The Loquendo option will allow Avaya to offer a lower-cost
speech option to customers.
   In addition to Avaya Voice Portal, Loquendo speech technologies can now be integrated with other
Avaya contact center solutions, such as Intelligent Customer Routing, which unifies delivery of
customer care across self and assisted service, and Proactive Outreach, the multi-channel outbound
self-service solution.
   Avaya and Loquendo technologies have already been used together worldwide to develop customer
help desks, self-service banking applications, railway timetable enquiry services, and football match
information and ticketing. A company that currently uses a solution based on Avaya and Loquendo is
VIVA, a Bolivian mobile service provider. VIVA implemented an Avaya Voice Portal and Loquendo
text-to-speech solution in its contact center.

Swampfox

   Avaya sales are about 30% direct and 70% through partners, Perry estimated. Thus, companies such
as Swampfox Technologies, an August addition to the Avaya DevConnect program with Platinum Level
membership, are important to the company’s business strategy. Founded by former Avaya Voice Portal
and one-X Speech engineers and architects, Swampfox focuses on Avaya’s Voice Portal and Unified
Communications product lines. The company builds pre-packaged value-added solutions and also offers
its services to Avaya business partners, Avaya direct sales staff, and other DevConnect members who
want to leverage the Voice Portal platform.

                  Survey (cont.)

                            Continued from page 1
   The research shows that 75% of people would choose a smartphone that allows them to compose a
text message, search the Web, or dial a contact simply by speaking, rather than by typing or using a
touch screen. An overwhelming majority of respondents said they would feel comfortable using voice to
perform tasks in places such as restaurants and gyms. A perhaps surprising 71% said they would feel
just fine using speech input with their smartphone at a restaurant. An overwhelming majority also
said they would feel comfortable using voice to perform tasks on their smartphones while walking
(93%), exercising (92%), and shopping or running errands (87%).
   Most people use smartphones while conducting other tasks in order to make better use of their
time. Those surveyed say they use their smartphones while shopping or running errands (88%), waiting
at appointments (80%), walking between places (78%), visiting friends (68%), and in many other
places, such as while eating at restaurants, commuting, exercising, or attending school. The trend
toward consumers dropping landline phones (p. 43) accelerates a cultural shift in the way we view
telephony.
   While typing and touching are not perceived as difficult, respondents acknowledge that using their
smartphones in these situations can be distracting. If given the option to simply push a button and
speak in order to call or text a friend or search for information, such as the location of a
restaurant, directions or stock quotes, most say they could accomplish more and feel less distracted
than if they were using text or touch input.
   Anne Truscott, brand strategist at Sanderson Studios, noted that “using your voice while walking
or checking out is like walking and chewing gum at the same time; it just comes naturally.” She said,
“We were surprised how many people said they'd feel comfortable using their voices to interact with
their smartphones while in public places as well.”
   “The research is confirming what we believed would happen as people more widely use smartphones
to multitask while on the go, away from the home or office,” said Dariusz Paczuski, senior director
of Tellme Mobile Speech. “Our 'say what you want and get it' voice products and services are making
it easier to get more done with your phone no matter where you are or what you're doing.”

Tellme “say what you want” services

   As an example, Brooks Crichlow, director, enterprise marketing, at Tellme noted that Tellme is
already integrated into the Ford Sync Service Delivery Network, providing network-based services; in
a panel at SpeechTEK, a Ford executive said that the same model sold twice as many units with Sync
than without, and that the company considered it an unqualified success. Sync, the in-vehicle
communications and entertainment system developed by Ford and Microsoft, includes using voice when
selecting music; making hands-free phone calls; and getting traffic, directions and information.
Tellme previously announced a mobile voice service that will combine content and communications, due
on Windows Mobile 6.5 phones this fall (SSN, May 2009, p. 1). In addition, Microsoft’s 800-BING-411
directory assistance and information service is hosted by Tellme (p. 1).
   Grant Shirk, director of industry solutions, Tellme, noted that the company’s hosted customer
services for enterprises such as American Airlines can use multiple speech technologies, including
Nuance, Microsoft, and IBM, but that the company is working toward increased use of Microsoft
technology. E*Trade is one customer already based on Microsoft speech recognition technology. In May,
Microsoft signaled the importance of speech technology to the company by integrating all its speech
resources (including Tellme and research groups in the home office, Portugal, and Beijing) under a
single manager, Zig Serafin, who ran Microsoft’s Unified Communications group.

                 Volt Delta (cont.)

                              Continued from page 1
   The company’s background in directory assistance for service providers gives it an
infrastructure that can handle large volumes of calls, notes Steve Chirokas, executive director,
marketing, Volt Delta, supporting call routing for organizations such as Tellme, a Microsoft
subsidiary that handles 1-800-BING-411 (p. 6). The company’s platform works with any service provider
and currently handles about 2.4 billion calls per year and 2 billion SMS messages. The company has
hosting operations in New York, California, the UK, and Germany. The
company also has “packaged” applications that can be deployed quickly, including a survey service.
When the call is transferred to an agent because automation fails, Chirokas noted, a “whisper”
feature lets the agent hear what the caller said, avoiding the most common complaint of callers:
having to repeat themselves.
   A recent innovation that the company hasn’t widely publicized is its CrystalWave speech
recognition technology, as described by three executives in an interview with Speech Strategy News.
CrystalWave is a processing technique that can consolidate the results of multiple speech recognition
engines running in parallel. The core speech technology can be from any source (currently the company
uses Nuance). For example, the two recognizers can be one powered by a defined grammar and one by a
Statistical Language Model. The innovation is in deciding the correct interpretation of possibly
different results, notes Greg Bielby, manager, directory assistance automation, Volt Delta. In
addition, the use of two recognizers can allow proper handling of content that was not anticipated.
   When the application includes accessing a database such as those in directory assistance or
airline reservations applications, there are often parts of a customer’s response that are not
anticipated in a grammar. For example, a customer might say extraneous phrases such as “I’m looking
for” or “just a minute,” when asked for the desired flight, or say “from Los Angeles to New York.”
The caller may have also specified other information that narrows the interpretation in a previous
response, e.g., a city before giving a street address. By picking out the words from the recognizer,
including those that weren’t the top choice (e.g., using N-best recognizer output), one may be able
to match a unique entry from the database given the supporting information.
   An additional innovation, Bielby said, is the creation of a dynamic grammar that includes a
number of possible matches after the analysis of the first utterance, a “re-recognition grammar.”
That grammar can be applied to the same utterance without re-prompting the user, and, if a match is
found, the customer’s request can be fulfilled.
   Todd Schmeer, director of speech application services, Volt Delta, noted that the company’s
in-depth work with directory assistance, one of the most demanding speech applications, had led to
the insights that drove the company’s innovation. Another demanding application that the company
recently fielded, he said, was a parking lot application where customers reported their location by
mobile phone using alphanumerics. Recognizing spoken letters and numbers is challenging, and the
company’s success in this application suggests the utility of its speech recognition innovations.
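
   The idea of matching N-best recognizer output against a database, narrowed by information the
caller gave earlier, can be sketched in a few lines of Python. This is an illustration under stated
assumptions, not Volt Delta's CrystalWave algorithm, whose details the company did not describe: the
Listing type, the match_listing function, and the sample data are all hypothetical, and the N-best
list is assumed to arrive as plain text hypotheses ordered best first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """A hypothetical directory entry."""
    name: str
    city: str


def match_listing(n_best, listings, city=None):
    """Return the single listing consistent with some N-best hypothesis.

    A listing matches a hypothesis when every word of its name was spoken,
    so extra words such as "i'm looking for" are simply ignored. A city
    given in an earlier response (if any) narrows the candidates first.
    Returns None when no hypothesis yields exactly one match.
    """
    candidates = [entry for entry in listings if city is None or entry.city == city]
    for hypothesis in n_best:  # assumed ordered from most to least likely
        spoken = set(hypothesis.lower().split())
        hits = [entry for entry in candidates
                if set(entry.name.lower().split()) <= spoken]
        if len(hits) == 1:
            return hits[0]
    return None


# Hypothetical data: the top hypothesis is a misrecognition.
listings = [
    Listing("joe's pizza", "Boston"),
    Listing("joe's pizza", "Denver"),
    Listing("jones plaza", "Boston"),
]
n_best = ["i'm looking for jones pizza", "i'm looking for joe's pizza"]
print(match_listing(n_best, listings, city="Boston"))
print(match_listing(n_best, listings))
```

   With the city known, the second hypothesis identifies a single Boston listing even though the
top hypothesis matched nothing. Without the city, the same hypothesis fits two listings, so the
function returns None and the application would need to prompt for more information.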

                                          News Briefs
Yucheng Partners with Convergys to expand the contact center product & services market
in China
    Convergys Corporation announced a new partnership with Yucheng Technologies Limited, a China-
based IT solutions provider to the Chinese banking industry, that will enable it to sell Convergys’ Intervoice
Edify Voice Interaction Platform (EVIP) and Convergys Dynamic Decisioning Solution in China. Yucheng
has deployed call center solutions for three of the top five banks in China and had a 14.6% market share as of
2008, according to consulting firm IDC. Convergys’ EVIP technology includes text-to-speech capability.

Gold Systems’ new conferencing system uses Microsoft OCS 2007 R2
    Gold Systems offers a suite of voice-powered products, designed to enhance security and productivity,
which work with the speech recognition technology built into the Microsoft Office Communications Server
(OCS) platform (SSN, August 2008, p. 12). In August, Gold Systems announced a new conferencing system
using Microsoft Office Communications Server 2007 R2.
Eckoh wins a £1.5 million contract to provide speech recognition services for a major UK
transport organization
    Eckoh, a UK provider of hosted speech recognition services, announced that it has won a new five-year
contract worth a minimum value of £1.5 million. The contract is to provide automated contact center services
using Eckoh’s advanced speech recognition technology on behalf of a major government transport and
infrastructure organization. The first service is expected to launch later this year. In the travel sector, Eckoh
operates “on-demand” services that currently take an average of 700,000 calls every month.

MicroAutomation integrates call center solution for Health Dialog using Loquendo speech
   In August, MicroAutomation, an integrator of call center automation solutions (SSN, February 2009, p.
9), announced that it is working with Health Dialog to implement Loquendo speech solutions for Health
Dialog’s internally-operated speech recognition and text-to-speech applications in North America. Health Dialog,
a wholly-owned subsidiary of Bupa, a global provider of healthcare services, is a major provider of health
coaching and related support.

VoiceVerified is now CSIdentity, offers multi-layer security model with speech authentication
    CSIdentity bought VoiceVerified, a supplier of speaker verification solutions (SSN, August 2009, p. 22).
CSIdentity’s multi-layer security model consists of technology in identity verification and authentication
combined with VoiceVerified speaker authentication. Bill Morrow, CEO, CSIdentity, said, “Almost every
business, institution or government agency that has to verify identity needs CSIdentity’s VoiceVerified. We
offer the ideal solution for securing online data, call center transactions, or mobile commerce connections for
employees, customers, students, and remote workers.”

IntelePeer and Transera team to offer on-demand contact center solutions
    IntelePeer provides hosted on-demand rich media communications that enable carriers, businesses, and
software vendors to deliver voice and multimedia capabilities to any phone or network-connected device,
with some services using text-to-speech technology (SSN, November 2008, p. 34). Transera
Communications offers an on-demand virtual contact center solution that can support agents located
anywhere in the world. The two companies announced a partnership that combines Transera’s Seratel on-
demand contact center software with IntelePeer’s global carrier-grade infrastructure and voice and rich media

Home emergency insurance and repair service deploys contact center systems using Sabio
    HomeServe, a home emergency insurance and repair service provider, successfully deployed a series of
different contact center systems and applications. HomeServe and Sabio, a contact centre services and
solutions company, jointly detailed those contact centre technology solutions at a conference.

British show tests speech recognition customer service and finds it works
    In August, BBC One’s early evening daily magazine program, “The One Show,” reported on voice self-
service systems using the National Rail Enquiries Train Tracker service and concluded that “speech
recognition systems do work and are here to stay.”

Spanlink to resell Interactive Intelligence unified IP business communications solutions
    Interactive Intelligence and Spanlink Communications have signed a nationwide reseller agreement.
The agreement authorizes Spanlink to sell, deploy and support Interactive Intelligence unified IP business
communications solutions. Spanlink is a provider of contact center and unified communications solutions that
leverage VoIP technology. Interactive Intelligence provides unified business communications solutions for
contact center automation, enterprise IP telephony, and business process automation (SSN, July 2009, p. 36
and 37).
Voice Web Solutions’ Grammar Studio tool for visually creating and deploying speech
    Voice Web Solutions’ Grammar Studio is a developer tool for visually creating and deploying speech
grammar formats for VoiceXML telephony and multimodal SALT and X+V applications. Grammar Studio can
work with new or existing W3C SRGS
grammar files regardless of the underlying platform. The software is free to try; $99.95 to buy.

Genesys IVR platform available from LBi Software Engineering
    LBi Software Engineering, which integrates solutions for enterprises, focusing on “Human Capital
Management,” announced it will be offering an IVR Platform from Genesys Telecommunications
Laboratories (SSN, July 2009, p. 13). Richard Teed, president of LBi, said, “Having implemented dozens of
successful IVR applications, many with CTI capability, we are now in a position to provide a total solution
with seamless integration for our clients.” LBi can now deliver an IVR platform with either a packaged IVR
application or a complete customized solution. Speech recognition capabilities can be built into the user
interface, with a speaker verification option for added fraud detection.

Customers are changing with new social norms, notes Paul Greenberg at SpeechTEK
     Paul Greenberg, in a keynote address at SpeechTEK, “Voice of the (Social) Customer,” noted that the
nature of customers has changed, and companies need to be cognizant of these new “social customers” by
properly using social media. Greenberg noted that a survey shows that the number of potential customers that
would trust “someone like me” over an expert’s opinion almost tripled to 60% over about five years, perhaps
reflecting the increasing use of social media to rank and comment on everything from products to movies.
     According to Greenberg, social customers want accelerated and enhanced interactions and expect
institutions to respond to them via the communication channel of their choice. Greenberg said that sales,
marketing, and customer services have been the three historic pillars of CRM. But he noted that while sales
used to be the driver, now customer service is becoming the driving force.

Dimension Data speech self-service survey suggests that customers resent speech
automation in part because, when it fails, they have to start over with an agent
    Martin C. Dove, managing director of Global Customer Interactive Solutions, Dimension Data (an IT
solutions and services provider) reported on a study, “The Alignment Index,” at SpeechTEK 2009, August
24-26, 2009. The study tested the consistency of views between customers calling contact centers, managers
of contact centers, and vendors. Among many results, he reported that a main reason for dissatisfaction with
automation at contact centers is the typical requirement to repeat data already entered in the automated system
when eventually connected to an agent. Apparently, it wasn’t the automation callers resented, but that it was a
waste of time when it failed to completely resolve their problem.

ClickFox and Business Systems partner to bring customer experience analytics to EMEA
    ClickFox (SSN, May 2009, p. 19) and Business Systems, an independent contact center specialist,
announced a partnership agreement and plans to market ClickFox’s analytics software in Business Systems’
suite of technology solutions. The companies are teaming up to address increased demand for customer
experience analytics in Europe, the Middle East, and Africa (EMEA).

Servion offers Customer Interaction Systems supporting multiple channels, including voice
    Servion Global Solutions specializes in Customer Interaction Management (CIM) solutions. At
SpeechTEK in August, Servion showcased various speech capabilities intended to provide a positive caller
experience while supporting a high call automation rate. The Servion solution uses a single code base that can
cater to requests from multiple channels like voice, web and chat simultaneously. It can respond accordingly
using voice, HTML, or text.
Google Listen indexes using text tags and plays podcasts and web audio
     The experimental Google Listen from Google Labs brings podcasts and web audio to Android-powered
devices. It lets you search, subscribe, download, and stream. By subscribing to programs and search terms it
will create a personalized audio-magazine loaded with fresh shows and news stories whenever you listen. In
this release, Listen is indexing thousands of popular English-only audio sources. It apparently uses the text
information available for the audio source rather than searching audio content using speech recognition.

US consumers continue dropping landlines for mobile phones
    The Economist, in its August 13, 2009, edition, reports that US telecom operators are seeing customers
abandon landlines at a rate of 700,000 per month, with about 25% of households in America now relying
entirely on mobile phones—a share that could double within the next three years, according to the article.
If the trend continues unchanged, there could be no consumer landlines left by 2025. The consequences could have major
impacts on businesses that require landlines, on emergency services, on pollsters, and of course the landline
providers. The Economist warns of an impending crisis, where regulators may have to “decide whether to
subsidize or bail out landline firms,” noting that unfunded pension liabilities of AT&T and Verizon are as big
as those of General Motors before its recent bankruptcy. The article even raises the specter of taxes on
wireless phones to support wireline service.
    From a different point of view, this dependence on mobile phones should require more of the voice-
enabled services that would make such a shift easier for consumers to deal with. As a simple example, it is
harder to make a written note of a phone number in a mobile call when the phone could be anywhere—it’s
not uncommon to have a pad of paper (or a PC) next to a fixed landline phone. Voice notes or voicemail-to-
text helps solve this problem. A recent survey showed that mobile phones are often used in much different
environments than landline phones (p. 1).

CNET’s choices for top BlackBerry apps favor speech technology
     A CNET reviewer, Jessica Dolcourt, named the top seven business apps for BlackBerry in an August
posting. One was email dictation from MyCaption. The email dictation software uses a combination of
speech recognition and human transcription. One can dictate email, a memo, a task, or calendar item. (Email
messages are limited to three minutes of speech.) The $9.99 App World download fee includes 20 minutes of
talk time. There is a subscription model for frequent users. MyCaption announced in May that they are using
Yap in their business applications for BlackBerry smartphones and PBX Voicemail (SSN, August 2009, p.
     A second recommendation out of the seven was YouMail Visual Voicemail Plus, which treats voicemail
like email, listing callers and messages so you can play them back in any order. CallWave (SSN, February
2008, p. 12) and YouMail’s premium transcription service add the feature of transcribing incoming voicemail
messages into text. CallWave starts at $15 per month for its transcription service. YouMail ranges from $4 to
$7 per month. YouMail is also using Yap technology (SSN, May 2009, p. 6).

SpinVox makes its API available in Italian and Portuguese
    SpinVox, a service providing voice-to-text messaging, is releasing two new language versions of its open
Applications Program Interface that allows other software and services to incorporate speech-to-text
functionality (SSN, August 2009, p. 18). SpinVox API versions in Italian and Portuguese are being launched in
response to an increasing demand from technology developers particularly in Italy, Portugal, and Brazil. The
SpinVox API is already available in English, Spanish, German, and French.

Rogers Wireless expands use of SpinVox voicemail-to-text service
    SpinVox Voicemail-to-Text (SSN, August 2009, p. 18) is now a standard feature in Rogers Wireless
SmartPhone Data Value Pack and BlackBerry Messaging Value Pack wireless rate plans. SpinVox is now a
standard feature in bundling plans with Canadian service providers Rogers, SaskTel, and TELUS, making
the service widely available and easily affordable to most Canadian customers.
GM Voices develops international voice personalities for Hewlett-Packard global operations
    GM Voices produces prerecorded voice prompts and greetings for telecom applications worldwide. The
company announced the successful deployment of 25 international language voice recordings for
Hewlett-Packard’s call centers. Additional languages are currently in development. “GM Voices provides translation
and locally authentic voices to serve HP customers with empathy and efficiency,” said Marcus Graham,
founder and CEO of GM Voices.

Apple bans Google Voice from iPhone
    In an action that has been controversial in the broader media, Apple was reported to have banned the
Google Voice application, as well as third-party Google Voice applications, from its iPhone App Store.
Google Voice is a form of free VoIP telephone service in some of its features, although this newsletter has
highlighted its voicemail-to-text feature (SSN, April 2009, p. 1). Some media have speculated that AT&T,
the exclusive carrier for the iPhone, doesn’t want the competition for phone service, but AT&T said officially
on August 21 that it played no role in a decision by Apple to reject Google’s voice application.
    But, in a letter to the Federal Communications Commission (FCC) in late August, Apple said it has not
approved Google Voice because it appears to replace the iPhone's core mobile telephone functionality and
user interface with its own system for telephone calls, text messaging, and voicemail, changing the user
experience. “Contrary to published reports, Apple has not rejected the Google Voice application, and
continues to study it,” Catherine Novelli, Apple vice president for worldwide government affairs, said in the
FCC letter.
    Apple has approved some VoIP applications, like eBay’s Skype, for use over Wi-Fi but not on AT&T’s
3G network. Apple said there is a provision in its agreement with AT&T that obligates Apple not to include
functionality that allows a customer to use AT&T’s cellular network to originate or terminate a VoIP session
without first getting AT&T’s permission. An AT&T exec has been quoted as saying the company plans to
take a new look at authorizing VoIP capabilities on the iPhone for use on AT&T’s 3G network.

Navigon navigation software on Apple iPhone will add announcement of street names using
SVOX text-to-speech
    The Apple iPhone has been equipped with Navigon MobileNavigator software, allowing iPhone owners
to be guided to any destination in up to 40 countries in Europe. iPhone 3G and 3G S owners will soon be able
to upgrade their MobileNavigator software with a free update that brings additional features such as the
announcement of street names using text-to-speech from SVOX.

Microsoft-Nokia alliance may compete with RIM—by putting Office on your mobile phone!
    Microsoft and Nokia announced an alliance in August to bring business software to smartphones to
compete with Research in Motion’s BlackBerry. The latest versions of Microsoft’s Office applications,
including Word, Excel, PowerPoint, and messaging, according to the announcement, will be available on a
range of Nokia cell phones, which make up 45% of the global smartphone market.
    A mini-editorial: This newsletter has often commented that assuming that a mobile phone can use a
desktop-style PC interface ignores reality, so one hopes that eventually speech input and navigation will be a
standard feature in increasingly complex—but small—mobile devices. To turn the issue around, would you
put a camera and GPS system in your desktop computer? What works on a mobile phone won’t necessarily
work on a PC. What works on a PC won’t necessarily work on a mobile phone.

Nuance’s predictive text and embedded mobile speech software are named to
VisionMobile’s 100 Million Club
    VisionMobile is a market analysis and strategy firm for the mobile industry. The T9 predictive text and
embedded mobile speech software from Nuance Communications were both named to VisionMobile’s 100
Million Club for the second half of 2008. The VisionMobile 100 Million Club recognizes software businesses
that have succeeded in establishing a significant presence in the mobile handset market, specifically those
whose products have been embedded on more than 100 million mobile phones. Nuance’s T9 predictive text
portfolio tops VisionMobile’s list, having shipped on more than 4.1 billion phones to date, making it the most
pervasive third-party software application in the mobile market today.

Esnatech & eOn Communications launch Mobile Unified Communications client software on
BlackBerry App World
    eOn Communications Corporation and Esna Technologies Inc. previously announced a relationship to
deliver eOn eNterprise IP Messenger, a Unified Communications (UC) platform for the enterprise and contact
center markets (SSN, November 2008, p. 16). The companies have now launched a version of their mobile
unified communication software for the eOn eNterprise IP Messenger for RIM BlackBerry devices in the
BlackBerry App World. The Mobile UC Client provides a complete communications & collaboration solution
for mobile users, integrating presence, messaging, mobility, and computer telephony. Once installed in
conjunction with eNterprise IP Messenger server, users can link their devices to their office phone system and
use the BlackBerry as their office communication device. The eNterprise IP Messenger uses text-to-speech
and speech recognition technology.

Ultratec’s CapTel service allows phone conversations for those with hearing loss
    The CapTel Relay Service developed by Ultratec, Inc. can convert the speech from an incoming or
outgoing call to text for display on a specialized phone, providing an aid for those with hearing loss. The
system works by having an operator monitor the conversation and repeat what was said into a speaker-
dependent speech recognition system to convert it to text quickly. The audio is also delivered. An Ultratec
spokesperson said that CapTel technology uses speech recognition engines that are “customized for our
application,” but “we don’t really share publically more specific information about our technology platform.”

Verizon Wireless initiatives encourage developers, contest and app store planned
     More than 500 mobile applications developers and others involved in the mobile apps marketplace
gathered in Silicon Valley for the first Verizon Developer Community (VDC) Conference. The company
announced the launch of its developer portal and the upcoming launch of the
V CAST Apps storefront. John Stratton, Verizon’s chief marketing officer, said the company values and
relies on input from the developer community, and plans to make the process for registering and distributing
applications with Verizon “simple, fast and straightforward.” Roger Gurnani, Verizon Wireless senior vice
president for product development, said the company will encourage new apps development with a contest
starting next month.

PropertyMinder adds text-to-speech to real estate websites
    PropertyMinder’s AccelerAgent Websites (designed for real estate agents to reach prospects) have
implemented Text-to-Speech technology, which can play unlimited text and dynamically generated content.
AccelerAgent Websites allow agents to select the voice that their visitors hear when they land on the site.
Aric Kazarnovsky, Executive Vice President of PropertyMinder, said, “With Web 2.0, the audio helps
communicate the agent’s value to his or her target audience. While competing websites only play pre-
recorded audio messages, PropertyMinder’s super reliable AccelerAgent Websites can now play unlimited
content, read listing descriptions, talk about you and your value proposition, make special offers, talk about
neighborhood attractions and more.”

Australian welfare agency adds text-to-speech to web site to aid visually impaired,
developed by VoiceCorp International
    Australian national welfare agency Centrelink has launched a new speaking feature on its Web site for
the visually impaired. The ReadSpeaker feature, developed by VoiceCorp International, allows users to
stream or download a spoken version of the site’s text, regardless of bandwidth and without additional
software installation. Chris Bowen, the Federal minister for Human Services who is answerable for Centrelink,
said that the new text-to-speech technology will allow customers with low vision, lower levels of literacy, or
those for whom English is a second language, to access Centrelink information. Centrelink delivers welfare
services to 6.5 million Australians and receives 140 million page views on its website each year.
DynaVox Mayer-Johnson announces hand-held speech solutions for augmentative and
alternative communication (AAC)
    DynaVox Mayer-Johnson, a provider of communication and education solutions for individuals with
speech, language, and learning disabilities, announced the DynaVox Xpress. The Xpress is a speech-enabled
hand-held device that brings together augmentative and alternative communication (AAC) tools with a
variety of mainstream communication features. Communications can be initiated, for example, from a touch
screen. Twin front-firing speakers allow Xpress users to be heard in virtually any environment. The new
voices included with the Xpress are natural-sounding and add emotion—laughter, crying, shouting, and

English-Italian dictionary CD-ROM to use Loquendo text-to-speech
    Zanichelli Editore, a leading Italian publishing house, and Loquendo announced the selection of
Loquendo text-to-speech for the CD-ROM of the Zanichelli English-Italian Dictionary 2010 edition to
provide vocal pronunciations of words. Zanichelli's English-Italian Dictionary (known as ‘Il Ragazzini’)
attempts to reflect the rapid developments taking place in the English language each year, including
neologisms and the evolving use of words and expressions arising from cultural changes in the diverse
English-speaking world. The CD-ROM which accompanies the dictionary contains all the entries present in
the printed version, but with an additional feature allowing users to listen to the correct pronunciation of all
English words in the dictionary—using Loquendo TTS for new additions.

Students improve writing, reading skills with Pearson's WriteToLearn supported by
SpeechStream from Texthelp
    Pearson provides education technology. Texthelp Systems is an educational software company
specializing in the design of literacy support and assistive technology to help individuals improve their
reading and writing abilities (SSN, June 2009, p. 34). Pearson will embed Texthelp’s reading support
technology, SpeechStream, in its Web-based learning tool, WriteToLearn. With WriteToLearn, students
practice essay writing and summarization skills, and their efforts are measured by Pearson’s Knowledge
Analysis Technologies (KAT) engine. The KAT engine evaluates the meaning of text by examining whole
passages, not just grammatical correctness or spelling.
    Using SpeechStream’s text-to-speech capabilities, WriteToLearn meets a wide variety of learner needs
for a diverse group of students, including support for programs for Title I, Response to Intervention, learning
disabled, English language learners, and at-risk learners. With SpeechStream, students can have online
content read aloud with highlighting. The dictionary and spot word translation ability in WriteToLearn 5.0
enables students to instantly retrieve the dictionary definition or Spanish translation of words in a reading

LXE introduces MX9 ultra-rugged data collection computers for harsh environments
    LXE Inc., the rugged mobile computer business of EMS Technologies, Inc., announced the immediate
availability of the MX9, MX9CS, and MX9HL ultra-rugged handheld computers, designed for use in a wide
array of heavy industrial and outdoor data collection environments. All versions also include LXE’s
Toughtalk technology, supporting speech recognition applications.

SanDisk Cruzer Enterprise secure USB flash drives support text-to-speech screen readers for
    SanDisk announced that its SanDisk Cruzer Enterprise secure USB flash drives (which include
cryptographic modules and encryption algorithms) are now tested and certified under Military Standard 810-F
environmental standards in addition to being suitable for use by the visually-impaired under Section 508
requirements. Section 508 of the Rehabilitation Act of 1973 requires federal agencies to make their electronic
and information technology accessible to people with disabilities. SanDisk reconfigured the Cruzer
Enterprise’s graphical user interface (GUI) to increase accessibility for visually-impaired users. The
encrypted USB drives are now compatible with certain assistive technologies such as screen reader software
that recreates the Cruzer Enterprise’s GUI through text-to-speech representation or via a Braille output

US grants $1.2 billion for electronic health records
    The U.S. government announced grants of almost $1.2 billion to help hospitals and health care providers
establish and use electronic health records. The grants include $598 million to set up some 70 health
information technology centers to help health care institutions acquire electronic health record systems and
$564 million to develop a nationwide system of health information networks. The grants will be funded by
the American Recovery and Reinvestment Act of 2009 and be made available in 2010.

Wellmont Health System awards contract to MedQuist for enterprise-wide transcription and
radiology speech recognition
    MedQuist Inc., a provider of technology-enabled clinical documentation services, announced that it has
been awarded the preferred vendor contract by Wellmont Health System of Kingsport, Tennessee for
enterprise medical transcription services, as well as for speech recognition in radiology. MedQuist historically
used speech recognition from Philips Speech Systems, which has been acquired by Nuance.

Updated candidate recommendation of Speech Synthesis Markup Language (SSML)
    Kazuyuki Ashimura, W3C Multimodal & Voice Activity Lead, announced that the updated Candidate
Recommendation of Speech Synthesis Markup Language (SSML) Version 1.1 has been published at

International Telecommunications Union promotes implementation of United Nations
Convention on the Rights of Persons with Disabilities in information and communications
    The International Telecommunications Union (ITU) Asia-Pacific Regional Forum on Mainstreaming ICT
Accessibility for Persons with Disabilities, held in Bangkok, Thailand from 25 to 27 August 2009, shared a
range of critical policy and regulatory measures to promote accessible information and communication
technologies (ICT) for persons with disabilities. It is expected that the UN Convention will make assistive
ICT technologies as common as wheelchair ramps and audible signals for traffic lights, which have already
become standard in many parts of the world. Assistive technologies include screen readers, captioning or sign
language on television for the deaf, cell phones that include features such as special volume control, large
character touch pads and predictive text features, as well as the adoption of accessible website design by both
the public and private sectors. The number of persons with disabilities is increasing worldwide, due to aging
populations in some countries, as well as war and civil conflict, natural disasters, malnutrition and other

Fonix iSpeak voice dial by name or number app available for iPhone 3.0, and lip-sync
software licensed for Microsoft’s South Park Xbox video game
    Fonix Speech, a wholly owned subsidiary of Fonix Corporation specializing in embedded speech
interfaces for mobile devices, handheld electronic products, video game systems and processors, introduced
Fonix iSpeak 1.4 for the Apple iPhone 3.0 operating system. Fonix iSpeak provides voice dial by name or
number and voice confirmation using recorded speech with text-to-speech. The iPhone 3GS recently
introduced its own voice control (SSN, July 2009, p. 1). Fonix Speech also announced a license agreement for
Fonix VoiceSync 1.0 for Microsoft’s South Park Let’s Go Tower Defense Play! video game to be published
in 2009 for Xbox LIVE Arcade. Fonix VoiceSync 1.0 detects phonemes and maps the phonemes to audio
WAVE files to create natural lip-movements.
    Fonix Speech also announced a license of Fonix VoiceIn 4.2 to Autodesk. Autodesk offers 2D and 3D
design software for the manufacturing, building and construction, and media and entertainment markets.
New draft of Media Resource Control Protocol Version 2 standard available
    The Media Resource Control Protocol Version 2 (MRCPv2) allows client hosts to control media
service resources such as speech synthesizers, recognizers, verifiers, and identifiers residing in servers on the
network. A new Internet-Draft of the standard is now available at
speechsc-mrcpv2-20.txt. This draft is a work item of the Speech Services Control Working Group of the
IETF. MRCPv2 is not a “stand-alone” protocol; it relies on other protocols, such as Session Initiation
Protocol (SIP) and the Session Description Protocol (SDP).
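
    To make the framing concrete, here is a minimal sketch in Python of how a client might assemble an MRCPv2 request. It assumes the start-line layout (version, total message length, method, request ID) and the Channel-Identifier, Content-Type, and Content-Length headers described in the MRCPv2 drafts. The function name build_mrcp_request and the sample channel and request values are illustrative only, and the SIP/SDP session setup that must precede any such message is not shown.

```python
def build_mrcp_request(method: str, request_id: int, channel_id: str,
                       body: str, content_type: str = "text/plain") -> bytes:
    """Assemble an MRCPv2-style request (illustrative sketch, not a full client)."""
    body_bytes = body.encode("utf-8")
    headers = (
        f"Channel-Identifier: {channel_id}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body_bytes)}\r\n"
        "\r\n"
    ).encode("ascii")

    # The message-length field counts the whole message, including its own
    # digits, so recompute the start-line until the value stops changing.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(headers) + len(body_bytes)
        if total == length:
            return start_line + headers + body_bytes
        length = total


# Hypothetical values: a SPEAK request to a speech synthesizer channel.
message = build_mrcp_request("SPEAK", 543257, "32AECB23433802@speechsynth", "Hello")
print(message.decode("utf-8"))
```

    In practice the channel identifier would come from the SDP answer negotiated over SIP, which is why MRCPv2 cannot be used on its own.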

A sobering thought
    “For a list of all the ways technology has failed to improve the quality of life, please press three.”
    Alice Kahn

                                      Financial Notes
Nuance announces third quarter 2009 results—11.2% increase in GAAP revenue to $241
million, non-GAAP net income of $73.3 million, and cash balance of $418.6 million
     On August 10, Nuance Communications, Inc. (NASDAQ: NUAN) announced financial results for the
third fiscal quarter ended June 30, 2009. Nuance reported GAAP revenue of $241.0 million in the quarter, an
11.2% increase over GAAP revenue of $216.7 million in the quarter ended June 30, 2008. The Company
reported non-GAAP revenue of approximately $251.3 million, which includes $10.3 million in revenue lost
to accounting treatment in conjunction with the company’s business and technology acquisitions. Non-GAAP
revenue grew approximately 9.6% over non-GAAP revenue of $229.2 million in the same quarter last year.
     Nuance recognized a GAAP net loss of $1.0 million, or $(0.00) per diluted share, in the quarter ended
June 30, 2009, compared with a GAAP net loss of $9.9 million, or $(0.05) per diluted share, in the quarter
ended June 30, 2008. For the period ended June 30, 2009, Nuance reported non-GAAP net income of $73.3
million, or $0.26 per diluted share, compared to non-GAAP net income of $51.3 million, or $0.22 per diluted
share, in the quarter ended June 30, 2008. Nuance reported cash flow from operations of $53.7 million in the
quarter ended June 30, 2009, compared to $48.1 million in the same quarter last year.
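
     The growth percentages quoted above can be checked directly from the reported dollar figures. A minimal sketch in Python follows; the helper name pct_change is ours and is not taken from Nuance's release.

```python
def pct_change(current: float, prior: float) -> float:
    """Year-over-year change, in percent, relative to the prior-period figure."""
    return (current - prior) / prior * 100.0


# Revenue in millions of dollars for the quarters ended June 30, 2009 and 2008.
gaap_growth = pct_change(241.0, 216.7)
non_gaap_growth = pct_change(251.3, 229.2)

print(f"GAAP revenue growth: {gaap_growth:.1f}%")
print(f"Non-GAAP revenue growth: {non_gaap_growth:.1f}%")
```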
     In prepared remarks, Nuance indicated that its revenue benefited from strength in the company’s
recurring revenue streams, especially hosted revenue, and growth in Nuance Mobile Care revenue as
deployments in key carriers progressed. In Q3 2009, non-GAAP professional services, subscription and
hosting revenue was $112.5 million, up 33.8% from $84.1 million a year ago, and year-to-date was $308.1
million, up 37.2% from $224.5 million a year ago. As more customers transition to subscription or
transactional pricing models, a growing proportion of new sales contribute revenue over time rather than
immediately in the quarter in which the sale occurred. As subscription and transaction-based agreements
signed in past quarters have been deployed, and as volume of usage in those solutions has increased, revenue
from those solutions has increased. The company is also making headway in increasing future recurring
revenues, as evidenced in progress in completion of large, multi-year on-demand engagements and growing
bookings in solutions businesses.
     Highlights reported from the quarter included:
 • Healthcare-Dictation - Non-GAAP revenue for Nuance's healthcare and dictation solutions was $108.1
     million, up 27% from the same quarter last year. Revenue in Nuance’s healthcare unit grew year-over-
     year, fueled by hosted, on-demand solutions, as a record number of new customers went live in Nuance's
     hosted transcription services.
 • Mobile-Enterprise - Non-GAAP revenue for Nuance's enterprise and mobile solutions was $125.5
     million, up slightly from the same quarter last year. Nuance experienced continued strength in enterprise
     on-demand, professional services and maintenance contracts, especially in North America, with wins at
     customers such as Bank of America, Cigna, TD Ameritrade, and United Airlines. Nuance Mobile Care
     revenue grew as deployment progressed within carrier customers. National Australia Bank Personal
     Banking deployed a voice biometric identification and verification function incorporating Nuance
    technology to improve customer experience and security. Nuance’s mobile revenue streams again
    reflected the challenges of reduced purchases of mobile devices worldwide. During Q3 2009, millions of
    new smart phones shipped with Nuance products that enable voice control of various functions. In
    addition, Nuance won significant new contracts at HTC, LG, MiTAC/Magellan, Samsung and Vodafone.
   • Imaging - Non-GAAP revenue for Nuance's PDF and document imaging solutions was $17.7 million,
    down 8% from the same quarter last year.
   • Operational Achievements - Nuance benefited from its focus on expense controls and accelerating
    synergies from recent acquisitions to significantly improve non-GAAP margins. Non-GAAP operating
    margins rose to 32.6%, compared to 27.5% in the third quarter 2008. Cash flows from operations were
    $53.7 million in the third quarter 2009, compared to $48.1 million a year ago. On a year-to-date basis,
    cash flows from operations were $184.3 million, compared to $130.1 million for the same period in 2008. The
    Company's cash balance as of June 30, 2009, was $418.6 million.

                                           Nuance revenues over time

Ditech Networks reports 34% revenue growth in fiscal quarter
     On August 20, Ditech Networks, Inc. (NASDAQ: DITC), which provides an innovative speech
recognition driven service (p. 11), reported financial results for its fiscal 2010 first quarter. Revenues for the
first quarter were $6.1 million. The GAAP net loss for the first quarter was $3.8 million or $0.14 per share.
The non-GAAP loss for the first quarter was $3.4 million or $0.13 per share. Todd Simpson, Ditech's
president and CEO, said, “I am pleased to announce that we exceeded our revenue guidance for the last
quarter…we made significant progress on our voice applications platform which will soon be available for
beta trials.”

Convergys reports second quarter results, customer management operating income up 90%
    On July 30, Convergys Corporation (NYSE: CVG) announced its financial results for the second
quarter of 2009, including year-over-year improvement in Customer Management operations. Second quarter
2009 revenues were $683 million compared with $690 million in the same period last year. Revenue growth
from Customer Management and HR Management largely offset the expected decline in Information
Management. Non-GAAP operating income was $50 million, a 5% increase compared with the prior year
period; GAAP operating loss was $71 million. The company’s cash balance increased to $336 million.

NICE Systems reports second quarter 2009 non-GAAP decline in revenue compared to Q2 2008
    NICE Systems (NASDAQ: NICE), which provides speech analytics solutions (p. 16), reported financial
results for the Second Quarter of 2009. Non-GAAP revenues were $140.5 million, up from $139.2 million in
the first quarter, but 9.6% down from $155.3 million in the second quarter of 2008. Second quarter 2009 non-
GAAP net income was $22.1 million, compared to $24.0 million in the second quarter of 2008. Net cash
generated from operations was $29.2 million.
    Haim Shani, Chief Executive Officer, NICE Systems Ltd., said, “In the second quarter we started to see
improvement, with bookings, revenues, operating margins, and profitability coming in higher than the first
quarter. Business improved in both the Americas and APAC, in the different product lines. Looking ahead,
we believe that these trends and the strong pipeline of large security projects will translate into top and
bottom line growth in the second half of 2009, compared to the first half of the year.”

Interactive Intelligence reports second quarter revenues increased to $32.9 million
    On July 30, Interactive Intelligence (Nasdaq: ININ), a global provider of unified IP business
communications solutions, announced operating results for the three months ended June 30, 2009.
The company reported revenues of $32.9 million for the second quarter of 2009, an increase of 7.5% over
revenues of $30.6 million for the second quarter of 2008. Second quarter 2009 results included:
income on a GAAP basis of $3.0 million, up from $1.3 million in the second quarter of 2008;

operating income of $3.7 million, compared to $2.2 million in the same quarter last year. Cash and
investment balances at quarter-end were $54.1 million with no debt.

In-Q-Tel investment in Carnegie Speech to enhance spoken-language training software for
the US intelligence community
    Carnegie Speech provides software for assessing and teaching spoken language to non-native speakers
(SSN, June 2007, p. 11, and p. 31). The company announced a strategic partnership and technology
development agreement with In-Q-Tel, an independent strategic investment firm that identifies innovative
technology solutions to support the mission of the CIA and the broader U.S. Intelligence Community.
Carnegie Speech’s exclusive global license to speech recognition and artificial intelligence technologies from
Carnegie Mellon University enables personalized spoken-language assessment and training.
    Angela Kennedy, Carnegie Speech president & CEO, said, "Through our relationship with In-Q-Tel,
government agencies will have access to software that enables rapid learning and understanding of words and
phrases. Our language tutorials have improved spoken-language proficiency in a variety of industry sectors in
countries around the world. We are confident that our software, with its pinpoint speech evaluation, targeted
remediation, and personalized curriculum, will deliver similar benefits to U.S. government agencies.”
    Carnegie Speech software uses a combination of speech recognition and proprietary pinpointing
technology that models a user's speaking characteristics to analyze speech proficiency and develop a
personalized spoken-language training curriculum for each student. Carnegie Speech's technology compares
each student's spoken language against a composite statistical model of native speakers’ speech to pinpoint
errors and give detailed and effective remediation instruction.
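
    One way to picture comparison against a composite native-speaker model is a simple statistical outlier test. The sketch below is purely illustrative and is not Carnegie Speech's method: the per-phoneme duration feature, the sample values, the threshold, and the function name pinpoint_errors are all assumptions.

```python
from statistics import mean, stdev


def pinpoint_errors(student: dict, native_samples: dict, threshold: float = 2.0) -> dict:
    """Flag phonemes whose measured value lies far from the native-speaker samples."""
    flagged = {}
    for phoneme, value in student.items():
        samples = native_samples[phoneme]
        mu, sigma = mean(samples), stdev(samples)
        z = (value - mu) / sigma  # distance from the native mean, in standard deviations
        if abs(z) > threshold:
            flagged[phoneme] = round(z, 2)
    return flagged


# Hypothetical phoneme durations in milliseconds.
native_samples = {"th": [90, 100, 110, 95, 105], "r": [70, 80, 75, 85, 90]}
student = {"th": 160, "r": 78}

flagged = pinpoint_errors(student, native_samples)
print(flagged)  # phonemes that would be targeted for remediation
```

    A real system would work from acoustic features rather than a single duration per phoneme, but the idea of scoring each sound against a native-speaker distribution is the same.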

Call Genie reports second quarter 2009 financial results
     On August 7, Call Genie (TSX:GNE), a provider of mobile local search and advertising solutions,
announced financial results for the second quarter ended June 30, 2009. Revenues fell to $0.9 million
compared to $1.4 million for Q2 2008. The net loss excluding stock based compensation expense was $3.1
million compared to $4.9 million in 2008. Cash used in operations was $2.0 million compared to $4.5 million
for the second quarter of 2008.
     The Company signed a contract amendment with RH Donnelley that extends the term of the contract and
expands the scope of the solutions deployed. Under the terms of the agreement, $2,750,000 is expected to be
paid to Call Genie in 2009 on account of upfront fees and payments for product deployment and custom
development work.
     Effective June 26, 2009, the Company completed a $2.5 million debt financing involving the distribution
of secured convertible debentures and common share purchase warrants. The debentures will bear interest at a
rate of 10% per annum, payable semi-annually, and will mature on May 30, 2012.
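
     The interest cost implied by these terms is easy to work out. The sketch below assumes simple interest on the full $2.5 million principal, two equal payments per year, and no conversion of debentures before maturity; none of these details beyond the rate and payment frequency are stated in the announcement.

```python
from decimal import Decimal

principal = Decimal("2500000")   # $2.5 million in debentures
annual_rate = Decimal("0.10")    # 10% per annum
payments_per_year = 2            # interest payable semi-annually

annual_interest = principal * annual_rate
semiannual_payment = annual_interest / payments_per_year

print(f"Annual interest: ${annual_interest:,.0f}")
print(f"Each semi-annual payment: ${semiannual_payment:,.0f}")
```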

Wizzard Software announces 2009 Q2 21% decline in revenues and a loss
    On August 14, Wizzard Software (NYSE Amex: WZE) announced financial results for the second
quarter ended June 30, 2009. Wizzard Software has long offered consulting, speech development tools, and
speech-based applications for the desktop and Internet, and has diversified recently into other markets, such
as digital media (podcasts).
    The company reported revenues of $1,160,919, a 21% decrease from revenues of $1,465,874 in the
second quarter of 2008. Wizzard's net loss was $2,193,552, or $0.05 per share, in the second quarter of 2009,
versus a net loss of $1,998,225, or $0.04 per share, in the second quarter of 2008.
    The company lately has emphasized its podcast business, rather than its speech business. Wizzard Media
(NYSE Amex: WZE), a podcasting network, announced the launch of the second phase of its new podcast
network monetization strategy with the approval of the Hawaii Surf Session Report App, the first Wizzard
Network customized Podcast App currently for sale through Apple’s App Store. The Podcast App, developed
by Wizzard Media, can be customized for a podcast producer in less than 30 minutes to include bonus
material, show extras, interactive forms of communication between audience and host, as well as other
information and entertainment.
    Wizzard announced that it served over 13,700,000 podcast ad impressions through its Alchemy
advertising technology in the quarter ending June 30, 2009. The 13.7 million ads served in the second quarter
is an increase of 251% from the 3.9 million impressions served in the second quarter of 2008, and a 46%
increase from the 9.4 million ad impressions served in the first quarter of 2009.

Fonix outlines acquisition strategy
    On July 29, in a published letter to shareholders, Fonix indicated that management has adopted a growth
plan consisting of organic revenue through acquisition. The recently acquired GaozhiSoft is an example of the
type of target that meets the company’s expectation.

Genesys Telecommunications Laboratories names Jason Stirling as the new senior vice
president of Genesys Asia Pacific
    Genesys Telecommunications Laboratories, an Alcatel-Lucent company, announced the appointment
of Jason Stirling as the new Senior Vice President of Genesys Asia Pacific, part of the Application Software
Group. Having managed the overall operations of Genesys in Australia, India and New Zealand for the past
two years, Mr. Stirling now has overall responsibility for all Genesys-related customer activities in the Asia
Pacific region, including sales, channel management, delivery and customer satisfaction. Before joining
Genesys in 1997, Stirling worked for Hewlett-Packard Australia in a business development role.

Richard R. Devenuti elected to Convergys Board of Directors
    Convergys Corporation announced the election of Richard (Rick) R. Devenuti to its Board of Directors
effective August 24, 2009. Devenuti is currently Senior Vice President of EMC Corporation. EMC
Corporation, with revenues of nearly $15 billion in 2008, provides enterprise storage systems, software, and
services and develops, manufactures, and markets intelligent enterprise storage and retrieval equipment and
related software. From 1987 to 2006, Devenuti served in increasingly responsible executive positions
with Microsoft. He served as Senior Vice President, Services and Information Technology, and as Vice
President and Chief Information Officer.

Thomas Connelly appointed President/COO of OrderCatcher, Inc.
    Craig Downs, CEO of OrderCatcher Inc. (SSN, January 2009, p. 17), has appointed Thomas Connelly
President and COO of the company. Connelly’s career has included 25 years of management consulting and
executive search experience with Korn/Ferry International and 9 years running his own high-end Internet-
based tourism company Old Burma Tour & Trading Company, Ltd. OrderCatcher develops and markets
speech recognition systems specifically designed for the food-to-go industry and is presently developing
applications for other sectors including lodging.

Avaya’s Carol Giles Neslund recognized by Everything Channel’s CRN Magazine as One of
the Top 100 Women in the Channel
    Avaya announced that Carol Giles Neslund, vice president, North American Channel, has been
recognized by Everything Channel’s CRN Magazine as one of the Top 100 Women in the Channel. Neslund
has over eight years of experience working with channel organizations. Since coming to Avaya nearly a year
ago, she has led the implementation of Avaya’s High Touch, Channel Centric program in North America with
the objective to substantially increase the percentage of sales made through channel partners.

  For Further Information on Products Mentioned in this Issue
        Company             Location            Product Mentioned                                  Contact info
                                            Transcription services, including
3Play Media                Boston, MA       video captioning                       (617)764-5189;
                                            Mobile phone access for finding
Aisle 411 Inc.             St. Louis, MO    products in specific stores            (314)633-5000;
Android (from Google)      --               Mobile phone OS
(subs. of                                   Speech-enabled telephone service
Microstrategy)             McLean, VA       for smaller businesses                 (703)269-1070;
                                            Personal computers, music players,
Apple                      Cupertino, CA    wireless phones              
Applications Technology
(AppTek)                   McLean, VA       Text and speech translation software   (703)821-5000;
                                            Non-profit organization supporting
Applied Voice Input                         quality speech application
Output Society (AVIOS)     San Jose, CA     development                            (408)323-1783;
                           Chelmsford,      Telephone self-service system with
Aspect Software            MA               speech recognition                     (978)250-7900;
                           San Antonio,
AT&T                       TX               Telecommunications services  
AT&T Wireless              —                Wireless telephone services  
ATX Group (div. of Cross
Country)                   Irving, TX       Automotive telematics                  (972)753-6200;
Auraya Systems             Australia        Speaker authentication                 +61 2 6201 5253;
Aurix (was 20/20                                                                   +44 1684 585101; US: (703)414-8160;
Speech)                    Malvern, UK      Speech analytics technology  
                           San Rafael,
Autodesk                   CA               3D design software           
                           San Francisco,   Contact center analytics and audio
Autonomy, Inc.             CA               search                                 (415) 243 9955;
                           Basking Ridge,
Avaya Inc.                 NJ               Enterprise telephony solutions         (908)953-6000;
                           Cary, NC         VoIP service                           (800)808-5150;
                           Cambridge,       Speech recognition, audio search,
BBN Technologies           MA               and natural language technologies      617-873-1600;
Biomni Voice Ltd                             Interactive voice messaging
(voicexcel)                London, UK       services                     ;
BMW                                         Automobiles                  
Business Systems (U.K.)
Ltd                        Isleworth, UK    Contact center specialist              +44 20 8326 8200;
                           Calgary, AB,     Speech recognition solutions for the
Call Genie                 Canada           directory services industry            (403) 268-0411;
CallCopy, Inc.             Columbus, OH     Contact recording solutions            (614)340-3346;
CallMiner                  Fort Myers, FL   Speech analytics                       (239)689-6463;
                           Santa Barbara,
CallWave                   CA               Voicemail access and management
Cambridge University
Engineering Department     Cambridge,                                              (44)223-332-654;
(CUED)                     England          Basic research               
Carnegie Mellon                             Speech recognition, robotics, and
University                 Pittsburgh, PA   translation research                   (412)268-2900;
                                            Reading training using speech
Carnegie Speech            Pittsburgh, PA   recognition                            (412)622-2181;
                                            Internet infrastructure and IP
Cisco                      San Jose, CA     telephony                              (800) 553-6387;
                                            IVR and business intelligence
ClickFox                   Atlanta, GA      solutions                              (404)351.8020;
                                             Voice/data recording and quality
ComputerTel                 Gravesend, UK    evaluation solutions                     +44 1474 561111;
                                             Customer care and employee-
Convergys Corporation       Cincinnati, OH   benefit solutions                        (513)723-7153;
                                             VoiceVerified hosted speaker
CSIdentity                  Austin, TX       authentication solution                  (800)805-7004;
DARPA (Defense
Advanced Research
Projects Agency)            Arlington, VA    Research support               
Dialogic                    Canada           Communication products                   (514)745-5500;
Dimension Data              London, UK       IT solutions and services                +44 (207) 651 7000;
                            Mountain View,   Voice quality and voice access
Ditech Networks             CA               solutions                                (650)623-1300;
                            West Orange,
DMG Consulting              NJ               Contact center consulting                (973)325-2954;
                                             Communications recording and             (866)DSS-CORP;
DSS Corporation             Southfield, MI   document management            
DynaVox Mayer-Johnson       Pittsburgh, PA   Assistive communication devices          (412)381-4883;
Eckoh Technologies          London, UK       Hosted voice-enabled services            +44 20 7505 7800;
                                             Hammer telephone application
Empirix                     Bedford, MA      testing                                  (781)266-3200;
eOn Communications
Corporation                 San Jose, CA     Communications solutions                 (800)955-5321;
Epic Systems                Verona, WI       Electronic Medical Records               (608)271-9000;
                            Richmond Hill,
Esna Technologies           Ontario,
(Esnatech)                  Canada           Business communications solutions        (905)707-9700;
                                             Speech recognition and TTS
Fonix Corporation           Lindon, UT       products                                 (801)553-6600;
Genesys Telecommunications
Laboratories (Alcatel-                       Call routing and contact center
Lucent subs.)               Daly City, CA    solutions                                (888)GENESYS;
Global Crossing             CO               VoIP telephone services                  (406)651-4028;
GM Voices                   Alpharetta, GA   Recorded prompts and audio               (770)752-4500;
                                             Telephone speech recognition
Gold Systems                Boulder, CO      solutions                                (303)447-2774;
                            Mountain View,                                            (650)253-0000;;
Google                      CA               Voice and directory search     ;
Grammar Studio              --               Grammar development tool       
Greenfield Online           Wilton, CT       Surveys                                  (203)834-8585;
Health Dialog (subs. of                      Healthcare analytics and decision
Bupa)                       Boston, MA       support                                  (617)406-5200;
Humanity Interactive        Kirkwood, WA     Online virtual personalities             (866)260-8967;
                                             WebSphere infrastructure, speech
                                             recognition, speaker verification, and
                                             multi-modal technology and
IBM                         Somers, NY       platforms                                (877)426-3774;
IETF Speech Services
Control Working Group
(SpeechSC)                  —                IETF standards committee       
                                             Hosted IVR & Telephone Automation
Ifbyphone, Inc.             Skokie, IL       Applications                             (877)295-5100;
                                              Not-for-profit, strategic investment
                                             firm supporting U.S. Intelligence
In-Q-Tel                    Arlington, VA    Community                                (703)248-3000;
IntelePeer                  San Mateo, CA    Hosted rich media communications         (650)525-9200;
Interactive Intelligence,   Indianapolis,
Inc.                        IN               Unified Communications and IVR           (317)872-3000;
International
Telecommunication           Geneva,
Union (ITU)                 Switzerland      Standards setting body         
Internet Engineering
Task Force (IETF)           —                Internet standards body        
                                             Free Ad-supported Directory             (877)754-6453;;
Jingle Networks, Inc.      Bedford, MA       Assistance                    
                           Marina del        Statistical machine translation
Language Weaver            Rey, CA           software                                (310) 437-7300;
                                             Custom enterprise (HCM/ERP/CRM)         (516)921-1500;
LBi Software Engineering   Woodbury, NY      applications                  
                                                                                     +39 011 291 3111;;
Loquendo                   Turin, Italy      Speech technology licensing             Developers:
                                             Speech recognition engine and
LumenVox LLC               San Diego, CA     development tools                       (858)707-0707;
LXE, subsidiary of EMS
Technologies               Norcross, GA      Speech-enabled wireless terminals       (404)447-4224;
                           Mount Laurel,
MedQuist Inc.              NJ                Medical document creation               (800)233-3030;
Message Technologies,
Inc. (MTI)                 Atlanta, GA       IVR and speech technologies             (800)868-3684;
                                             Various applications and products
Microsoft Corporation      Redmond, WA       using speech technology                 (206)454-2030;
MicroStrategy                                E-commerce customer relationship
Incorporated               Vienna, VA        management                              (703)848-8600;
Inc.                       Sunnyvale, CA     Mobile productivity solutions           (408)512-3016;
Navigon                    Germany           Navigation systems and software         +49 40 / 380 383-0;
Networks in Motion         Aliso Viejo, CA   Wireless navigation and local search    (949)453-1646;
Nexidia                    Atlanta, GA       Audio content search                    (404)495-7220;
Next IT                    Spokane, WA       AI software                             (509)242-0767;
Nice Systems               Israel            Multimedia analytics                    +972 9 775-3777;
                           Helsinki,         Mobile phones and personal
Nokia                      Finland           navigation devices                      +358-9 1807 459;
                                             Telephony and networking systems        1-800-4NORTEL;
Nortel                     Bohemia, NY       for service providers and enterprises
Novauris Technologies      Cheltenham,                                               +44 1242 678581 (UK); (530)753-1160 (US);
Ltd                        England           Speech recognition technology 
Nu Echo Inc.               Canada            Speech application consulting           (514)861-3246;
                                             Speech technology, applications,
Nuance Communications      Burlington, MA    and services                            (617)428-4444;
Omnesys Technologies       Bangalore,        Brokerage and Trading Platform
(India)                    India             provider                                +91 80 6665 7800;
                                             Workforce Optimization and speech
OnviSource                 Plano, TX         analytics                               (469)241-9200;
                                             Mobile Internet infrastructure
Openstream, Inc.           Edison, NJ        platform and applications               (732)417-1200;
OrderCatcher Inc.          Miami, FL         Speech application developer            (800)838-9518;
Pearson plc                London, UK        Education publishing                    +44 20 7010 2314;
                           Woodbury, NY,
                           and Tel Aviv,                                             1(516)677-7291; +972-3-7678666;
Persay                     Israel            Speaker authentication technology
Plug'n Pay Technologies    NY                eCommerce solutions                     (631)761-0159;
                           Ontario,          Speech recognition computer
Pronexus                   Canada            telephony                               (613)271-8989;
PropertyMinder                               Real-estate agent productivity tools    (800)807-3890;
                           Fort Collins,
Red Shift Company, LLC     CO                Speech recognition technology           (866)-818-2084;
Research in Motion         Waterloo, ON,
(RIM)                      Canada            Blackberry mobile devices               (519)888-7465;
                           Toronto, ON,
Rogers Wireless            Canada            Mobile telephone service provider
                                             Call center integration and
Sabio                      London, UK        applications                            +44 20 7633 3900;
Sanderson Studios          Woodside, CA      Ad agency                               (650)851-6832;
SanDisk                    Dublin, Ireland   Flash memory data storage products      US: (408)801-1000;
SAP                        Worldwide         Enterprise software                     +49 180 534-34-24;
Servion Global Solutions    Princeton, NJ     Contact center solutions                (609)987-0044;
Skype (an eBay
company)                    Luxembourg        VoIP telephone service        
Spanlink                    MN                Telephone applications developer        (763)971-2000;
Speech Technology           St. Petersburg,
Center (STC)                Russia            Speech technology                       +7 812 325-8848;
SpeechGear                  Northfield, MN    Translation software                    (507)664-9123;
SpinVox                     Marlow, UK        Voicemail-to-text service               +44 020 7965 2000;
                            Zurich,           Speech recognition and text-to-
SVOX AG                     Switzerland       speech technology                       +41 43 544 06 23;
                                              Avaya Voice Portal and Call Center-
SwampFox                    Columbia, SC      based products and services              (803)451-4540;
Syntellect Inc. (subs. of                                                             (602)789-2800;;
Enghouse)                   Phoenix, AZ       Voice processing platforms    
T-Mobile                    Germany           Wireless service                        +49 228/936-1717;
T-Mobile USA                Bellevue, WA      Wireless service                        1-800-T-MOBILE;
                            Santa Clara,
TeleNav                     CA                Navigation services                     (408)245-3800;
                            Mountain View,
Tellme (Microsoft subs.)    CA                Voice infrastructure hosting            (800)555-TELL; (650)930-9000;
Texthelp Systems            Woburn, MA        Literacy support                        (888)248-0652;
The FeedRoom                New York, NY      Online video communications             (212)219-0343;
The PELORUS Group           Raritan, NJ       Market research                         (908)707-1121;
Communications              Sunnyvale, CA     On-demand contact center                (408)338-0900;
Ultratec, Inc.              Madison, WI       Captioned Telephone Relay Service
                            San Francisco,
Utopy                       CA                Call center speech mining               (415)621-5700;
Verint Systems              Melville, NY      Call center and security solutions      (631)962-9600;
                                              Managed and/or hosted enterprise
Verizon Business            USA               solutions                               1-877-297-7816;
Verizon Communications      New York, NY      Telephone service provider    
Verizon Wireless            Bedminster, NJ    Wireless telephone services   
                            Virginia Beach,
Vianix                      VA                Audio compression                       (757)321-9971;
Virage SoftSound            Cambridge, UK     Audio search technology                 +44 1223 448000;
                                              Market analysis and strategy firm for   +44 845 003 8742;;
VisionMobile                London, UK        the mobile industry           
                                              Voice development tools and
Voice Web Solutions         Seattle, WA       applications                            (206) 338-6632;
VoiceCorp                   Sweden            Text-to-voice web service               +46 18 60 44 40;
                                              Voice verification technology and
VoiceVault                  Dublin, Ireland   service                                 +353 1 603 9500;
VoiceXML Forum              New York, NY      Voice eXtensible Markup Language
                                              Directory Assistance, Mobility and
                                              Telecom solutions, Data and Media
Volt Delta Resources,                         Services, Hosted Contact Center
LLC                         New York, NY      Solutions and IT Outsourcing            (714)921-8000;
Volt Information                              Workforce, information, and
Sciences, Inc.              New York, NY      telecommunications solutions  
Voxeo                       Orlando, FL       Voice hosting solutions                 (407)418-1800;
West Corporation            Omaha, NE         Outsourced communication solutions
West Interactive (unit of                     Out-sourcing of customer contact
West Corp.)                 Omaha, NE         solutions                               (402)963-1300;
Wizzard Software                              Speech technology application
Corporation                 Pittsburgh, PA    development and licensing               (954)678-4155;
Yap, Inc.                   Charlotte, NC     Speech-to-text on mobile phones         (704)372-1470;
                                                                                      +86 10 5913 7700;
Yucheng Technologies        Beijing, China    Banking industry solutions

                                                       Voice Search 2010
                                           Third edition of breakthrough conference
    The Applied Voice Input/Output Society (AVIOS) and Bill Meisel’s TMA Associates are organizers of
the third annual Voice Search Conference, to be held April 22-23, 2010, at the Hyatt Fisherman’s Wharf in
San Francisco, California. For more information, see

                                                     A book from TMA Associates:
    VUI Visions: Expert views on Effective Voice User Interface Design
                                          William Meisel, Editor
    A collection of articles by over thirty experts in the design of dialog for speech recognition,
text-to-speech, and speaker verification user interfaces.

■ I wish to subscribe to Speech Strategy News for one year (12 issues), payable in US$ on US bank—
         US$                     Individual rates                              Corporate rates*
                  Paper+Web+         Web+Archives        Paper    Paper+Web          Web+          Paper only
 Destination        Archives**                            only   + Archives**      Archives**      (5 copies)
      U.S. &            ■ $495             ■ $425        ■ $425 ■ $1,895*         ■ $1,495*        ■ $1,495
International           ■ $525             ■ $425        ■ $480 ■ $1,895*         ■ $1,495*        ■ $1,495
* Corporate subscriptions: Unlimited users within a corporation for Web+Archives; one paper copy mailed
   in addition if Paper+Web+Archives elected. Five paper copies, sent separately or as a group, if Paper Only
** Web + Archives: Searchable current and back issues. Issue on Web can be printed.
■ I wish to receive selected news alerts (less than 2 per month) by email at no additional cost. Email address
       for alerts (available with all subscription categories):
■ Please send information on your consulting.

Name:                                                                      ■ Check enclosed, payable to TMA Associates
Company:                                                                      (in U.S. $ on a U.S. bank).
Address:                                                                   ■ Invoice me.
                                                                           ■ Charge my—
City, State                                                                   ■ Visa ■ MasterCard ■ American Express
ZIP/Postal code
                                                         Card #
                                                         Expiration date:
Email (required for email alerts or a Web subscription):

Copyright TMA Associates 2009; All rights reserved.                                                                                       SSN195
Mail or fax orders to: TMA Associates, P.O. Box 570308, Tarzana, CA 91357-0308 USA.
Tel: (818) 708-0962. Fax: (818) 345-2980, or register on web site:
Speech Strategy News is published twelve times per year by TMA Associates, Editor: William S. Meisel. Trademarks mentioned in this publication
are the property of the companies mentioned; they are used editorially. The material herein is based on data from sources believed to be reliable,
but is not guaranteed as to accuracy and does not purport to be complete. From time to time, the author or TMA Associates may have consulting
assignments, advisory positions, own stock, or have other business relations with organizations in speech recognition and associated areas,
including companies discussed in this newsletter. Speech Strategy News is a trademark of TMA Associates.
