Friday, October 8, 2004
“Listen up!” Speech Recognition’s Impact on Communication,
Rhetoric, and Interface
“I helped Apple wreck a nice beach.”1 The T-shirt says it all: Researchers at Apple
Computer learned the hard way that making computers transcribe human speech correctly
(recognize speech, not wreck a nice beach) is a difficult task. Today, many computer
systems have been equipped with at least some rudimentary speech recognition
capabilities, but none of these interfaces has met with enough success to gain widespread
acceptance or use. Many companies, including Merrill Lynch and British Airways, have
implemented telephone answering machines that process basic requests by automatic
speech recognition.2 In addition, commercial speech recognition software, such as IBM’s
ViaVoice application,3 is readily available and is becoming increasingly popular.
Nevertheless, the vast majority of computer users today still rely on traditional input
devices – mice and keyboards – and either speak to a service agent or use touch-tone
menus when calling a corporation. In this research project, I will examine the effects of
automatic speech recognition (ASR) on today’s communication and rhetoric, exploring
how it has – or has not – changed the way humans and computers interact. I will also
investigate the future directions of speech recognition, focusing on ways in which it could
affect communication within the coming years.
Many users who have tested speech recognition systems – from casual web surfers to
corporations seeking to raise productivity levels – have been dissatisfied with their
experiences. Others, however, find speech recognition interfaces indispensable aids for
accomplishing basic tasks. Basic, introductory questions about the field of speech
recognition include: What is the current state of the art in speech recognition? How fast,
and how accurate, are today’s software programs? How much initial “training” do they
require before they can function at levels acceptable to their users? Are they as effective
at recognizing the voice of a single speaker as they are for a wide range of speakers?
After establishing these basic topical foundations, I will move into inquiries that more
directly address the heart of my research project. Have users’ experiences with
commercial speech recognition programs been positive, negative, or neutral? Have
experiences with phone-based speech processing systems (which typically rely on smaller
“vocabularies,” and are thus more accurate) been better or worse? In what ways, if any,
have they changed the processes and outcomes of speaking, thinking, or writing? How
do these systems affect our moods and emotions – for example, do we become more
irritable when software incorrectly transcribes our speech? Would dissatisfied users be
content to use speech recognition exclusively if the technology improves, or will they
always prefer traditional interfaces?
To provide a contextual and historical background for my project, I will rely primarily
on traditional sources, such as books, magazine articles, and journals. Some of these
resources that appear to hold promise include Ben Shneiderman’s Designing the user
interface: strategies for effective human-computer interaction, Kim J. Vicente’s The
human factor: revolutionizing the way people live with technology, and Michael
McTear’s Spoken dialogue technology: towards the conversational user interface. In
addition, I will consult electronic databases such as Expanded Academic, EBSCO, and
IEEE Xplore to find the latest news and reports about speech recognition. A preliminary
search of EBSCO, for instance, yielded almost 2,500 results; while many that I browsed
catered to a highly technical audience, others (such as “Pronunciation change in
conversational speech and its implications for automatic speech recognition”) seemed
very relevant to this project.
In addressing the questions that are directly related to the core of my research, I will
generally employ less traditional – and more original – sources of information. For
example, I will create an anonymous survey about users’ experiences with speech
recognition, post it to my website, and encourage as many individuals as I can to fill it
out. Additionally, I will interview faculty on campus, such as Professor Vaughan Pratt or
Professor Dan Jurafsky, who are leaders in the field of automatic speech recognition.
These expert analyses should provide a good balance with the survey responses, as end
users and computer scientists may each bring different biases to their opinions of speech
recognition.
This schedule should provide ample flexibility while simultaneously meeting the
deadlines outlined by the course syllabus.
By this date… I will…
Friday, October 15: Translate this document into hypertext, and compile an initial
list of sources.
Wednesday, October 20: Read and explore enough initial material to finalize my list of
“conventional” print and electronic sources.
Friday, October 22: Prepare a list of questions to use during an interview with a
Stanford professor working in the field of speech recognition.
Monday, October 25: Create and post to my website the survey on experiences with
speech recognition described previously; begin publicizing this survey (through PWR
and dorm email lists, with friends, and so on). Also, make further progress researching
the historical and cultural contexts of speech recognition.
Friday, October 29: Hold my interview with one or more professors; continue reading
and researching contextual frameworks throughout the week. Transcribe interviews
over the weekend.
Friday, November 5: During the week, finalize basic contextual research and convert
it to an aesthetically appealing electronic form. Post two or more nodes of the project
on my homepage as required by the syllabus, containing introductory material and
preliminary contextual research.
Monday, November 8: Post four or more nodes on my project site, as required; these
nodes will consist of any additional historical/contextual research, plus an examination
of the results from my survey and interviews.
Friday, November 12: Post the final three (or more) nodes on my project site; these
pages will comprise all remaining survey- and interview-related materials, as well as a
concluding discussion and analysis of the results.
Monday, November 15: To complete the draft of my hypertext, finalize any links
between nodes and resolve any aesthetic or graphical issues omitted during the creation
of the nodes.
Wednesday, December 1: Post the final draft of my project, drawing on feedback from
my own critical review of the site, peer commentary, and suggestions from Professor
Alfano and other sources (such as the Stanford Writing Center).
Though automatic speech recognition has not yet seen widespread integration into our
everyday lives, it has the potential to transform today’s human-computer interfaces,
making our machines more intuitive, productive, and user-friendly.
In the process, this technology would revolutionize many forms of modern electronic
rhetoric, changing the ways in which we write and speak in this digital age. Speech
recognition has moved beyond a mere scientific curiosity to a powerful force that will
shape our communication – and how we conceptualize communication – in the coming
decades. It is important that we as a society be prepared for the arrival of such a potent
technology, in order to maximize its benefits and effectively meet the new challenges it
presents.
Ryan Propper is currently a sophomore at Stanford University
and has been an avid computer user ever since receiving a state-of-
the-art Apple IIgs machine at the age of four. His interest in
speech recognition stems both from a desire to improve the way in
which humans and computers interact and from ongoing research he
is performing with the Stanford Computer Science Department.
Ryan is excited to explore the more “human” side of speech recognition technologies and
investigate the ways in which they could enrich and enhance our perspectives on
communication in the 21st century.
1. WordIQ.com. Definition of Speech recognition [Internet]. Available from:
2. TMA Associates. From Speech Recognition Update #120 [Internet]. June 2003.
Available from: http://www.tmaa.com/news120.htm
3. IBM. IBM ViaVoice – Product Overview [Internet]. Available from: