

Portable Speech Recognition for the Speech and Hearing Impaired
Y. A. Alsaka*, Steve Doll, and Stephen Davis
*University of North Florida
Jacksonville, FL 32224
e-mail: yalsaka@unf.edu

0-7803-3844-8/97/$10.00 © 1997 IEEE

Abstract - This paper describes the design and implementation of a portable, calculator size speech recognition system for use by speech and hearing impaired individuals.

Communication via speech is the most natural and efficient mode of communication between humans, or between humans and machines. A major effort is currently under way to enable machines such as personal computers and the like to understand ordinary human speech. Although the problem has not been solved completely, a great deal of progress has been achieved.

Nowadays, many successful commercial products exist for recognizing isolated words in a limited set of vocabulary with up to 99% recognition accuracy. However, continuous speech recognition continues to be elusive, with only partial success in a few, mostly research, systems.

Isolated word, speaker dependent speech recognition systems are very successful with high accuracy. However, these systems are limited in their applications. A more popular and practical version is the isolated word, speaker independent system, similar to the one employed by the phone companies for recognizing digits. This success has stimulated much development in many engineering areas such as software algorithms and their implementation on VLSI or ASIC semiconductors.

Presently many companies such as OKI offer speech recognition processors and other related speech recognition chips [1]. These new devices can be easily and economically incorporated as part of a system to facilitate man-machine communication via speech.

Although one can think of many applications for these chips, in this paper we will discuss the development of a hand held device that will enable the speech and hearing impaired to communicate via oral speech. Many devices currently in use create sounds that are intelligible but unnatural and machine-like. In these devices the primary method for sound generation consists of modulating a noise source by the larynx wall vibrations. The proposed device will operate on a different principle and will generate perfectly natural sound.

SYSTEM DESIGN CONSIDERATION

Hardware Development: Figure 1 presents a block diagram of the proposed system. Each subsystem can easily be implemented using existing off-the-shelf components. This is among many goals and specifications the system should adhere to [2]. Another important goal is to have a highly robust and cost-effective speech recognition system. Fortunately, this can easily be satisfied by a number of systems that already exist on the market. Additionally, many of these systems allow for concurrent speaker independent and speaker dependent recognition. This fits well with the targeted population at hand. Many of the intended users have partial or little control over their sound production organs and require the speaker dependent feature, while normal speakers require the speaker independent facility.

Next, portability is a critical feature and will most likely determine whether the device will be successful or not. CMOS technology facilitates the design and construction of a hand held, calculator size device to carry out the desired and intended function. The final product should be small enough that it could be concealed if one so desires.

Processing speed is of paramount importance. Real time processing is a must, and recognition must be completed with little noticeable delay. Many DSP and voice recognition processors can accomplish this task adequately.

Large onboard memory is needed to store audio files for input and playback. A sampling rate of 8 kHz will be used, which requires 8K bytes of storage per second. For long sentences of 4 to 5 seconds an input buffer of about 50K bytes is needed. On the output side the audio file storage requirements will depend on the user's individual preferences and desires. Also, vocabulary utterance templates and the application software will require additional memory space. Since these files are
not modified often, flash memories can be used to satisfy the large memory required for these audio files. Following is a brief description of the individual subsystems of this device as depicted in Figure 1:

Microcontroller: Will act as the CPU for the device.

Voice Recognition: The processor used for this function is the OKI MSM6679 voice recognition processor. This chip performs five major functions including sound recording, sound playback, speaker dependent and speaker independent recognition, and speech synthesis. Also, the chip provides an on-chip memory controller, flash memory interface, OKI speech synthesizer interface, and PWM output. All these facilities are integrated on one CMOS chip that consumes as low as 10 uA of current and a maximum of 144 mA [2]. This chip is ideal for battery based portable devices such as the one being discussed. The analog input is connected to the microphone or the phone line interface.

Voice Synthesis: This module is largely implemented using the OKI MSM6650 voice synthesis chip. The chip supports ADPCM and PCM speech, 12 bit D/A, a built in adaptive low pass filter, two channel mixing, and echo generation. Again the chip is very well suited for battery-operated devices since it requires as low as 10 uA to a maximum of 10 mA of current [3]. The MSM6650 can address up to 8 Mbyte of external memory. The MSM6650 presents the output speech either to a preamplifier to drive a speaker on the device, or to the phone line interface.

Phone line interface: The device can also be used over the telephone line. It is designed such that it could replace a telephone handset. The input analog signal from the phone line is connected to the voice recognition processor, the MSM6679, while the analog output from the voice synthesis chip, the MSM6650, is connected to the phone line, effectively replacing the mouthpiece on the phone set.

Speaker/Microphone and LCD display: In face-to-face operation the device sits in one's hand or in a shirt pocket. The microphone, speaker, and LCD display with their interface circuitry are assembled in a calculator-like package. The function keys, which are similar to calculator keys, provide the user with various options such as displaying the output message or switching the recognition templates.

Finally, the serial interface provides the facility to interface with a desktop development station for various updates or software modifications.

SOFTWARE DEVELOPMENT: The programs required to run the hardware, although laborious, are straightforward. However, one of the most important features and critical areas of the device development is the user interface software. The success of the device is largely dependent on reliable operation and a user friendly interface. The user interface has been written and is running in a desktop environment. A preliminary study was conducted with hearing and speech impaired subjects. The initial results were very encouraging.

The desktop environment is utilized mostly to develop the recognition templates that will be loaded in the portable device. Multiple vocabulary sets of 61 words each can be stored under different categories for different applications and settings. Each set of words is chosen by the user. The system is trained to recognize these words with accuracy approaching 100%. All training is performed in the desktop environment.

The portable device is not designed to develop vocabulary templates. Its library of templates can be updated or modified through a serial interface with the desktop unit. This restricts the portable unit to recognition, display, and speech synthesis. It allows for user control through a number of dedicated keys or buttons. For deaf individuals the display section performs two functions. On the input side it displays the recognized input speech for the deaf user. On the output side the display provides a double check of the intended output speech. This is especially useful for situations requiring accuracy, such as banking.

Presently the processor used allows for 61 word templates at one time, so the search space is limited to 61 words at any time. This constraint mainly affects the ability of the user to output speech; however, it is also significant for recognizing the input speech. This constraint requires careful design of the recognition template sets, i.e. what words should be grouped together in every set. Also, the user has the flexibility to control which set of word templates is active, so it is possible to load on the fly different sets of templates that best fit the situation at hand. This will require a certain amount of training and experience on the part of the user. In the future this constraint is expected to vanish with the development of recognition processors with a large set of words. At that time an automatic switching feature can be included, where the input words are screened to determine the appropriate set of words.

CONCLUSION

Portable battery operated speech recognition devices are very much a reality. A multitude of applications can be addressed. The device under discussion can easily be modified for use in language translation, warehouse inventory control, or other applications where speech input is better suited than other types of man-machine interface.
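
As an illustration only, the template-set handling described under software development (vocabulary sets of at most 61 words, one active set at a time, and a possible automatic switching feature) can be sketched in Python. The names TemplateLibrary, add_set, activate, active_words, and best_set_for are hypothetical and do not come from the OKI chip interface or from the software described in this paper. The overlap-count rule used for automatic switching is an assumed heuristic, chosen only to show the idea of screening input words.

```python
MAX_ACTIVE_TEMPLATES = 61  # per-set limit stated in the paper


class TemplateLibrary:
    """Hypothetical store of named vocabulary sets; one set is active at a time."""

    def __init__(self):
        self._sets = {}      # set name -> list of words
        self._active = None  # name of the currently active set

    def add_set(self, name, words):
        # Normalize case and drop duplicates while keeping the original order.
        unique = list(dict.fromkeys(w.lower() for w in words))
        if len(unique) > MAX_ACTIVE_TEMPLATES:
            raise ValueError(
                f"set '{name}' has {len(unique)} words; limit is {MAX_ACTIVE_TEMPLATES}"
            )
        self._sets[name] = unique

    def activate(self, name):
        # Corresponds to the user selecting a set with a function key.
        if name not in self._sets:
            raise KeyError(name)
        self._active = name

    def active_words(self):
        # The recognizer's search space is restricted to these words.
        return list(self._sets[self._active]) if self._active else []

    def best_set_for(self, heard_words):
        # Assumed heuristic: choose the set sharing the most words with the input.
        heard = {w.lower() for w in heard_words}
        best_name, best_hits = None, 0
        for name, words in self._sets.items():
            hits = sum(1 for w in words if w in heard)
            if hits > best_hits:
                best_name, best_hits = name, hits
        return best_name  # None when no set matches any input word


if __name__ == "__main__":
    lib = TemplateLibrary()
    lib.add_set("banking", ["deposit", "withdraw", "balance"])
    lib.add_set("restaurant", ["menu", "water", "check"])
    lib.activate("banking")
    print(lib.active_words())
    print(lib.best_set_for(["menu", "please"]))
```

Keeping the 61-word check in add_set mirrors the point made in the text that the grouping of words into sets has to be designed carefully at the desktop stage, before the sets are loaded into the portable unit.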

[1] Y. A. Alsaka and S. Davis, "Portable Computer Based Language Translation," Proc. Southcon '95, pp. 141-146, Ft. Lauderdale, FL.
[2] OKI Semiconductor, "MSM6679 Voice Recognition Processor," Data Sheet, Feb. 1996.
[3] OKI Semiconductor, "MSM6650 Family Speech Synthesis," Family Data Sheet, April 1995.

Figure 1. System Block Diagram
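
The input-buffer figure quoted in the hardware discussion can be checked with a short calculation. The sketch below assumes one byte per sample (8-bit samples), which is implied by, but not stated alongside, the 8 kHz rate and 8K bytes per second. It also reads "50K bytes" as 50,000 bytes. The function names are illustrative only.

```python
SAMPLE_RATE_HZ = 8_000   # sampling rate given in the paper
BYTES_PER_SAMPLE = 1     # assumption: 8-bit samples, uncompressed


def buffer_bytes(duration_s):
    """Bytes needed to hold duration_s seconds of uncompressed audio."""
    return int(duration_s * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def seconds_in_buffer(size_bytes):
    """Seconds of audio that fit in a buffer of size_bytes."""
    return size_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


if __name__ == "__main__":
    for seconds in (4, 5):
        print(f"{seconds} s of speech needs {buffer_bytes(seconds)} bytes")
    print(f"a 50,000-byte buffer holds {seconds_in_buffer(50_000):.2f} s")
```

Under these assumptions a 5 second sentence needs 40,000 bytes, so a buffer of about 50K bytes leaves some headroom beyond the longest expected utterance.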

