"Portable Speech Recognition for the Speech and Hearing Impaired"
Y. A. Alsaka, Steve Doll, and Stephen Davis
University of North Florida, Jacksonville, FL 32224
e-mail: yalsaka@unf.edu

0-7803-3844-8/97/$10.00 © 1997 IEEE

Abstract - This paper describes the design and implementation of a portable, calculator size, speech recognition system for use by speech and hearing impaired individuals.

INTRODUCTION

Communication via speech is the most natural and efficient mode of communication between humans or between humans and machines. A major effort is currently under way to enable machines such as personal computers and the like to understand ordinary human speech. Although the problem has not been solved completely, a great deal of progress has been achieved.

Nowadays, many successful commercial products exist for recognizing isolated words in a limited vocabulary set with up to 99% recognition accuracy. However, continuous speech recognition continues to be elusive, with partial success in a few, mostly research, systems.

Isolated word, speaker dependent speech recognition systems are very successful, with high accuracy. However, these systems are limited in their applications. A more popular and practical version is the isolated word, speaker dependent system similar to the one employed by the phone companies for recognizing digits. This success has stimulated much development in many engineering areas, such as software algorithms and their implementation on VLSI or ASIC devices. Presently many companies such as OKI offer speech recognition processors and other related speech recognition chips [1]. These new devices can be easily and economically incorporated as part of a system to facilitate man-machine communication via speech.

Although one can think of many applications for these chips, in this paper we will discuss the development of a hand held device that will enable the speech and hearing impaired to communicate via natural speech. Many devices in use create sounds that are intelligible but sound unnatural and machine-like. In these devices the primary method for sound generation consists of modulating a noise source by the larynx wall vibrations. The proposed device will operate on a different principle and will generate perfectly natural sound.

SYSTEM DESIGN CONSIDERATION

Hardware Development: Figure 1 presents a block diagram of the proposed system. Each subsystem can easily be implemented using existing off-the-shelf components. This is among many goals and specifications the system should adhere to. Another important goal is to have a highly robust and cost-effective speech recognition system. Fortunately, this can easily be satisfied by a number of systems that already exist on the market. Additionally, many of these systems allow for concurrent speaker independent and speaker dependent recognition. This fits well with the targeted population at hand. Many of the intended users have partial or little control over their sound production organs and require the speaker dependent feature, while normal speakers require the speaker independent facility.

Next, portability is a critical feature and will most likely determine whether the device will be successful or not. CMOS technology facilitates the design and construction of a hand held, calculator size device to carry out the desired and intended function. The final product should be small enough that it could be concealed if one so desires.

Processing speed is of paramount importance. Real time processing is a must, and recognition must be completed with little noticeable delay. Many DSP and voice recognition processors can accomplish this task adequately.

Large onboard memory is needed to store audio files for input and playback. A sampling rate of 8 kHz will be used, which requires 8 Kbytes of storage per second. For long sentences of 4 to 5 seconds, an input buffer of about 50 Kbytes is needed. On the output side, the audio file storage requirements will depend on the user's individual preferences and desires. Also, vocabulary utterance templates and the application software will require additional memory space. Since these files are not modified often, flash memories can be used to satisfy the large memory required for these audio files.

Following is a brief description of the individual subsystems of this device as depicted in Figure 1:

Microcontroller: Will act as the CPU for the device.

Voice Recognition: The processor used for this function is the OKI MSM6679 voice recognition processor. This chip performs five major functions: sound recording, sound playback, speaker dependent recognition, speaker independent recognition, and speech synthesis. Also, the chip provides an on-chip memory controller, a flash memory interface, an OKI speech synthesizer interface, and a PWM output. All these facilities are integrated on one CMOS chip that consumes as little as 10 µA of current and a maximum of 144 mA [2]. This chip is ideal for battery based portable devices such as the one being discussed. The analog input is connected to the microphone or the phone line interface.

Voice Synthesis: This module is largely implemented using the OKI MSM6650 voice synthesis chip. The chip supports ADPCM and PCM speech, a 12 bit D/A, a built in adaptive low pass filter, two channel mixing, and echo generation. Again, the chip is very well suited for a battery-operated device since it requires from as little as 10 µA to a maximum of 10 mA of current [3]. The MSM6650 can address up to 8 Mbytes of external memory. The MSM6650 presents the output speech either to a preamplifier to drive a speaker on the device, or to the phone line interface.

Phone line interface: The device can also be used over the telephone line. It is designed such that it could replace a telephone handset. The input analog signal from the phone line is connected to the voice recognition processor, the MSM6679, while the analog output from the voice synthesis MSM6650 is connected to the phone line, effectively replacing the mouthpiece on the phone set.

Speaker/Microphone and LCD display: In face-to-face operation the device sits in one's hand or in a shirt pocket. The microphone, speaker, and LCD display with their interface circuitry are assembled in a calculator-like package. The function keys, which are similar to calculator keys, provide the user with various options such as displaying the output message or switching the recognition templates.

Finally, the serial interface provides the facility to interface with a desktop development station for various updates or software modifications.

SOFTWARE DEVELOPMENT: The programs required to run the hardware, although laborious, are straightforward. However, one of the most important features and critical areas of the device development is the user interface software. The success of the device is largely dependent on reliable operation and a user friendly interface. The user interface has been written and is running in a desktop environment. A preliminary study was conducted with hearing and speech impaired subjects. The initial results were very encouraging.

The desktop environment is utilized mostly to develop the recognition templates that will be loaded in the portable device. Multiple vocabulary sets of 61 words each can be stored under different categories for different applications and settings. Each set of words is chosen by the user. The system is trained to recognize these words with accuracy approaching 100%. All training is performed in the desktop environment.

The portable device is not designed to develop vocabulary templates. Its library of templates can be updated or modified through a serial interface with the desktop unit. This restricts the portable unit to recognition, display, and speech synthesis. It allows for user control through a number of dedicated keys or buttons. For deaf individuals the display section performs two functions. On the input side it displays the recognized input speech for the deaf user. On the output side the display provides a double check of the intended output speech. This is especially useful for situations requiring accuracy, such as banking.

Presently the processor used allows for 61 word templates at one time, so the search space is limited to 61 words at any time. This constraint minimally affects the ability of the user to output speech. However, it is significant for recognizing the input speech. This constraint requires careful design of the recognition template sets, i.e. which words should be grouped together in every set. Also, the user has the flexibility to control which set of word templates is active, so it is possible to load on the fly different sets of templates that best fit the situation at hand. This will require a certain amount of training and experience on the part of the user. In the future this constraint is expected to vanish with the development of recognition processors with a large set of words. At that time an automatic switching feature can be included, where the input words are screened to determine the appropriate set of words.

CONCLUSION

Portable battery operated speech recognition devices are very much a reality. A multitude of applications can be envisioned. The device under discussion can easily be modified for use in language translation, warehouse inventory control, or other applications where a speech input is better suited than other types of man-machine interface.

REFERENCES

[1] Y. A. Alsaka and S. Davis, "Portable Computer Based Language Translation," Proc. Southcon'95, pp. 141-146, Ft. Lauderdale, FL, March 7, 1995.
[2] OKI Semiconductor, "MSM6679 Voice Recognition Processor," Data Sheet, Feb. 1996.
[3] OKI Semiconductor, "MSM6650 Family Speech Synthesis," Family Data Sheet, April 1995.
Figure 1. System Block Diagram
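
The input buffer sizing discussed under hardware development can be checked with a short calculation. The following is only a minimal sketch, not part of the original design: it assumes one byte per sample (which is what 8 Kbytes per second at an 8 kHz sampling rate implies) and uncompressed audio. The function name `buffer_bytes` and the constants are hypothetical and chosen purely for illustration.

```python
SAMPLE_RATE_HZ = 8_000   # sampling rate stated in the paper
BYTES_PER_SAMPLE = 1     # assumption: 8-bit samples, implied by 8 Kbytes per second


def buffer_bytes(seconds: float,
                 sample_rate_hz: int = SAMPLE_RATE_HZ,
                 bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Return the storage needed for `seconds` of uncompressed audio."""
    return int(seconds * sample_rate_hz * bytes_per_sample)


if __name__ == "__main__":
    # Print the raw requirement for one second and for 4 to 5 second sentences.
    for seconds in (1, 4, 5):
        print(f"{seconds} s -> {buffer_bytes(seconds)} bytes")
```

Under these assumptions a 5 second sentence needs 40,000 bytes of raw samples, so a buffer of about 50 Kbytes, as proposed in the text, would leave some headroom.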