Voice Activated Function Keyboard (VAFK) with a USB Interface

Final Report
Fall Semester 1999

by
Jeff Boody
Greg Milburn
Mark Jurjovec

Prepared to partially fulfill the requirements for EE401
Department of Electrical Engineering
Colorado State University
Fort Collins, Colorado 80523

Report Approved _______________________________
Project Advisor

Abstract

The computer industry revolves around fast-changing technology that strives to improve performance, reliability and usability. This project combines several new technologies to address two major problems in the computer industry. The first problem is implementing an effective speech recognition system on personal computers. Current speech recognition systems typically rely on large software programs that can use significant system resources to get acceptable results. The second problem is the connection of peripheral devices to personal computers. Current peripheral connection methods are difficult to configure and also difficult to design for.

We are attempting to solve these problems by using new technologies from the speech recognition and communications industries. In place of a complex software speech recognition system, we use a generic speech recognition chip (the Sensory Voice Direct chip) to recognize words. Our current design is suitable for implementing a small set of commands (~15). Our approach to the peripheral communication problem relies on a new communication protocol called the Universal Serial Bus (USB). We will be using a generic USB chip (Cypress CY7C63000A), which can be programmed easily in assembly code. These two technologies allow us to design a powerful peripheral device that is simple and easy to use.

Speech recognition and USB are proving to be powerful tools in the computer industry. The use of speech recognition in computers has the potential to deliver significant productivity gains.
Computers will require speech recognition systems that are speaker independent and support large vocabularies. The speech recognition chip technology needed to implement this type of system on computers is available. USB is proving to be an excellent alternative for many peripheral devices. Our project shows how USB can be used to simplify a design and make our product easier to use.

TABLE OF CONTENTS

Title
Abstract
Contents
List of Figures
I. Introduction
II. Background of Speech Recognition Systems
III. Background of the Universal Serial Bus (USB)
IV. Speech Recognition Module
V. Voice Direct/USB Circuit Interface
VI. USB Circuit Results
VII. Conclusion
VIII. References

List of Figures and Tables

Figure 1 VAFK Commands
Figure 2 USB Topology
Figure 3 Speech Recognition Pin Diagram
Figure 4 Voice Direct/USB Circuit Interface Signals
Figure 5 Block Diagram of Voice Direct/USB Circuit Interface
Figure 6 USB Scan Codes for VAFK commands
Figure 7 State Diagram for Interrupt Signal
Figure 8 USB Chip Comparison
Figure 9 USB Firmware Flowchart

Introduction: Voice Activated Function Keyboard (VAFK) with a USB Interface

Bill Gates of Microsoft recently said that “speech is not just the future of Windows, it’s the future of computers.” The Voice Activated Function Keyboard is a step in that direction. The VAFK will allow users to navigate Windows applications through a set of predetermined voice commands instead of the traditional menu bars or multi-key shortcuts. By using speech recognition, the VAFK makes Windows applications quicker and easier to navigate. The Voice Activated Function Keyboard is in essence a glimpse into the future, giving Windows users the convenience of speech recognition today.
The VAFK is designed to be a stand-alone peripheral that takes advantage of the voice-user interface as a way to make software more usable. The VAFK is a speaker dependent device, which means that it will only recognize the speaker who programmed it. To prevent spurious speech or background noise from triggering unwanted functions, the VAFK employs a push-to-talk key. When pressed, the push-to-talk key activates the system, which then prompts the user for a command. This implementation has two advantages over continuous listening systems. The first is that it ensures that the device only listens to what the user intends. The second is that the VAFK does not require valuable processing power when it is not being used.

Initially the VAFK will be designed to recognize only a limited number of commands. The prototype commands are listed in the table below along with their corresponding key equivalents.

Windows NT Command    Key Equivalent
Close                 Alt - F4
Task Switch           Alt - Tab
Copy                  Ctrl - c
Paste                 Ctrl - v
Select All            Ctrl - a
Start Menu            Ctrl - Esc
Security Menu         Ctrl - Alt - Esc
Help                  F1

Figure 1: VAFK Commands

The VAFK is realized in hardware with three distinct stages: speech recognition, the speech circuit to USB circuit interface, and the USB circuit. The first stage consists of Sensory’s VOICE DIRECT speech recognition module, which maps spoken commands to system control functions. The second stage translates these control functions to their appropriate USB keyboard scan codes, which are then passed to the USB circuit. By using USB technology, the VAFK provides the convenience of a plug-and-play peripheral. Most PCs on the market today, including notebooks, are fully USB-ready. The USB circuit is primarily firmware located on a USB chip (Cypress CY7C63000A) with a minimal amount of additional hardware.
The USB circuit is responsible for configuring the chip as a keyboard and for transmitting and receiving data. Detailed descriptions and implementations of each stage follow in subsequent chapters.

The Voice Activated Function Keyboard adds voice control to a PC as soon as it is plugged in. There have been other attempts to add speech recognition to PCs; however, these attempts have had limited success. Recently, Apple Computer, Inc. released its new operating system, Mac OS 9, which supports voice control. The Apple system provides many of the same features as the VAFK, including the push-to-talk key option. However, since Mac OS 9’s speech recognition features are embedded within the operating system itself, they can only be used on Apple Computer hardware. One disadvantage of this speech recognition method is that, since it is implemented in software, it may consume significant system resources. The VAFK, on the other hand, can be installed on any computer that supports USB peripherals, including one running Mac OS 9, and it will not consume system resources for recognition.

Chapter 2: Background of Speech Recognition Systems

Speech recognition is the process by which a computer maps an acoustic speech signal to text or other signals that are understood by the computer. Some of the current uses of speech recognition include dictation machines and command input devices. There are several different forms of speech recognition, which vary in complexity and style.

One variation is the type of speaker that the system is designed to understand. A speaker dependent system is developed to operate for a single user. This method generally requires the user to “train” the system by repeating words, which the system then stores in a database. The advantages of this system are that it is easier to develop, cheaper to buy and more accurate. The disadvantage is that it is not as flexible as speaker adaptive and speaker independent systems.
Speaker independent systems are developed to operate for any speaker of a particular type (e.g. American English). However, this kind of system is the most difficult to develop, it is more expensive and it has lower accuracy than speaker dependent systems. A speaker adaptive system is designed to adapt its operation to the characteristics of new speakers, and its design difficulty lies between that of the speaker dependent and speaker independent systems.

Another variation in speech recognition systems is the size of their vocabulary. Vocabulary sizes range from “small” (tens of words) to “medium” (hundreds of words) to “large” (thousands of words) and “very-large” (tens of thousands of words). The size of the vocabulary can significantly affect the complexity, processing requirements and accuracy of the speech recognition system. One application of a small-vocabulary system is a toy (e.g. a robot) that contains a speech recognition chip (these are available commercially from Sensory, Inc.). The robot can accept commands such as “Go”, “Turn” and “Stop.” A typical application of a very-large vocabulary would be dictation machines which “type as you speak.”

The final variation in the complexity of the system is the manner in which words are received from the user. Some systems allow for continuous speech while others only allow isolated words. An isolated-word recognition system operates on a single word at a time, or requires a pause between each word. This is the simplest method of recognition to implement because it is easier to find the end points of a word. There is also less interference caused by other words that are spoken in the same sentence. A continuous speech recognition system is designed to operate when words are spoken together. Continuous speech is more difficult to handle because of several effects. When words are spoken together it is more difficult to determine where the first word ends and the second word begins.
Another problem, called “coarticulation”, also occurs when words are spoken together. Coarticulation is when the phonemes (basic sounds of spoken languages) of one word are affected by the surrounding words.

In summary, speech recognition systems vary mainly in the type of speaker they are designed for (speaker dependent, speaker independent or speaker adaptive), in the size of their vocabulary, and in whether they allow continuous speech or only isolated words. Speech recognition is currently a rapidly developing technology.

Chapter 3: Background of the Universal Serial Bus (USB)

The Universal Serial Bus (USB) is a relatively new communication standard for computer peripherals. The USB standard was designed to meet increasing demand for easy-to-use products that can be implemented in a flexible and cost-effective way. The USB meets these demands through plug-and-play, a standard device framework, automatic error handling, standardized hardware and a wide variety of data transfer types.

Plug-and-play allows a peripheral device to be configured easily and automatically. When the USB host detects the attachment or detachment of a peripheral, it loads the appropriate software drivers and determines the bus protocol that the device will be using. Since these steps happen automatically, computer users no longer need to worry about selecting the right serial port, installing expansion cards, or the technical headaches of DIP switches, jumpers, software drivers, IRQ settings, DMA channels and I/O addresses. Operating systems that support the USB have many standard software drivers built in for USB devices. Since the drivers are standardized, users will not need to install software when they purchase new hardware. Another plug-and-play feature is that the USB can distribute power on the bus, so many peripheral products no longer require separate power supplies.
Plug and Play is an industry-wide specification that makes it easy to expand PC functionality.

The USB is organized in a “tiered star topology”, which means that some USB devices, called USB “hubs”, can serve as connection ports for other USB peripherals. Only one device needs to be plugged into the PC. Other devices can then be plugged into the hub (see diagram below).

Figure 2: USB Topology (figure from USB Specification)

Hubs play an integral role in expanding the world of the PC user. Since hubs can be embedded in devices such as keyboards and monitors, users don’t have to worry about expanding their computers. A hub consists of two portions: the Hub Controller and the Hub Repeater. The Hub Repeater is a protocol-controlled switch between the upstream port and the downstream ports. The Hub Controller provides the interface registers that allow communication to and from the host. Hub-specific status and control commands permit the host to configure a hub and to monitor and control its ports.

The USB host consists of specialized hardware located on the computer motherboard and software that is usually provided by the operating system. The host is responsible for detecting the attachment and removal of USB devices, managing control flow between the host and USB devices, managing data flow between the host and USB devices, and providing power to attached USB devices.

Another component of the bus topology is the cable, which connects the devices using four wires (VDD, GND, D+, D-). These wires carry power, ground, and the differential data pair. To minimize end user termination problems, USB uses a “keyed connector” protocol. The physical difference between the Series “A” and “B” connectors ensures proper end user connectivity. The “A” connector is designed to connect to upstream hubs. All USB devices must have an “A” connector. The “B” connector connects to the USB device, which allows vendors to provide detachable cables.
There are two types of connectors in order to eliminate illegal loopback connections at hubs.

The device framework of the USB can be broken down into several categories: the device “enumeration” process (configuration), request processing and the bus protocol. The enumeration process begins as soon as the USB device is plugged into a hub. The device must go through several stages before it is enabled for use. When a hub detects that a new device has been attached, it sends a reset signal to the device and requests a unique address from the host. When the reset is complete, the device is accessible by sending data to the default address. At this point the device is also powered by the USB, although it can only draw up to 100 mA. Next the host assigns the device a new unique address, which frees the default address for the next USB device to be attached. In the final step, the host reads the configuration information (device descriptors) stored in the device’s ROM.

The USB has a standardized protocol that enables devices to communicate with the host. The protocol specifies how “packets” of data are transferred between the computer and the device. There are three main types of packets: Token, Data and Handshake. Token packets are used to indicate IN (device to host), OUT (host to device) and SETUP (configures the device) transactions. Each Token packet also contains address and endpoint fields. The Handshake packets are used to indicate ACK (receiver accepts an error-free data packet), NAK (device cannot receive or transmit data) and STALL (device has been halted or the request is not supported). Each packet also contains error-checking fields, including Cyclic Redundancy Checks (CRCs). The check field for each of the above packet IDs is the bitwise complement of the four-bit packet ID value (e.g. ACK = 0010B, check field = 1101B). If an error is found in the packet ID, then the rest of the packet is ignored and no handshake is returned.
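As a rough illustration of the packet ID check, the sketch below is written in Python for readability rather than in the assembly language used on the USB chip. In the USB specification the four check bits that accompany a packet ID are its bitwise complement, and the three handshake packet ID values are taken from that specification. The function names `make_pid_byte` and `pid_is_valid` are our own and are only for illustration.

```python
# PID values for the three handshake packets (from the USB specification).
HANDSHAKE_PIDS = {"ACK": 0b0010, "NAK": 0b1010, "STALL": 0b1110}


def make_pid_byte(pid: int) -> int:
    """Return the 8-bit PID field: check bits in the high nibble, PID in the low nibble."""
    if not 0 <= pid <= 0xF:
        raise ValueError("PID must fit in four bits")
    check = ~pid & 0xF  # one's complement of the four PID bits
    return (check << 4) | pid


def pid_is_valid(pid_byte: int) -> bool:
    """True when the high nibble is the bitwise complement of the low nibble."""
    pid = pid_byte & 0xF
    check = (pid_byte >> 4) & 0xF
    return (pid ^ check) == 0xF


if __name__ == "__main__":
    for name, pid in HANDSHAKE_PIDS.items():
        print(name, format(make_pid_byte(pid), "08b"))
```

A receiver that finds `pid_is_valid` false would discard the rest of the packet and return no handshake, as described above.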
The address and endpoint portions of the packet contain their own CRC. The address and endpoint fields contain 11 bits plus 5 bits for the CRC. The CRC is capable of detecting all single-bit and double-bit errors in the address and endpoint. There is also a CRC field for the actual Data that is being sent. The Data packet can contain from 0 to 1023 bytes and is protected by a CRC field of 16 bits.

Finally, the protocol also specifies the types of data transfers that may be used. They include Control, Interrupt, Isochronous and Bulk transactions. Control Transfers are used to configure a device at attach time and can be used for other device-specific purposes, including control of other pipes on the device. Bulk Data Transfers are used for relatively large and bursty quantities of data and have flexible transmission constraints (e.g. saving a large file to a USB hard drive as a background process). Interrupt Data Transfers are used for input devices that only send a small amount of data at a relatively low frequency (e.g. keyboards). Isochronous Data Transfers are used for devices that require data transfers with minimal delays (also called streaming real-time transfers).

Before the USB, communication with peripheral devices was much more complicated and less reliable. Developers could communicate with peripheral devices through the parallel port, the serial ports or custom internal cards. One problem with the parallel and serial ports is that only one device may use each of them at a time. A serial port (COM port) must also be “declared” as one of 4 COM ports, which may already be in use by other devices such as internal modems, the mouse and others. If the developer designed a custom internal card, they would experience some of the same problems.
Some additional problems with custom internal cards are that the user may have already used all of their expansion slots, and that more technical support is necessary because the devices are harder to install. Each of these devices must also have custom software drivers installed before it can be used.

Chapter 4: Speech Recognition Module

The Voice Activated Function Keyboard performs speech recognition using a technology designed by Sensory Inc., the Voice Direct Module. The Voice Direct Module is an integrated circuit designed to provide product developers with an easy-to-use device for adding speech recognition to virtually any consumer product. Developers can configure the module in the manner that best suits the needs of the intended product. The pin diagram for the Voice Direct Module is shown below, and the complete module pin description table is available in Appendix A.

Figure 3: Speech Recognition Pin Diagram

The Voice Direct Module is available in a kit, which includes all of the external devices necessary to perform speech recognition. Items included in the kit are the microphone, speaker, oscillator, and external memory. Voice Direct’s ease of use and relatively quick setup time made this technology ideal for the VAFK.

The Voice Direct Module can be configured in one of two operating modes: external host-controlled or pin-configurable stand-alone. The stand-alone operating mode is designed to provide a complete recognition system using only the Voice Direct Module and the items included in the kit mentioned above. The VAFK employs the Voice Direct Module in the stand-alone mode. To operate the Voice Direct Module in this mode, the MODE signal (Module Pin: JP3-13) must be resistively pulled to GND. Once in stand-alone mode, the functional capabilities of the Voice Direct Module are determined by the configuration of a number of I/O pins.
Configuration settings in the design of the VAFK included the training and recognition sensitivity levels. Both the training and recognition operations have two associated levels of sensitivity, relaxed and strict. The VAFK has Voice Direct configured in the relaxed mode for both training and recognition. This is accomplished by leaving the TRAIN signal (Module Pin: JP2-11) and the RECOG signal (Module Pin: JP2-10) open circuit when the module is powered up. Voice Direct was configured in relaxed mode to make training easier and reduce the number of recognition errors.

Voice Direct is a speaker dependent device, which must be trained by the designated user before recognition can begin. Training the Voice Direct Module is a simple, user-friendly process guided by Voice Direct’s pre-programmed speech prompts. To start the training process, the TRAIN signal (Module Pin: JP2-11) must be pulled to GND for at least 100 ms. Voice Direct will then prompt the user to say a word or phrase, which must be shorter than 3.2 seconds. Voice Direct will then prompt the user to repeat the word or phrase and calculate an average of the two speech patterns. The speech patterns Voice Direct generates are based on a digital reconstruction of the spoken voice command.

During training, several errors may occur that cause Voice Direct to terminate the training process. Common causes include the speaker not being consistent or clear, too much background noise, or a similar word or phrase having already been recorded. If no errors occurred during training, the new speech template is added to the existing word set in the 8 Kbyte serial EEPROM. New words or phrases can be added to the set at any time, up to a maximum of 15.

Once trained, the Voice Direct Module is ready to begin speech recognition. The speech recognition process is just as user-friendly as the training process.
The speech recognition process uses similar pre-programmed speech prompts to guide the user. Recognition begins when the RECOG signal (Module Pin: JP2-10) is pulled to GND for at least 100 ms. Voice Direct will then prompt the user to say a word or phrase. This produces a new speech pattern, which Voice Direct compares with the stored templates to determine which word has been spoken. If no match is found, Voice Direct will prompt “Word not recognized” and exit recognition mode. When a trained word is successfully recognized, the associated output pins will pulse high for one second. These outputs are then used by the next stage of the VAFK’s design, which interfaces the Voice Direct Module with the USB circuit.

Chapter 5: Voice Direct/USB Circuit Interface

The Voice Direct/USB circuit interface enables the Voice Direct Module to communicate with the USB circuit. The interface uses handshaking signals between the two in order to translate the outputs of the Voice Direct Module into the appropriate USB keyboard scan codes. This was one of the major design areas of the VAFK project. Before any interface design could be started, we needed to specify how the signals between the two circuits would relate to one another. The following diagram lists the signals required for communication between the interface and the USB circuit.

Interface Signal    USB Circuit Pin
Key Bus             P0.0:7
INT                 P1.0
Write               P1.1
ACK                 P1.2

Figure 4: Voice Direct/USB Circuit Interface Signals

The VAFK will be designed with a system of asynchronous signals, which transfer the function code generated by the Voice Direct Module to the USB circuit. The signal protocol established for the interface is as follows:

A. IDLE STATE: INT = Low; Write = Low; ACK = Low
B. “Talk Key Pressed”, generate interrupt: INT = High; Write = Low; ACK = Low
C. When data is ready, assert Write: INT = High; Write = High; ACK = Low
D. Hold Write High until ACK goes High: INT = High; Write = High; ACK = High
E. De-assert Write and prepare next key: INT = High; Write = Low; ACK = High
F. Wait until ACK goes Low: INT = High; Write = Low; ACK = Low
G. Repeat steps C to F for each key’s scan code
H. Return to IDLE STATE: INT = Low; Write = Low; ACK = Low

Once the specifications for the signal protocol were established, we were able to begin designing the Voice Direct/USB circuit interface. The interface could have been implemented in numerous ways; however, only two main options were seriously considered. The first option was to design the entire interface circuit as a finite state machine using logic gates and flip-flops. While this implementation seemed possible, we determined that it would be far too complicated and would provide little flexibility for future changes. In addition, the complexity of this design method would likely lead to numerous errors and bugs. For these reasons we chose to implement the interface mostly in microcode, which in turn simplifies the design significantly while offering greater flexibility. With the microcoded implementation, most of the “handshaking” signals and the data that needs to be transferred from one circuit to the other can be encoded into ROMs. Additionally, by using microcode, the words and functions that have been defined for this project can be easily changed or adapted to meet future needs.

The interface itself consists of three vital parts. The first part decodes the output of the Voice Direct Module and determines which function should be generated. The second part of the interface provides the appropriate keyboard scan codes to the USB controller. The USB controller will then use these scan codes to perform the desired Windows NT function on the host computer as if they came directly from a traditional keyboard.
The final part of the interface deals with asserting and de-asserting the interrupt signal to the USB circuit in accordance with the aforementioned protocol. The block diagram of the Voice Direct/USB circuit interface is depicted below.

Figure 5: Block Diagram of Voice Direct/USB Circuit Interface

The Voice Direct Module has eight output pins. These outputs need to be decoded in order to determine which function must be generated. The data transferred to the USB controller depends on which instruction we are trying to execute. Our interface will be designed to implement the commands listed in Figure 1. When an instruction is recognized by the Voice Direct Module, a certain pattern of bits will be asserted on its output pins. Because the output for each recognized word is unique, the outputs of the speech recognition module can be used to directly address an 8-bit ROM. Each location in the 8-bit ROM can then contain the address of that instruction’s scan codes in the 16-bit ROM. Hence, the 8-bit ROM performs the necessary decoding of the Voice Direct Module outputs. The second ROM, which is 16 bits wide, and a counter are needed to satisfy a specification of the USB controller: it can only accept one byte of data at a time.

The instructions that are to be implemented are more complex than simple keyboard letters and numbers. For example, the instruction “Copy” involves two bytes of data. The first byte of data that is sent to the USB controller is called a modifier byte. This byte tells the controller that modifier keys such as the Control or Alt keys have been pressed. The second byte tells the controller what other key has been pressed along with the modifier keys. In the case of the “Copy” instruction, the first byte of data sent to the USB controller will be the modifier byte corresponding to the Control key being pressed.
The second byte of data will be the keyboard scan code for the “c” key being pressed. These two bytes of data need to be sent to the USB controller sequentially according to the interface handshaking signals. The following table lists the modifier bytes and keyboard codes for each of the commands the VAFK will implement.

Windows NT Command    Key Equivalent      Modifier Byte    Key Byte
Close                 Alt - F4            00000100         00111101
Task Switch           Alt - Tab           00000100         00101011
Copy                  Ctrl - c            00000001         00000110
Paste                 Ctrl - v            00000001         00011001
Select All            Ctrl - a            00000001         00000100
Start Menu            Ctrl - Esc          00000001         00101001
Security Menu         Ctrl - Alt - Esc    00000101         00101001
Help                  F1                  00000000         00111010

Figure 6: USB Scan Codes for VAFK commands

The modifier byte and scan code for each instruction will be stored in two consecutive addresses in the 16-bit ROM. The location of the modifier byte for each instruction is stored in the 8-bit ROM. When the speech recognition circuit recognizes that “Copy” was said, it asserts the appropriate outputs. These outputs are fed into the 8-bit ROM, and the address of the modifier byte of the “Copy” command appears on the ROM outputs. The ROM outputs are then loaded into a counter. When the counter is loaded and enabled, its outputs assert the address in the 16-bit ROM that corresponds to the “Copy” modifier byte. The “Copy” modifier byte will now be on the outputs of the 16-bit ROM, which are connected to the inputs of the USB controller. The data is now set up and a write signal needs to be sent to the USB controller. This signal will be bit 15 of the 16-bit ROM’s outputs, meaning that the write signal will be sent to the USB controller at the same time as the modifier byte. When the USB controller accepts the modifier byte it will send back the Acknowledge signal. This signal will stay high until the write signal is de-asserted.
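The two-stage ROM lookup described above can be modeled in a few lines of Python. This is only a sketch under stated assumptions: the Voice Direct output patterns and the ROM addresses are hypothetical placeholders, only three of the commands are included, and we assume the Write bit (bit 15) is set in the ROM word for the key byte as well as for the modifier byte. The modifier and key bytes themselves are the ones listed for those three commands in Figure 6.

```python
WRITE_BIT = 1 << 15  # bit 15 of each 16-bit ROM word drives the Write signal

# (modifier byte, key byte) pairs for three of the commands in Figure 6.
COMMANDS = {
    "Close": (0b00000100, 0b00111101),  # Alt + F4
    "Copy": (0b00000001, 0b00000110),   # Ctrl + c
    "Paste": (0b00000001, 0b00011001),  # Ctrl + v
}

# Hypothetical Voice Direct output patterns, one per trained word.
VOICE_PATTERN = {"Close": 0x01, "Copy": 0x02, "Paste": 0x03}


def build_roms():
    """Return (decode_rom, code_rom) as dictionaries keyed by address."""
    decode_rom = {}  # 8-bit ROM: output pattern -> address of the modifier byte
    code_rom = {}    # 16-bit ROM: address -> Write bit plus one data byte
    next_addr = 0
    for name, (modifier, key) in COMMANDS.items():
        decode_rom[VOICE_PATTERN[name]] = next_addr
        code_rom[next_addr] = WRITE_BIT | modifier
        code_rom[next_addr + 1] = WRITE_BIT | key
        next_addr += 2  # two consecutive addresses per command
    return decode_rom, code_rom


def bytes_for_pattern(pattern, decode_rom, code_rom):
    """Yield the two data bytes presented to the USB controller for one command."""
    counter = decode_rom[pattern]  # the counter is loaded from the 8-bit ROM
    for _ in range(2):
        word = code_rom[counter]
        if not word & WRITE_BIT:
            raise ValueError("ROM word does not assert Write")
        yield word & 0xFF          # low byte goes onto the Key Bus
        counter += 1               # each Acknowledge clocks the counter


if __name__ == "__main__":
    decode_rom, code_rom = build_roms()
    for byte in bytes_for_pattern(VOICE_PATTERN["Copy"], decode_rom, code_rom):
        print(format(byte, "08b"))
```

Changing a command in this model only means editing the ROM contents, which is the flexibility argument made for the microcoded design.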
The Acknowledge signal will be used to disable the outputs of the 16-bit ROM, which will automatically de-assert the write signal. The USB controller will then be waiting for the next byte of data. The Acknowledge signal will also be fed into the clock input of the counter, so when it is asserted the counter increments. When the counter increments, the second byte of data will be on the outputs of the 16-bit ROM. Once that byte has been transferred, the USB controller will have both bytes of data needed to complete the “Copy” command. At this point, the Interrupt to the USB controller can be de-asserted to tell the USB controller that no further data is coming.

The Interrupt signal is the only signal that cannot be easily implemented using microcode. This signal will be generated using a simple sequence detector, which is a simple finite state machine. It will have two inputs, the Recognize signal and the Acknowledge signal. The state diagram for this state machine is as follows.

[State diagram. Inputs/Outputs: Rec, Ack / Int]

Figure 7: State Diagram for Interrupt Signal

The operation of this state machine is simple. Pushing the Recognize button, which is the button pressed to activate the speech recognition circuit, triggers the interrupt to the USB controller. The state machine then waits for the Acknowledge signal to be asserted twice. When it has been asserted twice, two bytes of data have been sent to the USB controller. The state machine turns the interrupt off at this point, and the USB controller performs the command that was spoken to the speech recognition circuit.

Chapter 6: USB Circuit Results

The design process for the USB circuit has been primarily high-level design with extensive research. Research has been done on various USB chips, the USB specification and hardware issues. The high-level design includes the USB protocol and firmware. For a more complete description of the USB, see Appendix A: Background of the USB.
During the course of this project, we have considered using several USB chips, including the Intel 8x931AA, the Cypress CY7C63000A and others. The purpose of a USB chip is to simplify the design process. These chips are actually microcontrollers with a “USB Engine” built on top. The “USB Engine” is designed to abstract away the details of how the communication protocol works. For example, it allows data to be sent in parallel to the “USB Engine”, which converts it into the serial packets that are sent over the bus. Each of the above chips has advantages and disadvantages. Here is a quick overview of their capabilities.

                       Intel 8x931                      Cypress CY7C63000A
Built-in hub           Optional                         No
# of external ports    4 x 8 bits                       1 x 8 bits, 1 x 4 bits
# of instructions      51                               35
External ROM           Yes                              No
Internal ROM           Optional (Factory Programmed)    Yes (User Programmed)
Amount of ROM/RAM      64K Bytes/256 Bytes              2K Bytes/128 Bytes
# of pins              64                               20

Figure 8: USB Chip Comparison

The Intel 8x931 is a more advanced chip with more features. The extra ROM and RAM allow designers to create more complex devices. The extra instructions can make it easier to program the firmware and allow for more complex designs. The extra external ports are also useful in creating more complex designs; however, two of the ports are required if external ROM is used. The other ports can be programmed to function as a “keyboard scan utility”, external interrupts, timers and a serial I/O port. Each of these utilities is defined and documented by Intel so that other developers can easily integrate these features into their products. The built-in hub allows designers to add up to 4 additional downstream ports to their device. This could be used on a computer monitor to provide a central point to which additional USB devices could easily be attached. The hub is essentially a “repeater” which sends the commands from the host to the downstream devices.
The Cypress CY7C63000A chip, on the other hand, is intended for less complex designs. Such designs do not require a hub, many external ports, a complex instruction set, or much memory. The Cypress chip is ideal for peripheral devices such as keyboards, mice, and joysticks. The Cypress chip has an additional advantage which makes it ideal for peripheral devices: since the ROM is entirely internal, less external circuitry is needed, which reduces both cost and complexity.

The final difference between these two chips is in the starter kits which are provided. The Intel kit is about $200 while the Cypress kit is only $100. Both kits come with all necessary circuitry attached (USB port, memory, clock oscillator, …) and with the software necessary for compiling the assembly code. (Note: the Cypress kit also contains two backup USB chips.) The Cypress kit additionally comes with some very useful development tools. It includes a complete sample application (Thermometer) which is very useful in understanding some of the implementation details. Finally, the kit also contains an EPROM writer with software that can write the firmware programs into the chip. We have decided to use the Cypress kit because it contains a sample application, it is easier to program, and it requires less external hardware for our application.

About 90% of the USB design will be writing firmware routines to control the USB chip. The firmware routines are written in assembly code and translated into machine code by an assembler. This machine code can be downloaded directly to the chip by using the EPROM writer. The assembly code can be broken down into two major functions, the USB protocol and the keyboard interface. A flowchart for the firmware routines is given below.
[Figure 9: USB Firmware Flowchart. Boxes: Enter Main Routine from Device Reset; Perform Enumeration by Host Computer; Initialize Chip (Registers, Ports, …); Poll for Interrupt; Read Keyboard Data; Send Data to Host.]

The code for the USB protocol will be very similar to the sample application provided in the kit. The USB protocol code is responsible for handling standard device requests, performing enumeration (configuring the chip as a keyboard), and initializing the device. The USB protocol function is called when the host sends a command to the device, causing a function interrupt. The USB specification places some additional constraints on the enumeration process. Until enumeration is completed, the device may draw only up to 100 mA (one unit load). The host reads the configuration descriptor, which tells it the maximum power required by the device. When the enumeration process is complete, the device may consume up to this maximum power. The device must also respond to the default address (0) until a new unique address is assigned by the host.

There are also some keyboard-specific functions which are responsible for initializing the speech circuit, determining when an interrupt occurs, reading which key or keys are being pressed, and sending the keys to the computer. To initialize the speech circuit, the firmware simply has to enable power to it. Power to the speech circuit is initially disabled because, according to the USB specification, a USB device that has not yet been configured may draw only up to 100 mA. After the device has been enumerated, it may draw the amount that was declared in the enumeration phase. To determine when an interrupt has occurred, the “main” function loops continuously, polling the interrupt pin to determine when it is active. When the interrupt is detected, a function is called to read in the data. After all the data is read in, another function transmits the data to the host.
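The polling loop described above can be sketched as a small simulation. This is a minimal model under stated assumptions, not the actual assembly firmware: the class `FakeSpeechCircuit`, the functions `build_report` and `main_loop`, the bounded poll count, and the trailing key-release report are all hypothetical. It also assumes that the two bytes read for the “Copy” command are a modifier byte and a key code (Left Ctrl = 0x01 and “c” = 0x06 in the HID usage tables) and that they are packed into the 8-byte boot keyboard report layout.

```python
LEFT_CTRL = 0x01  # HID modifier bit for Left Control
KEY_C = 0x06      # HID usage ID for the "c" key


def build_report(modifier, keycode):
    """Pack a modifier byte and one key code into an 8-byte keyboard report."""
    return bytes([modifier, 0x00, keycode, 0, 0, 0, 0, 0])


class FakeSpeechCircuit:
    """Hypothetical stand-in for the Voice Direct/ROM interface."""

    def __init__(self, data):
        self._pending = list(data)
        self.powered = False  # power stays off until enumeration is done

    def interrupt_active(self):
        # Interrupt is asserted while unread bytes remain
        return self.powered and bool(self._pending)

    def read_byte(self):
        # Each read models one Write/Acknowledge handshake
        return self._pending.pop(0)


def main_loop(circuit, send, enumerated, max_polls=10):
    """Poll the interrupt, read two bytes, and send them to the host."""
    if not enumerated:
        return  # speech circuit stays unpowered before enumeration
    circuit.powered = True
    # Real firmware would loop forever; the sketch polls a bounded number of times.
    for _ in range(max_polls):
        if circuit.interrupt_active():
            modifier = circuit.read_byte()
            keycode = circuit.read_byte()
            send(build_report(modifier, keycode))
            send(build_report(0, 0))  # assumed key-release report


sent = []
main_loop(FakeSpeechCircuit([LEFT_CTRL, KEY_C]), sent.append, enumerated=True)
print([report.hex() for report in sent])
```

Keeping the power-enable step behind the enumeration check mirrors the constraint that the speech circuit may only be powered once the host has configured the device.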
The Universal Serial Bus technology has been very useful in the design of the VAFK for several reasons. The USB protocol simplified the transmission of data from the keyboard to the computer, it simplified the hardware design, and it was more flexible than traditional methods.

CONCLUSION

Speech recognition is no longer a technology restricted to science fiction. Today, most major corporations recognize the tremendous potential of such technology and are beginning to incorporate it into everyday products. Kim Silverman, manager of spoken language technologies at Apple, said, “Apple’s design philosophy is that people should just be able to take the machine out of the box and command it by voice.” The goal of speech recognition technology is to mimic the fundamental human ability of listening, thus freeing the hands and eyes and requiring less training to use. With this in mind, the VAFK has been designed to take advantage of new technologies such as speech recognition and the USB protocol to create a revolutionary product.

The VAFK takes advantage of a high-performance speech recognition chip to simplify the design process and to reduce the cost of development. The chip provides high-accuracy (99%) recognition in a self-contained module with a user-friendly interface. The Voice Direct module was ideal to interface with the USB because of its simple output format. This was critical for our design, which was based on providing users with plug-n-play operation, something easily realized with the USB protocol. The plug-n-play concept is to make computers as easy to use as possible. The VAFK is an alternative computer interface which makes computers much easier to use by combining speech recognition and the USB.

Next semester, we plan to take the VAFK from the design stage to the hardware implementation stage. The hardware implementation stage includes programming ROMs, hardwiring circuits, and constructing a case.
We also plan on replacing the independent power source of the speech recognition circuit with the power supplied by the USB. We will look into enhancing the VAFK by using the Voice Direct’s external host mode. The external host mode will allow us to increase the number of functions recognized and to add multiple users. We may also use a more advanced speech recognition chip, such as one with continuous recognition, speaker-independent recognition, speaker verification, and speech synthesis.

References

Cypress Semiconductor Corporation. “CY7C63000A, CY7C63001A, CY7C63100A, CY7C63101A Universal Serial Bus Microcontroller,” San Jose: Cypress Semiconductor Corporation, 1998.
Intel Corporation. “8x931AA, 8x931HA Universal Serial Bus Peripheral Controller User’s Manual,” Mt. Prospect: Intel Corporation, 1997.
Sensory Inc. “Voice Direct Data Book, Interactive Speech,” Sensory Inc., 1998.
Sensory Inc. “Voice Direct Speech Recognition Kit,” Sensory Inc., 1999.
Compaq Computer Corporation, Intel Corporation, Microsoft Corporation, NEC Corporation. “Universal Serial Bus Specification, Revision 1.1,” 1998.
USB Implementers’ Forum. “USB Device Class Definition for Human Interface Devices (HID), Version 1.1,” 1999.
USB Implementers’ Forum. “USB HID Usage Tables, Version 1.1,” 1999.
William S. Meisel. “Voice Control on PC’s,” Speech Recognition Update, vol. 76, pp. 4-5, Oct. 1999.