Voice recognition, or speech recognition, is a computer technology that accepts spoken audio as input in place of a keyboard. Speaking into a microphone, for example, produces the same result as typing the words manually. Simply stated, voice recognition software is built around an internal database of recognizable words or phrases. The program matches the audio signature of incoming speech against the corresponding entries in that database.
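As a toy illustration of the matching step described above, the following Python sketch compares an incoming "audio signature" with a small database of stored templates and returns the closest entry. The three-number feature vectors, the word list, and the function name closest_word are all made up for illustration; real software derives much richer features from the audio and uses far more elaborate matching.

```python
import math

# Hypothetical "database": each recognizable word maps to an invented
# feature vector standing in for its audio signature.
TEMPLATES = {
    "yes": (0.9, 0.1, 0.3),
    "no": (0.2, 0.8, 0.5),
    "help": (0.5, 0.5, 0.9),
}


def closest_word(signature, templates=TEMPLATES):
    """Return the stored word whose template lies nearest to the signature."""
    return min(
        templates,
        key=lambda word: math.dist(signature, templates[word]),
    )


# An incoming signature that is close to, but not identical with, "yes".
print(closest_word((0.85, 0.15, 0.35)))  # prints "yes"
```

Because the match is by nearest distance rather than exact equality, small variations in how a word is spoken still land on the same database entry, which is the basic idea behind matching speech against stored entries.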
Though turning speech into text might sound easy, it is an extremely difficult task. The problem lies in the virtually infinite array of individual speech patterns and accents, compounded by the natural human tendency to run words together.
Various models of speech recognition software are used for an array of applications, from personal dictation to commercial automated call routing, and from assisting people with disabilities to subtitling sports and news events. Each model behaves differently and has its own capabilities and limitations.
Voice recognition programs that require the user to "train" the software to recognize their particular patterns of speech are called speaker-dependent systems. Individuals commonly use these programs at home or at the office, where email, memos, letters, data and other text can be entered by speaking into a microphone.
Some voice recognition systems, called discrete speech systems, require the user to speak clearly and slowly and to separate words. Continuous speech systems are designed to understand a more natural mode of speaking.
Discrete speech systems are widely used for customer service call routing. Such a system is speaker independent, but understands only a small pool of words or phrases. The caller is asked a question with a limited set of answers, usually "yes" or "no." After receiving an answer, the system moves the caller on to the next step. If the caller gives an answer the system does not expect, the automated response is usually, "Sorry, I didn't understand you; please try again," followed by a repeat of the question and the available answers. This type of voice recognition is also referred to as grammar constrained recognition.
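The routing behavior described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the speech has already been converted to text; the grammar contents, the destination names billing_menu and main_menu, and the function name route are hypothetical and not taken from any particular product.

```python
# Hypothetical grammar: the only answers the system accepts at this step,
# each mapped to the step the caller is sent to next.
GRAMMAR = {
    "yes": "billing_menu",
    "no": "main_menu",
}

REPROMPT = "Sorry, I didn't understand you; please try again."


def route(recognized_text, grammar=GRAMMAR):
    """Return (next_step, message) for the caller's recognized answer."""
    answer = recognized_text.strip().lower()
    if answer in grammar:
        # The answer is in the grammar: move the caller to the next step.
        return grammar[answer], None
    # Anything outside the grammar triggers a repeat of the question.
    return None, REPROMPT


print(route("Yes"))
print(route("I want to talk about my invoice"))
```

Keeping the accepted vocabulary this small is what lets such a system work for any caller without training, at the cost of rejecting anything outside the grammar.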
Continuous speech recognition is a more sophisticated form of voice recognition software, in which the caller can speak naturally to explain a problem or request a service. The program is designed to pick out key words or phrases and make a statistical best guess as to what the customer wants. Speaking plainly helps the program identify the need. This type of system relies on a far more extensive database than discrete speech systems and is also referred to as natural language recognition.
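A very rough sketch of the key word approach described above, written in Python, might look like the following. It assumes the caller's speech has already been transcribed, and it stands in for the "statistical best guess" with a simple count of matching key words; the intent names, key word lists, and the function name guess_intent are invented for illustration, and production systems rely on trained statistical models instead.

```python
import re

# Hypothetical intents and the key words that suggest each one.
INTENT_KEYWORDS = {
    "billing": {"bill", "charge", "payment", "refund"},
    "technical_support": {"broken", "error", "internet", "outage"},
    "cancel_service": {"cancel", "close", "terminate"},
}


def guess_intent(utterance, intents=INTENT_KEYWORDS):
    """Return (intent, confidence) based on key word hits in the utterance."""
    words = re.findall(r"[a-z']+", utterance.lower())
    # Count how many spoken words match each intent's key words.
    scores = {
        intent: sum(1 for word in words if word in keywords)
        for intent, keywords in intents.items()
    }
    best_intent = max(scores, key=scores.get)
    total_hits = sum(scores.values())
    if total_hits == 0:
        # No key words were heard, so there is nothing to guess from.
        return None, 0.0
    # Confidence is the winning intent's share of all key word hits.
    return best_intent, scores[best_intent] / total_hits


print(guess_intent("My internet is broken and I would like a refund"))
print(guess_intent("Hello there"))
```

The confidence value gives the calling application a way to decide when the guess is too weak to act on and the caller should be asked to rephrase.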
Automatic Speech Recognition (ASR) is a model of voice recognition designed for dictation. This software differs from the previous models in that it does not try to understand what is being said, only to identify the words spoken. Since many words in the English language sound alike, such as "to," "two," and "too," mistakes are easily made. ASR software is often found on digital voice recorders.