Voice user interface
A voice-user interface enables spoken human interaction with computers, using speech recognition to understand spoken commands and answer questions, and typically text to speech to play a reply. A voice command device is a device controlled with a voice user interface.
Voice user interfaces have been added to automobiles, home automation systems, computer operating systems, home appliances like washing machines and microwave ovens, and television remote controls. They are the primary way of interacting with virtual assistants on smartphones and smart speakers. Older automated attendants and interactive voice response systems can respond to the pressing of keypad buttons via DTMF tones, but those with a full voice user interface allow callers to speak requests and responses without having to press any buttons.
Newer voice command devices are speaker-independent, so they can respond to multiple voices, regardless of accent or dialectal influences. They are also capable of responding to several commands at once, separating vocal messages, and providing appropriate feedback, accurately imitating a natural conversation.
Overview
A VUI is the interface to any speech application. Only a short time ago, controlling a machine by simply talking to it was only possible in science fiction. Until recently, this area was considered to be artificial intelligence. However, advances in technologies like text-to-speech, speech-to-text, natural language processing, and cloud services contributed to the mass adoption of these types of interfaces. VUIs have become more commonplace, and people are taking advantage of the value that these hands-free, eyes-free interfaces provide in many situations.VUIs need to respond to input reliably, or they will be rejected and often ridiculed by their users. Designing a good VUI requires interdisciplinary talents of computer science, linguistics and human factors psychology – all of which are skills that are expensive and hard to come by. Even with advanced development tools, constructing an effective VUI requires an in-depth understanding of both the tasks to be performed, as well as the target audience that will use the final system. The closer the VUI matches the user's mental model of the task, the easier it will be to use with little or no training, resulting in both higher efficiency and higher user satisfaction.
A VUI designed for the general public should emphasize ease of use and provide a lot of help and guidance for first-time callers. In contrast, a VUI designed for a small group of power users, should focus more on productivity and less on help and guidance. Such applications should streamline the call flows, minimize prompts, eliminate unnecessary iterations and allow elaborate "mixed initiative dialogs", which enable callers to enter several pieces of information in a single utterance and in any order or combination. In short, speech applications have to be carefully crafted for the specific business process that is being automated.
Not all business processes render themselves equally well for speech automation. In general, the more complex the inquiries and transactions are, the more challenging they will be to automate, and the more likely they will be to fail with the general public. In some scenarios, automation is simply not applicable, so live agent assistance is the only option. A legal advice hotline, for example, would be very difficult to automate. On the flip side, speech is perfect for handling quick and routine transactions, like changing the status of a work order, completing a time or expense entry, or transferring funds between accounts.
History
Early applications for VUI included voice-activated dialing of phones, either directly or through a headset or vehicle audio system.In 2007, a CNN business article reported that voice command was over a billion dollar industry and that companies like Google and Apple were trying to create speech recognition features. In the years since the article was published, the world has witnessed a variety of voice command devices. Additionally, Google has created a speech recognition engine called Pico TTS and Apple released Siri. Voice command devices are becoming more widely available, and innovative ways for using the human voice are always being created. For example, Business Week suggests that the future remote controller is going to be the human voice. Currently Xbox Live allows such features and Jobs hinted at such a feature on the new Apple TV.
Voice command software products on computing devices
Both Apple Mac and Windows PC provide built in speech recognition features for their latest operating systems.Microsoft Windows
Two Microsoft operating systems, Windows 7 and Windows Vista, provide speech recognition capabilities. Microsoft integrated voice commands into their operating systems to provide a mechanism for people who want to limit their use of the mouse and keyboard, but still want to maintain or increase their overall productivity.Windows Vista
With Windows Vista voice control, a user may dictate documents and emails in mainstream applications, start and switch between applications, control the operating system, format documents, save documents, edit files, efficiently correct errors, and fill out forms on the Web. The speech recognition software learns automatically every time a user uses it, and speech recognition is available in English, English, German, French, Spanish, Japanese, Chinese, and Chinese. In addition, the software comes with an interactive tutorial, which can be used to train both the user and the speech recognition engine.Windows 7
In addition to all the features provided in Windows Vista, Windows 7 provides a wizard for setting up the microphone and a tutorial on how to use the feature.Mac OS X
All Mac OS X computers come pre-installed with the speech recognition software. The software is user-independent, and it allows for a user to, "navigate menus and enter keyboard shortcuts; speak checkbox names, radio button names, list items, and button names; and open, close, control, and switch among applications." However, the Apple website recommends a user buy a commercial product called Dictate.Commercial products
If a user is not satisfied with the built in speech recognition software or a user does not have a built speech recognition software for their OS, then a user may experiment with a commercial product such as Braina Pro or DragonNaturallySpeaking for Windows PCs,and Dictate, the name of the same software for Mac OS.
Voice command mobile devices
Any mobile device running Android OS, Microsoft Windows Phone, iOS 9 or later, or Blackberry OS provides voice command capabilities. In addition to the built-in speech recognition software for each mobile phone's operating system, a user may download third party voice command applications from each operating system's application store: Apple App store, Google Play, Windows Phone Marketplace, or BlackBerry App World.Android OS
Google has developed an open source operating system called Android, which allows a user to perform voice commands such as: send text messages, listen to music, get directions, call businesses, call contacts, send email, view a map, go to websites, write a note, and search Google.The speech recognition software is available for all devices since Android 2.2 "Froyo", but the settings must be set to English. Google allows for the user to change the language, and the user is prompted when he or she first uses the speech recognition feature if he or she would like their voice data to be attached to their Google account. If a user decides to opt into this service, it allows Google to train the software to the user's voice.
Google introduced the Google Assistant with Android 7.0 "Nougat". It is much more advanced than the older version.
Amazon.com has the Echo that uses Amazon's custom version of Android to provide a voice interface.
Microsoft Windows
is Microsoft's mobile device's operating system. On Windows Phone 7.5, the speech app is user independent and can be used to: call someone from your contact list, call any phone number, redial the last number, send a text message, call your voice mail, open an application, read appointments, query phone status, and search the web.In addition, speech can also be used during a phone call, and the following actions are possible during a phone call: press a number, turn the speaker phone on, or call someone, which puts the current call on hold.
Windows 10 introduces Cortana, a voice control system that replaces the formerly used voice control on Windows phones.
iOS
Apple added Voice Control to its family of iOS devices as a new feature of iPhone OS 3. The iPhone 4S, iPad 3, iPad Mini 1G, iPad Air, iPad Pro 1G, iPod Touch 5G and later, all come with a more advanced voice assistant called Siri. Voice Control can still be enabled through the Settings menu of newer devices. Siri is a user independent built-in speech recognition feature that allows a user to issue voice commands. With the assistance of Siri a user may issue commands like, send a text message, check the weather, set a reminder, find information, schedule meetings, send an email, find a contact, set an alarm, get directions, track your stocks, set a timer, and ask for examples of sample voice command queries. In addition, Siri works with Bluetooth and wired headphones.Apple introduced Personal Voice as an accessibility feature in iOS 17, launched on September 18, 2023. This feature allows users to create a personalized, machine learning-generated version of their voice for use in text-to-speech applications. Designed particularly for individuals with speech impairments, Personal Voice helps preserve the unique sound of a user's voice. It enhances Siri and other accessibility tools by providing a more personalized and inclusive user experience. Personal Voice reflects Apple's ongoing commitment to accessibility and innovation.