
Voice applications are about to explode


2011/11/23

For many years, voice recognition technology stumbled along without ever breaking through. Now Siri has arrived, pushing the technology into the mainstream and opening up a remarkably broad range of potential applications.

Voice recognition is nothing new. For many years, consumer electronics, cars, and automated call centers have been "listening" to users' instructions. Google has been transcribing voicemail messages since 2009, and three years before that Microsoft built similar technology into Windows Vista. So what is the magic of Apple's new virtual personal assistant, Siri?

It can understand what you mean.

In other words, Siri is not just voice recognition technology; it can understand language, and that is what is starting to change the way users interact with their phones. Many now predict that Siri will play a major role in promoting this long-awaited technology, just as the iPhone's touch interface pushed touch technology into the mainstream, and that it will clear away many of the obstacles to innovative applications. Market research firm Opus Research estimates that the voice recognition industry will be worth roughly $2.7 billion this year, and predicts a post-Siri boom in voice applications in 2012.

What makes Siri so unique? Tim Bajarin, president of the strategy consulting firm Creative Strategies, says the answer lies in accuracy: "Siri is ushering in a truly new generation of human-computer interfaces. Its ability to understand speech and grasp it precisely has had a significant impact on the market."

Siri is certainly not perfect. The technology struggles with certain accents, though Apple is already working to iron out these minor problems. Still, for a piece of software, Siri's performance is remarkable. Siri originated at SRI International, a research laboratory in Menlo Park, California. According to SRI, the key to Siri is natural language processing. Siri works by capturing the speech signal and converting it directly into text, the same text users see on their phone screens. It then matches those statements against a set of pre-made instructions, such as "make a call" or "compose a text message."
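
To make that last step concrete, here is a minimal sketch in Python of matching a transcribed sentence against pre-made command patterns. It is purely illustrative and assumes nothing about Apple's actual implementation; the intent names and regular expressions are invented for the example.

    import re

    # Toy command patterns; the names and regexes are invented for illustration.
    INTENTS = {
        "make_call":    re.compile(r"\b(?:call|dial|phone)\s+(?P<contact>\w+)", re.I),
        "send_message": re.compile(r"\b(?:text|message)\s+(?P<contact>\w+)\s+(?P<body>.+)", re.I),
    }

    def match_intent(transcript):
        """Match a transcribed sentence against the pre-made command patterns."""
        for name, pattern in INTENTS.items():
            m = pattern.search(transcript)
            if m:
                return name, m.groupdict()
        return "unknown", {}

    print(match_intent("Please call Alice"))          # ('make_call', {'contact': 'Alice'})
    print(match_intent("Text Bob I'm running late"))  # ('send_message', {'contact': 'Bob', 'body': "I'm running late"})

A keyword match like this ignores context and conversation state, which is a large part of what makes genuine language understanding harder than plain voice recognition.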

The technology has great potential, and it is by no means limited to tablets and smartphones. Nuance, developer of the speech recognition software Dragon, has been selling it to the healthcare industry for ten years. Nuance's latest software runs on a physician's desktop and picks up speech through a clip-on microphone; as a consultation progresses, it updates the patient's electronic health record on the spot. Joe Petro, senior vice president of R&D in Nuance's healthcare division, said: "A patient may be talking about their mother's medical history one second and mention their father's records the next. The software can keep track of all of that."

How does it do that? In much the same way as Siri: it extracts meaning from the vocabulary it recognizes, looks the terms up in a medical information database, and compares them with the patient's medical history. It then uses statistical inference to draw connections between the pieces of information it finds, and can even suggest treatments for the symptoms. About 450,000 physicians across the United States are using Nuance's software. Petro said the technology is more than 90% accurate and will keep improving over time. The software clearly has good profit prospects, and Nuance has raised its fourth-quarter revenue forecast by about $10 million.
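
The paragraph above describes a pipeline of term extraction, database lookup, and linking against the patient's chart. The sketch below mimics that pipeline in miniature; the vocabulary, relations, and "link count" scoring are invented placeholders standing in for Nuance's medical database and statistical inference, not its actual data or algorithms.

    # Toy medical vocabulary; terms and relations are invented placeholders.
    MEDICAL_VOCAB = {
        "diabetes":     {"related": {"neuropathy", "hypertension"}},
        "hypertension": {"related": {"stroke", "diabetes"}},
        "neuropathy":   {"related": {"diabetes"}},
    }

    def extract_terms(transcript):
        """Pick out the words in the transcript that are known medical terms."""
        words = {w.strip(".,").lower() for w in transcript.split()}
        return words & MEDICAL_VOCAB.keys()

    def link_to_history(terms, patient_history):
        """Crude stand-in for the inference step: count how many known relations
        connect each newly mentioned term to conditions already on the chart."""
        history = set(patient_history)
        return {t: len(MEDICAL_VOCAB[t]["related"] & history) for t in terms}

    terms = extract_terms("Mother had hypertension, and the patient reports neuropathy.")
    print(link_to_history(terms, ["diabetes"]))
    # e.g. {'hypertension': 1, 'neuropathy': 1} -- both relate to the diabetes already on the chart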

Researchers, however, have even bigger hopes for the technology's future. Skip Rizzo, associate director of the University of Southern California's Institute for Creative Technologies, is developing an interactive simulation to help veterans with post-traumatic stress disorder (PTSD) seek medical consultation. The software, called SimCoach, ultimately aims to understand the emotional state behind people's spoken language. "This is a huge challenge," Rizzo said, "because you have to capture voice patterns and then analyze them the way a human brain would." A person may notice that a friend or family member is distressed because their speech slows down and loses its usual stress patterns, Rizzo said, but it is quite difficult for a computer to pick up those signals.

Some research in this field is already delivering faster results. Last spring, Rizzo's research partner, MIT professor Alex Pentland, ran a similar trial of voice-inference technology at a Bank of America call center, analyzing how employee communication affects business performance. Pentland had employees wear small electronic devices around their necks for six weeks, recording their locations, body language, and voices. The data showed whom each employee talked to, how far apart they stood, and what their tone of voice was like. "We found that the most productive employees not only talked with a large number of people, they talked with colleagues who showed the same trait," Pentland said. Simply by adjusting coffee breaks so that those employees' schedules lined up, he said, the call center saved $15 million a year.