Conversational Analytics / 25.07.2017

Why Scaling is a Challenge in Speech Recognition without AI and ML

Speech Recognition is one of the most widely used technologies today, appearing in offerings that range from conversational chatbots and Speech Analytics software to CX interfaces. Its rise in popularity can be traced to the proactive use of Artificial Intelligence backed by Machine Learning and Data Learning capabilities.

While work in the field of Speech Recognition has been going on for decades, success rates surged only with the implementation of AI and ML functionalities. For example, Speech Recognition was one of the first areas of research for software giant Microsoft in the early 1990s, yet Speech Recognition solutions at that time had only limited commercial applications.

Microsoft introduced Speech Recognition technology alongside its popular OS Windows 95, but the error rate was close to 100%. Compare that with the accuracy rate of over 90% achieved by Cortana, the company’s latest phone assistant tool, and it is clear how AI and ML functionalities have been game changers. Since scalability depends directly on accuracy and usability, traditional Speech Recognition tools found it hard to scale.

Scaling Speech Recognition Tech

Scalability challenges minus AI and ML

The goal of Speech Recognition tools has always been to attain a level of accuracy on par with human capabilities. In human-to-human interaction, a listener misses about two words out of every 20 on average. While that is not a problem for humans, Speech Recognition tools have historically been unable to match this level of accuracy.
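To make the "two words out of every 20" figure concrete, accuracy in Speech Recognition is commonly reported through the word error rate (WER): the number of word-level edits needed to turn the recognized text into the reference text, divided by the number of reference words. The snippet below is a minimal sketch of that calculation, not any vendor's scoring method. The function name `word_error_rate` and the placeholder transcripts are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical 20-word reference; the "recognized" text drops two words.
reference = " ".join(f"word{n}" for n in range(20))
hypothesis = " ".join(f"word{n}" for n in range(20) if n not in (3, 11))

wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}, word accuracy: {1 - wer:.0%}")
# prints "WER: 10%, word accuracy: 90%"
```

Missing two words out of 20 therefore corresponds to a 10% word error rate, or roughly 90% word accuracy, which is the kind of figure quoted for the tools discussed in this article.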

For instance, humans can understand speech despite loud background noise, distortion on a communication line, or variation in accents. Unless Speech Recognition software is built on Machine Learning functions, it may find it hard to decode such voice modulation or accent changes, which derails the system’s functionality. Unsurprisingly, lower accuracy rates have led to lower scalability for Speech Recognition tools in the past.

Linking accuracy and scalability

The more accurate the Speech Recognition tool, the higher its scalability. One of the best examples is the case of Wit.ai, an 18-month-old Palo Alto based startup that was acquired by Facebook in 2015 owing to the high accuracy rates of its Speech Recognition tool. At 95% accuracy, Siri not only outpaces other voice assistants but is also the most widely used personal voice assistant in the world. Baidu, China’s answer to Google, has scaled enormously on the strength of its high accuracy levels, which, at 96%, are even better than many humans when it comes to identifying spoken words in both English and Mandarin.

Conclusion: Speech Recognition tools have become scalable with AI and ML functionalities offering higher accuracy and better analytical insights.

To further understand the importance of Artificial Intelligence and Machine Learning in Speech Recognition tech and its future applications, read the whitepaper "The Relevance of Artificial Intelligence and Machine Learning in Speech Recognition".