No Jitter is part of the Informa Tech Division of Informa PLC

3 Reasons Speech Is Becoming Preferred for Self-Service: Page 2 of 2

Improvements in Quality

The second reason natural language is seeing rapid adoption is that, well, it’s just a lot better. Not a little bit better but dramatically better.

Cloud-based speech services now support open grammars, which means that virtually anything a caller says can be transcribed into text. Previously, voice user interfaces used closed grammars, which meant developers had to predict what a caller might say and then build a set of domain-specific grammars to match variations of requests. For example, a stock trading application would rely on a closed grammar that included a list of ticker symbols and company names. In most cases today, building closed grammars is no longer necessary: speech services have been trained to transcribe open-ended speech, and cloud-based NLP systems like Google’s Dialogflow can then determine caller intent.
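To make the closed-grammar limitation concrete, here is a minimal sketch in Python of how such a grammar behaves for the stock trading example. The ticker list, the action words, and the function names are all hypothetical, chosen only for illustration; a real voice platform would use its own grammar format.

```python
from itertools import product

# Hypothetical sample data; a real trading application would load its own list.
TICKERS = {"GOOG": "Google", "IBM": "IBM"}
ACTIONS = ["buy", "sell", "quote"]


def build_closed_grammar(tickers, actions):
    """Enumerate every phrase the recognizer is allowed to match."""
    grammar = {}
    for action, (symbol, company) in product(actions, tickers.items()):
        # Callers may say either the ticker symbol or the company name.
        for spoken in {symbol.lower(), company.lower()}:
            grammar[f"{action} {spoken}"] = (action, symbol)
    return grammar


def match_closed(grammar, utterance):
    """Return (action, symbol) if the utterance is in the grammar, else None."""
    return grammar.get(utterance.strip().lower())


grammar = build_closed_grammar(TICKERS, ACTIONS)

# A phrase the developer predicted is recognized.
print(match_closed(grammar, "Buy Google"))
# A natural request the developer did not predict is not.
print(match_closed(grammar, "I'd like to buy some Google shares"))
```

The second request fails only because nobody listed it in advance. That is the gap an open grammar paired with an NLP intent engine is meant to close.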

Closed grammars might still be needed in some cases, such as a phone directory application that needs a list of people’s names to recognize them reliably. However, Google Cloud and IBM Watson provide ways to address this problem. Google uses phrase hints, lists of words or phrases that act as "hints" to boost the probability that they will be recognized. IBM Watson uses custom models for the same purpose.
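As a rough illustration of phrase hints, the sketch below builds a request body in the shape the Google Cloud Speech-to-Text REST API accepts, as I understand it, with the directory names passed in a `speechContexts` entry. The names, the storage URI, and the helper function are hypothetical, and the field names and encoding value should be checked against the current API reference before use.

```python
import json


def build_recognize_request(audio_uri, phrases, language_code="en-US"):
    """Build a recognition request body that carries phrase hints.

    Hypothetical helper: it only assembles a dictionary and sends nothing.
    """
    return {
        "config": {
            # Assumed telephony audio format; adjust to match the real recording.
            "encoding": "MULAW",
            "sampleRateHertz": 8000,
            "languageCode": language_code,
            # Phrase hints: words the recognizer should favor.
            "speechContexts": [{"phrases": list(phrases)}],
        },
        "audio": {"uri": audio_uri},
    }


# Hypothetical directory names and storage location.
directory_names = ["Anjali Wickramasinghe", "Siobhan O'Dwyer"]
request_body = build_recognize_request(
    "gs://example-bucket/call-0001.wav", directory_names
)
print(json.dumps(request_body, indent=2))
```

The application still uses open recognition here. The hints only bias the recognizer toward names it would otherwise be likely to mishear.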

Another example of innovation is Google’s Tacotron 2, an AI-based text-to-speech (TTS) system whose output is almost indistinguishable from a human voice. It uses two neural networks. The first converts text into a spectrogram, a visual representation of audio frequencies over time, and the second, WaveNet, reads the spectrogram and generates the final speech. Listen to these audio samples to see if you can tell which is the TTS engine and which is the recorded human voice. I bet you can’t.

Lastly, the number of languages and the variety of voices to choose from have exploded. Businesses can now choose from hundreds of languages and dialects (how about Sinhala or Sri Lankan Tamil?). And they are no longer limited to a handful of voices (please select male or female); they can select from a huge variety.

Advances in Application Development

Recent advances in speech and natural language have been powerful drivers of adoption. But they don’t tell the full story. For a business to make use of these technologies, it needs an easy way to develop self-service applications that harness the latest AI and speech advances. It also needs a way to have those applications deployed within its carrier’s network. Without these tools, businesses are left with little more than a complicated web of APIs, forcing them to rely on a team of developers to get their self-service apps (eventually) to market.

One way to do that is through a development platform for intelligent virtual agents. The platform should enable telecommunications carriers to build, package, and deploy natural language applications -- and provide businesses of all sizes the ability to manage and customize virtual agent functionality for their needs.

Like human agents, these virtual agents should have a wide variety of skills, including speech recognition, NLP, TTS, voice biometrics, transcription, and API integration.

And they should be able to perform a variety of tasks. Using a simple drag-and-drop interface, non-technical users should be able to build tasks or select from a set of pre-built ones, such as biometric enrollment, order lookup, PCI-compliant payment, or queue callback.

Within applications, users should be able to select which speech recognition, natural language, TTS, and biometric services they want to use from a variety of vendors like Google Cloud and IBM Watson. There should be no need to license and manage the services from each vendor. And they should be able to switch vendors anytime they choose.
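One way a platform could keep vendors interchangeable is to hide each service behind a common interface, so that switching is a configuration change and not a rewrite. The sketch below is a hypothetical illustration of that idea in Python. The registry class and the two stand-in adapters are invented for this example and call no real vendor API.

```python
from typing import Callable, Dict, Optional

# A recognizer is any function that turns audio bytes into text.
Recognizer = Callable[[bytes], str]


class SpeechServiceRegistry:
    """Hypothetical registry that routes requests to the selected vendor."""

    def __init__(self) -> None:
        self._recognizers: Dict[str, Recognizer] = {}
        self._active: Optional[str] = None

    def register(self, vendor: str, recognizer: Recognizer) -> None:
        self._recognizers[vendor] = recognizer
        # The first registered vendor becomes the default.
        if self._active is None:
            self._active = vendor

    def use(self, vendor: str) -> None:
        if vendor not in self._recognizers:
            raise KeyError(f"unknown vendor: {vendor}")
        self._active = vendor

    def transcribe(self, audio: bytes) -> str:
        if self._active is None:
            raise RuntimeError("no vendor registered")
        return self._recognizers[self._active](audio)


# Stand-in adapters: real ones would wrap each vendor's SDK or REST API.
def fake_vendor_a(audio: bytes) -> str:
    return f"[vendor-a] {len(audio)} bytes"


def fake_vendor_b(audio: bytes) -> str:
    return f"[vendor-b] {len(audio)} bytes"


registry = SpeechServiceRegistry()
registry.register("vendor-a", fake_vendor_a)
registry.register("vendor-b", fake_vendor_b)

first = registry.transcribe(b"abc")   # handled by the default vendor
registry.use("vendor-b")              # switch without touching the application
second = registry.transcribe(b"abc")
print(first, second, sep="\n")
```

The application only ever calls `transcribe`, so swapping vendors never touches the call flow itself.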


New advancements have improved the quality of speech recognition and NLP while reducing cost and complexity. Those advancements, along with a new breed of application development and deployment tools, are removing historical barriers to adoption for businesses of all sizes, enabling them to adopt self-service solutions made simpler and more effective through natural language interfaces.


Richard, great write-up on NLP and where the solution can take an enterprise. I take away that there is an economic driver in all this and a serious upgrade in how virtual agents can respond properly to create better CX.