Word of Mouth: Are AI & Voice Control Set to Rock the Enterprise?

Consumers are already experiencing the benefits of being able to use virtual digital assistants like Apple's Siri, Microsoft's Cortana, and Google Assistant to create grocery lists, check the weather, and control appliances by voice command. Clearly we are moving toward a voice-controlled relationship with technology, and before long we'll see the enterprise environment leveraging artificial intelligence (AI) for personal assistance, as well as for control and authentication.

No More Lost Remotes -- or Time

Think how much easier it would be for a physician to make notes verbally during an examination and then send off a prescription just by saying, "System, issue pharmacy order of X." Or, consider your work life. Instead of spending the first 15 minutes of a meeting searching for the projector remote, wouldn't it be great to issue a simple voice request: "System, turn on the projector and the TV, and dim the lights"?

How close are we to voice-first business? In "The 2017 Voice Report," voice analytics firm VoiceLabs discussed the various layers needed to support a voice-first approach in the consumer world: hardware, AI software, voice applications, and the voice application ecosystem. However, moving to a voice-first enterprise environment from the much simpler consumer model is not an easy task.

Security will be critical if we have enterprise systems relying on voice commands -- you don't want an untested intern to have the ability to command the most critical equipment and make a costly or dangerous mistake. Privacy is a top concern too, and while a physician ordering a prescription by voice seems simple enough, we need to think about this in the context of regulations. Are a patient's rights under HIPAA violated if the patient's medical information is overheard by third parties?

Is it Secret? Is it Safe?

Banks have been among the first companies to introduce voice authentication systems by integrating them into their telephone banking systems. But I suspect customer security concerns will be like those experienced with the adoption of credit cards and the fear of fraud from online shopping. If the adoption cycle is the same, the initial concerns will need to be overcome before we see the meteoric rise of voice authentication.

Investment in voice recognition innovation will continue, reaching a level that will enable voice security to be viable in an enterprise environment and ensure that only authorized users with the right privileges can perform the associated actions.
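To make the idea of "authorized users with the right privileges" concrete, here is a minimal sketch of the check a voice-controlled system might run before acting on a command. It assumes an upstream voice-authentication step has already decided whether the speaker is who they claim to be. The role names, action names, and the Speaker type are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission table; the roles and action names are
# illustrative assumptions, not taken from any real product.
ROLE_PERMISSIONS = {
    "intern": {"projector.on", "lights.dim"},
    "operator": {"projector.on", "lights.dim", "cooling.off"},
}


@dataclass(frozen=True)
class Speaker:
    name: str
    role: str
    verified: bool  # result of an assumed upstream voice-authentication step


def is_authorized(speaker: Speaker, action: str) -> bool:
    """Allow an action only for a verified speaker whose role grants it."""
    if not speaker.verified:
        return False
    # Unknown roles get an empty permission set, so they are denied.
    return action in ROLE_PERMISSIONS.get(speaker.role, set())
```

In this sketch an intern can dim the lights but cannot touch the cooling system, and an unverified voice is refused regardless of role. Denying by default keeps an unrecognized role from ever reaching critical equipment.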

Whereas your microwave might not be spying on you, some devices will be always on, always listening... and potentially recording. With increasing awareness around this, more users are likely to use the microphone mute option, so they get the benefits of voice control without the downsides of constant monitoring. Products also need secure software access controls to detect and prevent hacking efforts.

Adding Context

The first use cases are primarily around voice response systems -- think contact centers, cars, and smartphones. But as many of us know from firsthand experience, voice response often works marginally at best. We need to refine recognition and contextualization technologies before we can realistically think about enterprise-wide adoption. Initiatives such as Mozilla's Project Common Voice aim to enhance language recognition capabilities, and Microsoft has recently announced its conversational speech recognition system has reached its lowest error rate at 5.1% -- putting it on par with the accuracy of human transcribers.

But the needed improvement isn't just about word recognition; it's about what to do with those words. Here's where cognitive engines and AI come into play. Engines from some of the biggest players in the industry -- Microsoft, for example, with its open source cognitive recognition engine -- can be leveraged to understand the context of the words. "How do I get to the mall?" may sound simple enough, but the request needs context. Location awareness could indicate that you're standing outside the White House, and by your question you most likely mean the National Mall just a few streets over and not the nearest shopping center.
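As a rough illustration of how location can disambiguate a request like this, the sketch below picks whichever candidate "mall" is closest to the speaker. This is a deliberately simple nearest-candidate heuristic, not how any particular cognitive engine works; a real system would weigh many more signals. The coordinates are approximate, and the shopping center entry is a made-up placeholder.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Approximate coordinates; the shopping center is a hypothetical placeholder.
CANDIDATES = {
    "National Mall": (38.8896, -77.0230),
    "Shopping center (hypothetical)": (38.8630, -77.0600),
}


def resolve_mall(user_lat, user_lon, candidates=CANDIDATES):
    """Return the name of the candidate 'mall' nearest to the user."""
    return min(
        candidates,
        key=lambda name: haversine_km(user_lat, user_lon, *candidates[name]),
    )
```

Called with a position near the White House, for example resolve_mall(38.8977, -77.0365), the sketch resolves "the mall" to the National Mall, because it is the nearer of the two candidates.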

Not Just a Single Layer

The real challenge for enterprise technology comes from what's behind the voice recognition systems -- everything from the integration of Internet of Things devices to the system itself. Here, we need to further leverage those cognitive engines as check-and-validation systems. Think of someone accidentally giving a command to "Turn off cooling system to reactor 4" instead of reactor 3, which has already been shut down, or a doctor using the system to prescribe the wrong dose through a simple slip of the tongue.

These might be extreme examples, but there will need to be a holistic view of the actions being automated to prevent human error and bring in broader intelligence to understand the actions related to voice-controlled requests. Maybe "Turn off cooling system to reactor 4" was correct, but the system would then need to understand the set of operational procedures to implement those actions.
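A check-and-validation layer of this kind could be sketched as a function that compares the spoken command against the known state of the equipment before anything is executed. The rule below is a toy that simply mirrors the reactor example in this article; it is not a real operating procedure, and the state names and function are hypothetical.

```python
from enum import Enum


class ReactorState(Enum):
    RUNNING = "running"
    SHUT_DOWN = "shut_down"


def validate_cooling_off(reactor_id, states):
    """Return (allowed, reason) for a 'turn off cooling' voice command.

    Toy rule for illustration only: the command passes without further
    checks only when the target reactor is already shut down.
    """
    state = states.get(reactor_id)
    if state is None:
        return False, f"unknown reactor {reactor_id}"
    if state is ReactorState.RUNNING:
        # A likely slip of the tongue, or a request that needs the full
        # operational procedure and an explicit confirmation.
        return False, f"reactor {reactor_id} is running; confirmation required"
    return True, f"reactor {reactor_id} is shut down; command accepted"
```

With reactor 3 shut down and reactor 4 running, a command aimed at reactor 4 is held for confirmation instead of being executed, which is the point where the system would bring in the relevant operational procedures.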

A Truly Integrated Solution

Tying in strategically with the development of true voice-controlled enterprise environments are the innovations happening in the traditional voice communication world. We are witnessing the rise of communication platform as a service (CPaaS), which leverages APIs to transform today's applications with the integration of voice functionality. Major voice communication vendors are entering this market, providing CPaaS infrastructures with a standardized set of APIs to enable companies to integrate communications into their business processes.

Usually we look at integration as incorporating voice and video services into existing applications that enable people to connect -- think of a property rental application that allows you to move from an online application to a voice call with the realtors or landlord. But I believe these integrations will play a big part in that "voice-first" environment by leveraging the rich API infrastructure of CPaaS to communicate with applications and things.

This Is Only the Beginning

For rapid development of voice technology, the way in which CPaaS and other platforms communicate with devices needs to be standardized. Each of today's consumer-based voice-controlled systems has its own interface, but the good news is there is a set of technologies in the works to help minimize potential obsolescence. Frameworks like the Linux Foundation's IoTivity, under development, aim to bring about a standardized platform.

We are already seeing the value, benefits, and rapid expansion of new voice applications for consumers. As voice-first technology continues to advance, it is all but a given that voice interaction systems will make it into the enterprise environment too.