The topic of conversational AI is certainly a hot one, and rightfully so. Customer experience (CX) reigns supreme when it comes to business competition today, but ensuring a high-quality CX isn’t as easy as it used to be. The pandemic accelerated digital transformation plans. My research shows that 74% of companies accelerated digital initiatives by at least one year, and about a third of those did so by two years or more.
Today, we live in a world where digital interactions are the norm. I saw a recent McKinsey study that stated 58% of all customer interactions are digital, and 55% of all products and services have been digitized. This rapid digitization of businesses has created considerable interest in conversational AI. However, the industry definition of conversational AI is extremely narrow: it revolves around using the technology to enable chatbots to interact with customers. Obviously, people use more than chatbots to communicate with brands.
Evidence of this comes from the recently released Gartner Magic Quadrant for Enterprise Conversational AI platforms. While I don’t have access to the entire report, I do know the first line states: “Enterprise conversational AI platforms automate multiple chatbot use cases within the enterprise, creating bots that are orchestrated and operationalized across multiple business units.” If that’s not a narrow definition, I don’t know what is.
Chatbots are important and their use has been growing, but conversational AI should also include anything that revolves around conversations, specifically:
- Data AI: Analyzing data is critical to finding insights in conversations. This is different from understanding conversations but would include reporting, analytics, application integration, security, and fraud detection. Most vendors involved in chatbots are also developing some form of data AI.
- Voice AI: This would be real-time technology that can understand what people are saying and use that information to provide an automated response or provide an agent with information critical to satisfying a customer. Many vendors are doing this today, including Nvidia, Google, and others, and it is an exploding field. One of the better vendors in this area is Otter.ai, which has one of the best transcription engines I have used. It’s the technology currently used in Zoom meetings.
- Video AI: This is the ability to use an AI to dissect a video and use that information to improve customer and employee interactions. Video AI is widely used in some industries, such as self-driving cars, where objects on the road can quickly be identified. Cisco has loaded Webex with video AI capabilities to identify people. A contact center use case would be to have a customer show an agent a video for service purposes. Imagine a customer showing an agent a complex object like a home appliance, and the agent easily directing the customer on how to replace it.
- Emotion detection: One could look at this as video AI, but I believe it’s a specialized field within video and worthy of calling out. While some vendors promote their solution as having emotion detection for voice calls based on word choices, technology is being developed for AI to better understand people’s reactions based on body language and facial expressions. In addition to visual cues, the more advanced systems will be trained to understand tonal shifts in conversations by factoring in changes to pace, pitch, word choice, and other contextual sources (calling to fix a problem or calling to get more of a good thing). This capability could be used in several industries, including medical, education, legal, retail, and more. Imagine a use case where a doctor is explaining something complex to a patient over video. The AI engine could detect confusion or lack of attention from the patient and then send a pop-up on the doctor’s side, saying that the patient is not understanding. Another use case for emotion detection AI: A teacher can be notified when a majority of students are not paying attention or if they are disengaged.
About a year ago, Uniphore, which is backed by JC2 (John Chambers) Ventures, acquired Emotion Research Labs to move into the video AI space. Several small companies operate in this space, such as visio.ai, but Uniphore is the only “mainstream” conversational AI company to invest heavily in emotion detection as part of its platform approach. I know Nvidia is also doing work in this area, and I expect it will be added to its Maxine product.
The evolution of conversational AI is something I discussed in the Uniphore post linked above. In the post, I referenced a conversation I had with Chambers where his vision for conversational AI is that it would become a platform for all forms of AI, including voice, video, and data. At the time, I had included emotion detection in video, but I believe it's a separate discipline.
One final observation on the latest Gartner MQ on conversational AI is how many vendors are listed in the report and where they are positioned. Currently, 21 companies are listed, and the largest group of vendors is in the bottom left (niche players), which shows a fragmented market. While a few larger vendors are in this MQ (Google, AWS, and IBM), I expect that over time we will see others jump into it, like Microsoft, Cisco, and Salesforce. The good news for buyers is there will be no shortage of vendor options. The bad news is the market is likely to stay fragmented until some consolidation occurs.
If Gartner had used a broader scope for conversational AI solutions, the sampling of vendors would be considerably smaller, maybe even just Uniphore, Microsoft, IBM, Nvidia, and perhaps Cisco. However, limiting conversational AI to chatbots does nothing to advance the technology and drive home the concept of a platform. The demand for conversational AI continues to grow, and the category needs to adapt accordingly.
If the industry adopts a broader definition of conversational AI, the question becomes: How will competing solutions be evaluated? Here’s a starting point: As the name suggests, to be a leader in conversational AI, you need to be good at two things: conversations and AI. And to be clear, that means more than the use of voice technology that is typically limited to simple command-action or question-answer interactions (think Siri and Alexa). Those systems are not the same as systems built around the complexities, nuances, and back and forth of conversations. Conversational systems should also be able to derive meaning, summarize and recall previous stories, and react in real time to the conversation based on any emotion they hear in your voice. Relying on powerful AI and automation, conversational capabilities can then direct action, trigger outcomes, follow up, and so forth.
Suffice it to say, the long-term winners in this next wave will be those who take a holistic view of the problems they are solving, focus on conversations of all kinds (voice, text, and video), and have the AI/machine-learning chops to really maximize the value of every one of those conversations.
Get ready — it’s going to be a wild ride.