Machine Learning: Coming to a Communications Service Near You

Machine learning seems like the only topic in the computer science world today. In many cases, it comes with the terms "deep learning," or even better, "artificial intelligence" (AI). When searching for some VC funds that specialize in AI (yes, there are a few), I bumped into VC funds driven by decisions made and supported by AI. Go figure.

As part of a recent study we conducted on AI technologies used in real-time communications (RTC), we looked at hundreds of voice and video communications offerings -- and discovered that these services are clearly lagging in their use of machine learning. I can't really say why that's so, but I can offer a few suggestions:

  1. We're all too focused on the here and now in communications. We sometimes fail to see the bigger picture. To look back across time, we collect data and then try to deduce something from it. But we do this mostly for monitoring purposes, never for the service itself. And without that data collection, there can be no learning.
  2. Communications is predominantly a person-to-person activity. How much machine learning can you add to a live interaction between people anyway?
  3. CPaaS and UCaaS have other headaches to deal with. CPaaS is about offering communication APIs to developers. And UCaaS has its hands full with the new enterprise messaging/team messaging phenomenon that Slack is leading. Who has time for machine learning?

But change is coming. We see it in demos, announcements, proofs of concept (POCs), and, at times, in actual products. The change is driven by better tooling and by maturing technologies in the machine learning domain.

General Machine Learning or AI in RTC?
A communications vendor can adopt machine learning in one of two ways:

  1. Use general machine learning
  2. Incorporate machine learning into the communications itself

With general machine learning, a communications vendor acts like any other market player. Using its business data and metadata, it tries to derive operational or commercial insights. The vendor mostly ignores the actual voice, video, or data packets streaming through its networks while it serves its customers.

When you consider incorporating machine learning and AI into real-time communications, capabilities such as speech-to-text and computer vision immediately come to mind -- fields deriving value directly from machine learning. The results, though, are different in nature from what you'd gain by taking the general machine learning route. There is a lot to be gained by looking at the actual interactions and not just skimming through their associated metadata.

Market Dynamics
There are three players in this game: communications incumbents, cloud vendors, and startups.

Communications Incumbents
The incumbents seem slow to embrace machine learning fully. They're writing about it in their blogs and running POC tests, and they may even have a feature or two they can paint in machine learning or AI colors. However, actual development and adoption are lagging.

Most are in planning or experimentation mode. Some have actual products with machine learning capabilities in them, such as transcription, usually achieved through integration with third-party specialized vendors and cloud providers.

In all cases, machine learning capabilities within communications products are minor features or shiny add-ons, such as voicemail transcription or automatic zoom in video conferencing cameras. They aren't at the core of the communications services.

Cloud Vendors
Amazon. Google. Microsoft. IBM. With Amazon Web Services, Google Cloud Platform, Azure, and IBM Cloud, these vendors control most of the cloud infrastructure market. They have all committed to machine learning in all of its forms, offering tools and pipelines that reduce the development effort and expertise required of those who use them. And they're now setting their sights squarely on the communications space.

They all offer speech-to-text, text-to-speech, and computer vision services in the form of APIs that developers can use for integration.

The threat to the incumbents comes from the fact that most of these cloud vendors offer communications services as well.

Microsoft has Teams. Google has Hangouts, Meet, Allo, Duo, Voice, and now Duplex and the Conversation API. Amazon has Chime and Connect. Each of them is positioned to disrupt the incumbents.

Startups
Then there are the startups -- new companies built from the get-go around the notion of machine learning and with a focus on communications.

These vendors have no legacy to deal with, and some of them are built around machine learning directly. Some offer point solutions for a single technology, such as speech-to-text or face recognition. Others cover an actual use case -- for example, workforce management in contact centers by way of machine learning is popular these days.

As startups do, they'll use the latest technologies available, try to move faster than the incumbents, and offer better flexibility and customization than the cloud vendors in the space. A notable vendor here is TalkIQ, a provider of real-time transcription services acquired by Dialpad.

Where Do We Find AI in Communications?
I've partnered with RTC technologist and consultant Chad Hart to answer this specific question: Where do we see machine learning and AI playing a role in real-time communications? The result is our report, "Artificial Intelligence in Real Time Communications."

In the coming weeks, we'll share aspects and learnings from our research, covering topics around speech analytics, voice bots, computer vision, and quality optimization. If you want to learn more, contact us at [email protected].