Computer Vision: What We See for Business Video Calling

If you look at the use of computer vision in video calling, you'll see a distinct difference between consumer and business examples. With consumers, computer vision is about additions to the video, whereas in the enterprise it's about subtracting from the video. More on that in a second... I am getting ahead of myself on this one.

 

Back to machine learning and real time.

 

What do you do with machine learning and artificial intelligence (AI) in real-time video calls? That was a big question for us when we set out to analyze AI in real-time communications recently. Last week, I described the various vendor archetypes we found on the market. This time, I want to take a peek at the computer vision component of our research.

[Image: Tsahi Levent-Levi wearing a messaging filter. Caption: Me on Messenger... I didn't fancy any of the silly hats]

Here's the challenge: They say an image is worth 1,000 words, and a video quite a bit more. That means there's a lot of data to process before you can do anything substantial with computer vision on video, and doing it on real-time video is even harder. This is probably why most of the use cases we see today around machine learning in real-time communications lean toward voice and text-to-speech scenarios.

 

And still, there are things you can do with video, and we've identified a number of them in our research. The first that came up was the machine learning-based filters you have today in messaging services. Snapchat, Instagram, and Facebook Messenger are leading examples -- you find the location of the person's face, put a silly hat or some other object on the video at that spot, and you're done.
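For the curious, here is a minimal sketch of the "find the face, then add stuff" step. It assumes some face detector (not shown) has already returned a bounding box for the face; the `Box` type, the `hat_box` and `paste` helpers, and the scale and aspect values are all hypothetical names and numbers chosen for illustration, not how any of these services actually do it.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: int  # left edge, in pixels
    y: int  # top edge, in pixels
    w: int  # width, in pixels
    h: int  # height, in pixels


def hat_box(face: Box, scale: float = 1.2, aspect: float = 0.5) -> Box:
    """Return where a hat overlay would go: centered on the face,
    slightly wider than it, and resting on the top edge of the face box."""
    w = round(face.w * scale)
    h = round(w * aspect)
    x = face.x + face.w // 2 - w // 2
    y = face.y - h
    return Box(x, y, w, h)


def paste(frame, value, box: Box):
    """Fill the box region of a grayscale frame (list of rows) with value,
    clipping to the frame so a hat that sticks out of view does not crash."""
    rows, cols = len(frame), len(frame[0])
    for r in range(max(box.y, 0), min(box.y + box.h, rows)):
        for c in range(max(box.x, 0), min(box.x + box.w, cols)):
            frame[r][c] = value
    return frame


# Assumed detector output: a 20x20 face whose top-left corner is at (40, 30).
face = Box(40, 30, 20, 20)
hat = hat_box(face)
```

A real filter would draw an image with transparency instead of a flat value and would redo this for every frame as the face moves, which is where the real-time cost comes from.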

As fun as this is, it isn't something you'll likely see when trying to work with a business partner in an important video conference.

What would work then?

Many things. One of them would be the opposite of adding and enriching an image with virtual objects. It would be subtracting content, like filtering the background. This is exactly what Microsoft has introduced to Microsoft Teams: the ability to blur the background. And what better example to show it on than the poor BBC interviewee from last year?

 



That can be quite useful for conferences. We've identified nine different use cases where computer vision can be of use to real-time communications. Silly hats and image enhancements are two of them.
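To make the "subtracting" idea a bit more concrete, here is a rough sketch of background blurring on a single grayscale frame. It assumes a person mask (1 for the person, 0 for the background) is already available from some segmentation model, which is the hard, machine learning part and is not shown. The function names and the simple box blur are illustrative assumptions, not a description of how Microsoft Teams implements the feature.

```python
def box_blur(frame, radius=1):
    """Blur a grayscale frame (list of rows) by averaging each pixel
    with its neighbors inside a (2 * radius + 1) square window."""
    rows, cols = len(frame), len(frame[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = count = 0
            for rr in range(max(r - radius, 0), min(r + radius + 1, rows)):
                for cc in range(max(c - radius, 0), min(c + radius + 1, cols)):
                    total += frame[rr][cc]
                    count += 1
            out[r][c] = total // count
    return out


def blur_background(frame, person_mask, radius=1):
    """Keep original pixels where the mask marks the person,
    and use blurred pixels everywhere else."""
    blurred = box_blur(frame, radius)
    return [
        [frame[r][c] if person_mask[r][c] else blurred[r][c]
         for c in range(len(frame[0]))]
        for r in range(len(frame))
    ]


# Tiny 3x3 example: one bright "person" pixel in the middle of a dark frame.
frame = [[0, 0, 0], [0, 90, 0], [0, 0, 0]]
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
result = blur_background(frame, mask)
```

The blur itself is cheap. Most of the processing in a real product goes into producing a good mask for every frame, fast enough to keep up with the call.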

 

What's interesting in all this is that it is still very early days for computer vision in real-time communications. Different vendors have invested in different areas: Microsoft went for background blurring while Facebook went for silly hats. Some use cases will get to market faster than others.

 

The endgame? Maybe getting AI to synthesize most of the video out of thin air, based only on the characteristics of what needs to be seen. In the meantime, there are things you can do today with computer vision in real-time communications. It doesn't need to be super complicated or computationally intensive to bring value. The idea is to identify the benefit of a certain algorithm or capability and then apply it in the right context.

 

Looking to learn more about computer vision and its place in real-time communications? Interested in how machine learning fits into your strategy? Check out our report on AI in real-time communications.

 

For the first piece in this ongoing series, read "Machine Learning: Coming to a Communications Service Near You," and stay tuned for more on topics around speech analytics, voice bots, computer vision, and quality optimization. If you want to learn more, contact us at [email protected].