No Jitter is part of the Informa Tech Division of Informa PLC


Cloud Video AI: Beyond the Basics & Into the Future


Image: photon_photo
From removing the background sound of dogs barking to blurring out that in-progress home office renovation, AI-based cloud video features are as handy as they are revolutionary. But what if AI could play an even more active role, like suggesting a meeting before you even think to schedule it or providing you with crucial related content without being prompted? These might seem like sci-fi wishful thinking, but as AI advances, all this and more is possible.
AI is a key factor driving the cloud video service market forward, and the leading providers all have rich AI-based feature sets from in-house development, acquisitions, or integrations. Meanwhile, some cloud video providers are turning to an alternate source for their AI video capabilities: a cloud-native suite of graphics processing unit (GPU)-accelerated AI features provided by GPU vendor Nvidia. With Nvidia’s service, called Maxine, providers can bring features like face alignment, bandwidth efficiency, and custom avatars to their video services, as UC analyst Zeus Kerravala shared in a No Jitter article.
Regardless of how they’re incorporating AI, much innovation is yet to come, as Prachi Nema, senior analyst, enterprise communications, wrote in her recent Omdia Universe report, “Collaborative Meeting Services, 2021.” Expect to see "a great deal of innovation from simplifying the before, during, and after meeting user experience," with CIOs increasingly looking for AI-based features that align with their need to reduce costs and enhance productivity, Nema wrote.
It's in this future innovation where AI will show its full worth, two industry analysts agreed.
What Is AI? It’s Not Just Automation
“Over the last decade, videoconferencing has evolved from a technological curiosity into an extremely reliable core business tool,” said Ira M. Weinstein, founder and managing partner at Recon Research. And with the pandemic accelerating the use of video meetings, cloud video is now virtually synonymous with enterprise communications and collaboration.
To address growing demand during the pandemic, cloud video providers have increased capacity significantly, Weinstein said. With the recent advancements in processing power, machine learning, and big-data management, the stars are now aligned for providers to expand the AI-based features within their platforms, he added. Cloud video vendors also have access to a larger pool of experienced AI developers than ever before, which further supports their AI initiatives, Weinstein noted.
And of course, it’s all about the data, Dave Michels, principal analyst at TalkingPointz, told me. “AI thrives on data, and it [always] needs more data for context and … for learning.” The more data an AI engine has, the more easily it can find correlations and the further out it can predict events, he said. For instance, with advanced AI capabilities, a user could tell a voice-activated in-room video system, "call Ryan," and the AI could decide which of the hundreds of Ryans in the corporate directory makes the most sense based on who's making the request and previous interactions, Michels elaborated.
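To make the "call Ryan" scenario concrete, here is a minimal Python sketch of how a system might rank directory matches using the two signals Michels mentions: who is making the request and previous interactions. Everything in it is an assumption for illustration. The `Contact` class, the `rank_contacts` function, the sample directory, the call counts, and the 0.5 department bonus are hypothetical and do not reflect any vendor's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    name: str
    department: str


def rank_contacts(first_name, caller_department, directory, call_counts):
    """Return directory matches for first_name, most likely candidate first.

    Score = number of past calls from this caller, plus a small bonus when
    the contact shares the caller's department. Both signals and the 0.5
    weight are illustrative assumptions, not a real product's logic.
    """
    matches = [
        c for c in directory
        if c.name.split()[0].lower() == first_name.lower()
    ]

    def score(contact):
        same_dept_bonus = 0.5 if contact.department == caller_department else 0.0
        return call_counts.get(contact.name, 0) + same_dept_bonus

    return sorted(matches, key=score, reverse=True)


# Hypothetical corporate directory.
directory = [
    Contact("Ryan Adams", "Sales"),
    Contact("Ryan Baker", "Engineering"),
    Contact("Ryan Cole", "Engineering"),
    Contact("Dana Ryan", "Finance"),
]
# Hypothetical history: how often this caller has called each person before.
call_counts = {"Ryan Adams": 1, "Ryan Cole": 4}

ranked = rank_contacts("Ryan", "Engineering", directory, call_counts)
best = ranked[0]
print(best.name)  # prints "Ryan Cole"
```

A production system would presumably learn such weights from far more data, which is Michels' point: the more interaction history available, the better the guess.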
As Weinstein said, what makes AI different is its ability to make decisions. "AI is not just automating a function that a person would do; it's analyzing information, the situation, the context, and making some kind of decision based on that,” Weinstein said.
AI Makes Its Way In: Transcription, Translation Find Adoption
Though we might be far from the futuristic end of what AI can do for us in meetings, cloud video vendors have added a host of AI functionality to their platforms. As listed in the Omdia report, these include translation, closed captioning, virtual meeting assistance, transcription, voice control, meeting summary and follow-up items, the ability to capture action items, automation of meeting scheduling, people insights from social media, and tagging discussions and participants.
Another common AI capability bringing value to the enterprise meeting experience is noise reduction and cancellation, which use algorithms to improve audio quality or reduce background noises, Weinstein said. Cisco, Google, Microsoft, Zoom, and others have brought noise cancellation to their cloud video platforms in one form or another, he added. For example, Zoom developed noise cancellation for Zoom Meeting in 2018; Cisco last year acquired the technology from BabbleLabs for use in Webex meetings; and Microsoft added noise suppression to Teams last year.
Voice transcription is big as well, Weinstein said. Similarly, Cisco tucked transcription capabilities into Webex from its acquisition of Voicea in 2019, Microsoft rolled out meeting transcriptions for Teams last year, and Zoom announced its Zoom Meeting transcription capability at Zoomtopia 2017. In addition, users can tap transcription services from third-party vendors like
Also worth noting are virtual and blurred backgrounds, which changed the game at the start of the pandemic as people newly working from home worried about revealing more about themselves than they wanted, Michels said. The issue of employee privacy has been raised by many inside and outside the enterprise, and the ability to throw on a virtual background or use a background blur is a simple way to add some privacy, he said. At this point, many cloud video providers have either added or support virtual and blurred backgrounds, and Microsoft and Zoom have taken the idea a step further, the former with Together Mode and the latter with Immersive View, Michels noted. Together Mode and Immersive View both give users the option to meet in a shared virtual background.
Together Mode in Microsoft Teams allows teams to meet in a shared virtual space. (Source: Microsoft Image Gallery)
“Captioning and translation really are important in terms of inclusion,” Michels said. Users who are hearing impaired can read along with real-time captions, and real-time translations can reduce the language barrier, he said.
One enterprise finding value in real-time translations is toy manufacturer Spin Master, as Tyler Pollard, head of digital collaborations for the company, explained during a No Jitter interview. Like many other enterprises, Spin Master had to embrace a new way of collaborating during the pandemic, and turned to Zoom’s cloud-based video service. Since not all employees are native English speakers, the real-time translation capability helps them follow a meeting in their native languages. Spin Master is also exploring more ways to leverage real-time translation for its town hall meetings, Pollard said.
Additionally, Spin Master users have been taking advantage of AI-generated transcripts, Pollard added. However, “there's a bit of a fine line that we're walking right now” when it comes to video AI, he acknowledged. While out-of-the-box features like real-time translation and transcription are currently providing value to Spin Master, Pollard raised concerns about cloud AI capabilities and security, namely how providers store and use data.
Videoconferencing AI: The Sky’s the Limit
Though AI-based meeting features are plentiful in cloud video today, we are nowhere near the summit of what’s possible, as Michels pointed out. For example, calendars and scheduling, an almost entirely manual process today, are ripe for AI, he added. “Why can't my technology figure out that I'm late to a meeting and tell people … and automatically reschedule things for me in the background?” he asked, sharing his frustrations.
Ultimately, just as the Nest thermostat “looks at how you live and tries to accommodate,” AI in cloud video one day will be able to do the same thing, looking at past behaviors and acting accordingly, Weinstein said. Also, AI should be able to fix little annoyances within the workday, like checking whether a do-not-disturb setting was left on by mistake and lowering the volume of incoming calls if you are in a meeting, he added. “AI should be serving us, saving us time, saving us stress, keeping us from making mistakes, not thinking for us, [but] doing work for us,” Weinstein said.
As cloud video providers keep taking incremental steps toward the summit of AI, meeting services will continue to become more natural and intuitive and be “far more inclusive; everyone has a great seat,” Michels said.
The danger for enterprises is getting lost in an avalanche of features. Rather than worrying about the latest and greatest, Weinstein advised, focus on the AI features that you know will bring value to your enterprise. “Having a system email me a transcription after every meeting is wonderful, except if I don't need or want the transcription, in which case it's irrelevant,” he added.
That messaging no doubt resonates with Spin Master, which is still exploring how best to use more advanced AI features beyond real-time transcription, Pollard said. "AI definitely provides a lot of value to Spin Master, but I think it's something that we definitely really need ... to investigate and really need to understand the entire workings of it.”