No Jitter is part of the Informa Tech Division of Informa PLC

Nvidia’s Maxine Democratizes AI for Collaboration


GPU leader Nvidia's annual GPU Technology Conference (GTC), held virtually this year, is always one of my favorite events, as it has become more of an industry show than a vendor conference. AI has been a key focus of GTC for the past five years, and it was again this year, but this time Nvidia is bringing AI to the collaboration industry.
At this year's event, Nvidia announced its Maxine platform, a cloud-native suite of GPU-accelerated AI services designed to make video collaboration better. The use of video has exploded as the COVID-19 pandemic pushed everyone into working from home. Cisco’s tagline used to be that the Internet changed the way we work, live, learn, and play, and that happened. Now, we are doing almost all of our working, living, learning, and playing over video. Video is so important that if Maslow were to redo the “Hierarchy of Needs,” video would be one of the basic needs alongside the physiological needs of food, water, and warmth.
Given the importance of video, video experiences must be great. Nvidia's Maxine is a set of services that any collaboration vendor can embed into their platform to improve the experience. Notable features for Maxine include:
  • Bandwidth efficiency: Everyone who does video at home knows the pain of poor-quality video when the kids are Zooming into a classroom while one parent is Webexing into a company event and the other parent is Avaya Spacing into an important meeting. Bandwidth is a precious commodity, and not every home has multi-gig fiber. Instead of streaming the entire screen of pixels, Maxine analyzes key facial points on each person and then intelligently re-animates the face on the other side. Also, anything that doesn’t move, like a background, isn’t continually sent. Nvidia claims that this feature will reduce bandwidth consumption by as much as 90% relative to an H.264 video stream. This dramatically cuts the cost of network transport and delivers a much smoother video stream on all devices, even low-end ones that might not have been able to do video before.
  • Face alignment: One of the challenges of video calls is that people need to remember to look at the camera and not the screen. Otherwise, it’s common to see the sides of people’s faces, their chins, and even the tops of their heads. When the camera is down low, like on an IP phone, the viewer gets the dreaded “nostril cam” experience. Maxine’s face alignment feature automatically adjusts the image to maintain eye contact, even if the camera isn’t aligned correctly, which will help people stay engaged without having to focus on the camera.
  • Custom avatars: With this feature, participants can use an animated avatar with realistic animation driven automatically by voice and emotional tone in real time. This feature isn’t all that practical for business, but it can be a fun way of getting feedback in a group setting. For example, students might be more willing to participate if they know the avatar would interact for them.
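To put the bandwidth claim in perspective, here is a rough back-of-the-envelope sketch in Python. The 1,500 kbps per-call bitrate and the three-call household are assumptions chosen purely for illustration, not figures from Nvidia, and 90% is the upper bound of Nvidia's claim rather than a measured result.

```python
def reduced_bitrate_kbps(baseline_kbps: float, reduction: float) -> float:
    """Return the bitrate left after cutting `reduction` (0.0 to 1.0) from a baseline."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_kbps * (1.0 - reduction)


def household_demand_kbps(streams_kbps: list[float]) -> float:
    """Total bandwidth needed when all calls run at the same time."""
    return sum(streams_kbps)


# Hypothetical household: three simultaneous calls (class, company event, meeting).
# 1,500 kbps per call is an illustrative assumption, not a vendor specification.
baseline_calls = [1500.0, 1500.0, 1500.0]

# Best case under the "as much as 90%" claim.
best_case_calls = [reduced_bitrate_kbps(b, 0.90) for b in baseline_calls]

baseline_total = household_demand_kbps(baseline_calls)
best_case_total = household_demand_kbps(best_case_calls)

print(f"Baseline demand:  {baseline_total:.0f} kbps")
print(f"Best-case demand: {best_case_total:.0f} kbps")
```

Under these assumptions, three concurrent calls would need roughly a tenth of the original bandwidth, which is the difference between saturating a modest home connection and barely noticing the load.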
Some vendors have rolled out similar features to enhance the meeting experience, including backgrounds, auto-reframing, translation, and noise reduction. With Maxine, these features come from the cloud, so endpoints won't have to process them. For example, Zoom was the first to roll out virtual backgrounds, but since the feature is processor-intensive, it doesn’t work on MacBook Airs and other lower-end computers. Making these features available in the cloud democratizes them and makes them usable by anyone on any device.
There are also conversational AI features enabled by Nvidia’s Jarvis SDK. Jarvis lets developers integrate virtual assistants that have been fully trained by Nvidia for speech recognition, language understanding, and speech generation. The virtual assistants can also take notes, set action items, and answer questions with humanlike voices. Other conversational AI services, such as closed captioning and transcripts, help with the post-meeting experience.
At launch, Nvidia announced Maxine would be made available from all the major cloud providers: Amazon Web Services, Google Cloud Platform, Microsoft Azure, Oracle Cloud, and Tencent Cloud. The cloud-native architecture enables massive scale for video meetings, many of which are now reaching hundreds or even thousands of concurrent participants. The AI microservices that Maxine uses run in containers, making them portable across clouds and bringing feature consistency.
At GTC, Avaya announced it would be using Maxine to bring more capabilities to its Spaces video product. Avaya’s Spaces is a well-engineered product, and Maxine can bring features like noise removal and virtual backgrounds to improve the meeting experience. Customers don't need to buy expensive cameras or other hardware, as the features are cloud-enabled. The decision to partner with Nvidia instead of building its own AI capabilities is a smart one for Avaya, as it brings best-in-class capabilities to Spaces quickly. It also opens the door to tapping into the tens of thousands of Nvidia developers who now have access to Maxine for additional features in the future.
Video communication is here to stay; it’s an essential application for almost everyone. Nvidia Maxine, running on the company's GPUs in the cloud and combined with its massive developer reach, will accelerate AI capabilities in the collaboration industry. Expect to see more innovation in this area from Nvidia at future GTCs.