How AI May Be Creating the Future of Video

Image: metamorworks - stock.adobe.com

For a glimpse at the future of enterprise video technologies, it can be a good idea to watch the journey of software engineers like Huipin Zhang.

For years, Zhang has worked on the frontiers of video collaboration in the enterprise. After a stint helping Webex create its first video capabilities more than a decade ago, Zhang became the first employee hired by Zoom founder Eric Yuan. As Zoom’s Chief Scientist, he played a key role in building a hosted video meeting solution that prioritized ease-of-use above all else.

Indeed, it can be argued that Zhang has done as much as anyone in the industry to facilitate the development of hosted video collaboration solutions for the enterprise. So, consider my surprise when I caught up with Zhang recently and discovered that he’s moved on from enabling online video meetings to the emerging realm of AI-infused “intelligent video” applications.

Zhang has launched a company of his own, Visla.

Long before ChatGPT emerged as the darling of the artificial intelligence age, Zhang had held visions of leveraging AI to develop a software solution that would make video publishing as simple as scheduling a Zoom meeting. He actually left Zoom more than three years ago to launch Visla - several months before anyone had even heard of COVID.

The timing was right for Zhang to set his sights on what he saw as the next “big thing” in enterprise video. His mission was to apply Zoom’s penchant for ease-of-use to a new challenge on the video frontier: content creation.

Since that time, Zhang and his small team at Visla have quietly buried themselves in product development work. The primary focus has been on leveraging artificial intelligence to simplify - and automate - the process of producing video.

“The vision for Visla is still the same we had at Zoom,” Zhang said in a recent analyst briefing. “We want people to use video to solve their business problems.”

Just now, the fruits of Zhang’s handiwork are starting to come out of the shadows. Less than a month ago, Visla launched its first efforts to draw public attention to a platform that is the product of more than three years of development work. While Visla has attracted nearly 300,000 users for the free version of its video content creation platform, Zhang – until now, at least – has been largely reluctant to promote Visla or to talk extensively with industry analysts about his start-up’s initiatives.

The relative silence belies the potentially transformative work taking place far away from the industry spotlight. Part of the reason for that may be that the development work has not been easy. Plowing new ground in developing AI-infused video platform solutions is not necessarily a straightforward, linear process.

“We look back and laugh at ourselves,” Zhang says. “We’ve wasted some time building technologies that didn’t work very well.”

But taking the time to wander down blind alleys of product development ultimately pays dividends, Zhang says.

“When I look back, it’s not really fair to say we ‘wasted time,’” Zhang says. “Everything we did in the past leads to building something better later on.”

Visla’s current iteration of “something better” is a video content creation platform that leverages artificial intelligence to streamline the development of videos from scratch and combines that capability with tools that enable video creators to add their own customized flourishes while working their way to a final product.

Using the Visla platform, it actually is possible to create a video from a single text prompt. One phrase, such as “top tips for auto care,” can spark the AI-driven creation of an entire script on the subject. Visla’s system then selects the images, long-form videos and video B-Roll material from licensed public content libraries - or from a user’s private archives that have been analyzed and tagged by the Visla system - to match themes identified in the script. The Visla platform then combines the visual content with a voiceover of the script content automatically generated by AI systems. The platform can even be instructed to select and insert background music for a given video.
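For readers who want a concrete picture of the matching step, the idea of pairing script passages with tagged library assets can be sketched in a few lines of Python. This is an illustration only and not Visla’s implementation: the `Asset` class, the function names and the simple word-overlap scoring are all hypothetical, and a production system would presumably rely on AI models rather than literal keyword matches.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Asset:
    """A clip or image in a content library (hypothetical structure)."""
    name: str
    tags: Set[str]


def split_script(script: str) -> List[str]:
    """Split a script into sentence-like segments on periods."""
    return [s.strip() for s in script.split(".") if s.strip()]


def pick_asset(segment: str, library: List[Asset]) -> Optional[Asset]:
    """Return the asset whose tags overlap most with the segment's words.

    Returns None when the library is empty or no tag matches at all.
    """
    words = {w.strip(",;:!?").lower() for w in segment.split()}
    best = max(library, key=lambda a: len(a.tags & words), default=None)
    if best is None or not (best.tags & words):
        return None
    return best


def build_storyboard(
    script: str, library: List[Asset]
) -> List[Tuple[str, Optional[Asset]]]:
    """Pair every script segment with its best-matching asset, if any."""
    return [(seg, pick_asset(seg, library)) for seg in split_script(script)]


if __name__ == "__main__":
    # Assumed sample data, invented for this illustration.
    library = [
        Asset("tire_closeup.mp4", {"tire", "pressure"}),
        Asset("oil_change.mp4", {"oil", "engine"}),
    ]
    script = "Check your tire pressure monthly. Change the oil on schedule."
    for segment, asset in build_storyboard(script, library):
        print(segment, "->", asset.name if asset else "no match")
```

Segments that come back with no match are the natural places for a producer to step in and choose footage by hand.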

While the Visla platform is able to leverage publicly licensed content, it also offers the capability to create content based on assets stored in an organization’s private archive. The ability to draw upon proprietary content can help the Visla platform generate more accurate voiceover scripts and incorporate video that uniquely mirrors an organization’s branding look and feel.

In many cases, the initial videos produced by the automated system will be suitable for sharing with external audiences. But producers also can further polish the video from the system by editing it using an integrated timeline feature that makes it possible to swap in video assets of their choice to replace those selected by the AI system. The platform also enables users to add in more customized video, edit scripts to update voiceover content and tweak on-screen titling and graphics where needed.

The basic premise, Zhang says, is to provide video novices a starting point for creating good-looking videos. “Creating content from scratch is difficult,” he says. “The best way to create video is to have video.”

Zhang sees the potential for automated video creation tools like Visla’s to be put to work in a variety of business use cases. With AI-infused video processing solutions, it is possible to transform any information conveyed in text form into high-quality video content. Companies using the Visla platform, for instance, can draw upon information developed for use in customer service centers to create videos that address questions frequently asked by customers. Similarly, written product descriptions can be leveraged to create videos used in e-commerce settings. Even employee training manuals could be converted into short-form videos designed to answer specific questions from workers.

And, as it turns out, the best way to facilitate this type of assisted video creation for each of these applications is to leverage the computing power typically associated with AI.

In years past, recorded videos would sit in corporate archives, largely unviewed. Without automated tagging systems, it has been difficult for workers to find relevant video passages on short notice. Essentially, video archives have been the place where corporate knowledge goes to die.

As a result, while real-time video collaboration has proved to be a boon to remote workers trying to stay connected and engaged, on-demand video has been unable to gain traction due to the inherent challenges that come with dealing with video data in legacy computing environments.

Zhang – and Visla, by extension – herald a coming sea change in how video is created in enterprises. As video collaboration becomes more and more of a commodity, AI-infused “intelligent video” applications will emerge to unlock video’s value in new and imaginative ways.

For decades, the value of computer applications has resulted primarily from their ability to manipulate and manage numbers and text. Early spreadsheet applications paved the way to managing numbers in a new way. Similarly, word processors helped facilitate a whole new way to produce and manipulate text.

AI transforms video into a type of data that is as digitally nimble as text and numbers. As such, it’s fair to consider AI-infused content creation applications like Visla to be among the industry’s first “video processors.” Much as today’s Microsoft Word enables individuals to easily cut and paste text and images into documents, video processors powered by AI pave the way to leveraging existing videos to create new, informative content.

And, of course, that video will be of more value to enterprises than ever before as AI-infused content management solutions make it possible to track and tag content more effectively, enabling more precise extraction of corporate knowledge from video archives. In addition to leveraging AI to help in content creation, the Visla platform also can be used to automatically generate tags that help management systems find the right passage of video for viewers when they need it.

It all sets the stage for a virtuous cycle of accelerated enterprise video adoption. An emerging class of “video processors” simplifies the process of creating high-quality videos, boosting the number of on-demand videos in archives. Meanwhile, AI-infused content management solutions help organizations sift through these ballooning archives to retrieve the relevant information from videos at the right time. As more knowledge is successfully shared via video, more workers will find value in creating videos of their own, boosting the size of video archives even more and, again, renewing the information-sharing cycle.

Just as Zoom faced competition in the ascendant days of the video collaboration market, Zhang is well aware that dozens – if not hundreds – of other vendors will ultimately bring their own AI-infused solutions to the video creation marketplace. The key for Visla, he says, will be to mimic Zoom’s ethos for ease-of-use and find ways to deliver solutions that hide the complexities of AI implementation under the hood.

“People may not realize that the reason Zoom is so easy to use is because of what we invested in the underlying technology,” Zhang says. “If you make the tool easy to use, people will use the tool in creative ways.”

Steve Vonder Haar is a senior analyst with IntelliVid Research, responsible for coverage of the “Intelligent Video” market. He can be reached at [email protected]. Visit www.youtube.com/@IntelligentVideoToday to subscribe to IntelliVid Research’s ongoing series interviewing thought leaders from companies developing video solutions designed for corporate use.