What Do You Buy with 5k and AI?

Cisco has been actively marketing video solutions that pair 4k and 5k high-resolution cameras with artificial intelligence (AI) capabilities. While other vendors have high-resolution cameras, I believe Cisco is leading the market in what it does today with 4k/5k camera data coupled with information from other room devices. In Cisco's case, those would be Spark Board and Spark Room systems.

Quite frankly, most people can't tell the difference between a 720p video image and a 1080p video image, much less a 4k image. So why does Cisco think we need even higher-resolution -- 4k or 5k -- cameras?

I'll focus first on what Spark devices do with the 5k camera data, then I'll speculate about what we'll see in the coming months and years, not just from camera data, but from these systems in general.

5k Data Now

Cisco supports 5k cameras in the Spark Board 55 and 70, as well as in the Spark Room 55, Spark Room Kit, and Spark Room Kit Plus (with four 5k cameras). Today these systems primarily use 5k data to frame the video meeting properly:

  • The 5k camera data gives a wide view of the entire room. The systems (Spark Room today and Spark Boards soon) crop that image to frame the active speakers in the view sent to the far side. The image sent to the far side is still 1080p, not 5k.
  • The Spark Board has a 4k screen that avoids the "up close" pixelation one would otherwise see when a person on the far side is writing on the capacitive touch whiteboard.
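To make the cropping idea in the first bullet concrete, here is a minimal sketch of how a 16:9 crop window might be chosen around a detected speaker inside a wide sensor frame. This is not Cisco's implementation. The 5120x2880 sensor size, the `Box` type, the `frame_speaker` function, and the 50% margin are all assumptions made for illustration; only the 1080p output size comes from the description above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    width: int
    height: int


def frame_speaker(speaker: Box,
                  sensor_w: int = 5120,   # assumed "5k" sensor width
                  sensor_h: int = 2880,   # assumed "5k" sensor height
                  margin: float = 0.5,    # hypothetical padding around the speaker
                  out_w: int = 1920,
                  out_h: int = 1080) -> Box:
    """Return a crop window with the output aspect ratio, centered on the speaker."""
    # Pad the detected speaker box so the person is not framed edge to edge.
    padded_w = speaker.width * (1 + margin)
    padded_h = speaker.height * (1 + margin)

    # Height needed so the padded box fits at the output aspect ratio.
    needed_h = max(padded_h, padded_w * out_h / out_w)

    # Never crop tighter than the output size (that would force upscaling),
    # and never larger than the sensor allows at this aspect ratio.
    max_h = min(sensor_h, sensor_w * out_h // out_w)
    crop_h = min(max_h, max(out_h, math.ceil(needed_h)))
    crop_w = crop_h * out_w // out_h

    # Center on the speaker, then slide the window back inside the sensor.
    cx = speaker.left + speaker.width / 2
    cy = speaker.top + speaker.height / 2
    left = int(min(max(cx - crop_w / 2, 0), sensor_w - crop_w))
    top = int(min(max(cy - crop_h / 2, 0), sensor_h - crop_h))
    return Box(left, top, crop_w, crop_h)


if __name__ == "__main__":
    print(frame_speaker(Box(2400, 1000, 300, 400)))
```

The point of the sketch is the headroom: with far more sensor pixels than output pixels, the window can move and resize around a speaker while the far side still receives a full 1080p image.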

AI in Spark Devices

The Spark devices use machine learning (ML), a branch of AI, to help them zoom in and out intelligently and to frame the images properly. ML algorithms process the 5k video images, allowing the systems to detect who is moving and who is speaking. While a speaker is in motion, the ML algorithms keep the speaker properly framed in the video sent to remote participants.

When multiple participants are speaking, Spark devices know to zoom out. During a meeting, Spark devices measure "ball possession," learning who is speaking most using a combination of visual recognition, voice recognition, and triangulation of voice location. In an experimental mode, Spark Board uses a "grid of dots," eye and nose placement, and other techniques to identify the speaker or speakers -- and then changes the focus based on talking patterns.

For example, if one person is dominant, the ML algorithm can make that speaker more prominent. If two people sitting across the table from one another dominate the conversation, the Spark device is intelligent enough to frame both of them in the video image it sends to remote participants, as opposed to ping-ponging back and forth between the two most active speakers.
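The "ball possession" logic described above can be illustrated with a toy decision rule over per-person talk time. This is only a sketch under stated assumptions, not the ML model Spark devices actually run: the `choose_framing` function, the 60% and 80% thresholds, and the idea of feeding it talk time in seconds are all hypothetical.

```python
def choose_framing(talk_seconds: dict[str, float],
                   dominant_share: float = 0.6,   # hypothetical threshold
                   pair_share: float = 0.8        # hypothetical threshold
                   ) -> tuple[str, list[str]]:
    """Pick a framing mode from accumulated talk time per participant."""
    total = sum(talk_seconds.values())
    if total <= 0:
        # Nobody has spoken yet: show the whole room.
        return ("room", [])

    # Rank participants by how long they have held the floor.
    ranked = sorted(talk_seconds.items(), key=lambda kv: kv[1], reverse=True)

    # One dominant speaker: frame that person.
    if ranked[0][1] / total >= dominant_share:
        return ("single", [ranked[0][0]])

    # Two people carrying the conversation: frame both in one shot
    # instead of cutting back and forth between them.
    if len(ranked) >= 2 and (ranked[0][1] + ranked[1][1]) / total >= pair_share:
        return ("pair", [ranked[0][0], ranked[1][0]])

    # Many active speakers: zoom out.
    return ("room", [])


if __name__ == "__main__":
    print(choose_framing({"ana": 50, "ben": 45, "cy": 5}))
```

A real system would presumably weight recent speech more heavily and smooth transitions between modes; the sketch only shows how a possession tally can map to the three framing outcomes the article describes.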

The four-camera version of the Spark Room Kit Plus uses one of the 5k cameras to frame the entire room while the other three do the zooming and framing on portions of the room. The same algorithms referenced above come into play here.
