Conversations in Collaboration: Cisco’s Javed Khan on AI Working Quietly Behind the Scenes

As one of Enterprise Connect 2024's keynote speakers, Senior Vice President and General Manager of Cisco Collaboration Javed Khan gave a forward-looking address on how Cisco sees AI augmenting and extending collaboration – with a focus on sharp transcription and nimble, responsive assistants. 

We sat down with Khan and talked about Cisco's strong background in audio and its unobtrusive use of AI in its transcription features, and how these have set the company up for the AI assistants it's offering in Webex and in its contact center products. Khan also stressed that for AI to work well, it must do so seamlessly and transparently to make the end users' work easier – something that is top of mind for many contact center managers today.

Below is a transcript of the part of our conversation that focused on AI assistants and the contact center.

No Jitter (NJ): The first thing I really wanted to ask you about was Webex AI assistant for meetings. What do you think it does better than any of its perceived counterparts? 

Javed Khan (JK): For more than three years, we've been transcribing our meetings using audio intelligence. We are now building the AI system on top of that. Step one is getting a clean transcript, then you derive insights. I think the driving insights part, in some ways, is becoming easier because I pick your favorite language model trainers. And that too, we've been doing for a while. 

A lot of people don't know it, but we had a large language model [LLM] in our transcriber back in the day because sometimes you may not get a word but the LLM will fix the resolution. So, we have the full end-to-end system: capture the voice cleanly, transcribe it, then apply various models. 

Our assistant has a real-time media model, so it's not just capturing the voice piece and the text piece of a conversation but also the other visual and audio interactions. 

NJ: Because if you just have an LLM that's based on text and not voice, saying something like, "Oh, I'm just delighted," doesn't read the same as (sarcastically) "Oh, I'm just delighted." So you have to have that extra layer of data on there?

JK: That is audio and video intelligence. We put our features out there regarding data and unless it gets to a certain level of adoption we keep iterating on it, and we have this with our transcription and background noise. It has to be seamless and transparent. If you have to click a bunch of buttons to actually make it work, people won’t use it, so it has to quietly happen in the background. It has to delight you along the way. You shouldn't have to say, "AI, make my life better."

NJ: To make sure I understand: you're saying the thing that sets Webex assistant apart – and all of your AI in general – is that you've chosen to focus on audio quality and transcription. And that's because unless you have really good transcription, everything else won't work because it's not accurate, and then you can't train a language model and respond appropriately.

JK: Yes – on the audio side, [but it’s the] same thing on the video side. We want to decorate the text-based version of the interaction with audio and video nuances because those convey different meanings.

NJ: You've used the phrase 'a technology-delivered experience.' When you're talking about a technology-delivered experience, what is it?

JK: In the context of, let's say, a contact center, we had an example where AI technology delivered experiences for the agent where the customer calls in and, because they had a certain profile – e.g., 'the customer is unhappy' – we rerouted them to an agent who's better equipped to handle them. Behind the scenes, the technology matched [the customer] to a better agent with a better answer, and the customer's experience of calling and getting help just got better.
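
As a rough illustration of the kind of routing Khan describes, the Python sketch below sends a caller flagged as unhappy to the available agent best equipped to handle them. Everything in it is hypothetical: the `Agent` fields, the `deescalation_skill` rating and the `route_call` function are assumptions made for the example, not part of Webex or any Cisco product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    name: str
    deescalation_skill: float  # hypothetical rating between 0 and 1
    available: bool = True


def route_call(customer_unhappy: bool, agents: List[Agent]) -> Optional[Agent]:
    """Pick an available agent, preferring strong de-escalators for unhappy callers."""
    free = [agent for agent in agents if agent.available]
    if not free:
        return None  # nobody is free; a real system would queue the call
    if customer_unhappy:
        # Match the unhappy caller to the agent rated best at de-escalation.
        return max(free, key=lambda agent: agent.deescalation_skill)
    # Otherwise take the first available agent.
    return free[0]
```

In this sketch the "customer is unhappy" flag is simply passed in; in the scenario Khan outlines, it would come from the caller's profile.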

NJ: One thing we see with a lot of vendors is an emphasis on customer experience and how it has to delight the customer and it has to be intelligent. But how are we applying technology so the experience of dealing with the customer gets better for agents, because the customer is not always a delight?

JK: The human interactions these agents have to deal with sometimes get ignored but the same technology applies. Just like you're trying to assess the sentiment of the customer, the wellbeing of the agent is [also] super important. [You also have to] make the tools available so the [AI] assistant can listen in on the conversation and [suggest] a response. [By doing that], we make [the] agent's life easier by saying here are three suggested options (for the customer). We can do this because of the audio and video intelligence – the tone of the agent and the tone of the customer are input alongside the words.

NJ: Because we're talking about, you know, experiences, and especially human experiences, here: How can AI help either de-escalate or protect the agent's state of mind if they're just dealing with somebody who's determined not to be delightful to them?

JK: During the call, we have a continuous score that's generated – which is, this column is green because the conversation is going well, but when it goes bad, it turns red. [If that happens] either the agent or their supervisor gets a hit, they may even want to use the words, 'This conversation isn't going well, here's an alternative.' Then the agents and the supervisors can potentially tag in. What we've built in is [a prompt to] give the agent a break right now and it could be because of how they're feeling or because they just dealt with a difficult customer.
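
One way to picture the continuous score Khan mentions is as a rolling average of per-utterance sentiment that flips an indicator from green to red once it drops below a threshold. The Python sketch below is only an illustration under that assumption: the `ConversationScore` class, the window size and the threshold value are hypothetical and do not describe how Cisco computes its score.

```python
from collections import deque


class ConversationScore:
    """Rolling sentiment average mapped to a green/red indicator."""

    def __init__(self, window: int = 5, red_threshold: float = -0.2):
        # Keep only the most recent `window` sentiment values.
        self.recent = deque(maxlen=window)
        self.red_threshold = red_threshold

    def update(self, sentiment: float) -> str:
        """Add one sentiment value (assumed range -1 to 1) and return the status."""
        self.recent.append(sentiment)
        average = sum(self.recent) / len(self.recent)
        return "red" if average < self.red_threshold else "green"


score = ConversationScore(window=3)
for value in (0.5, -0.6, -0.8):
    status = score.update(value)
# A "red" status here is where an agent or supervisor could be alerted.
```

Averaging over a short window means a single sharp remark does not turn the indicator red on its own, while a sustained negative stretch does.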

NJ: Burnout is a huge problem in the contact center. The tools that you're offering are hopefully ways to try to mitigate that experience?

JK: Yes, the turnover inside the company – [the customer service] function is probably where the turnover is the highest. [Agent turnover] comes with other costs because then you have to train new employees and customer experience [suffers]. So, keeping [agents] happy and giving them the tools to do their job and then there's a summarization after the call – they don't have to sit and summarize the call, they can just edit the call and capture some nuance. That's where the job becomes more efficient.