
Conversations in Collaboration: Yellow.ai’s Raghu Ravinutala on Automating Customer Service to Generate Real Cost Reductions

An AI chat bot. Image: Andrey Suslov - Alamy Stock Vector

Welcome to No Jitter’s Conversations in Collaboration series. In this series, we’re asking industry leaders to talk about how AI can boost productivity – and how we define what productivity even is – with the goal of helping those charged with evaluating and/or implementing Gen AI get a better sense of which technologies will best meet the needs of their organizations and customers.


In this conversation, we spoke with Raghu Ravinutala, the CEO and Co-Founder of Yellow.ai, about how automating customer interactions with a generative AI-powered virtual agent can improve CX and how enterprises can evaluate the ROI of this type of solution.

Yellow.ai (formerly known as Yellow Messenger) is a global generative AI-powered customer service automation company. Since its inception in 2016, Raghu and his team have expanded Yellow.ai’s presence across North America, Asia Pacific, the Middle East, Europe, and Latin America, with 1000+ customers in 85+ countries. Previously, Raghu spent 16 years at tech companies such as Texas Instruments and Broadcom in leadership roles spanning Engineering, Product Management, and Business Development, both in the Bay Area and in India.

No Jitter (NJ): Let’s start with a 30,000-foot view of Yellow.ai and your approach to the market.

Raghu Ravinutala (RR): We are in the business of making customer service autonomous. We believe that a significant part of enterprise workflows with customers and employees will happen over messaging, so we built a platform to help enterprises leverage that.

Today we automate more than 4 billion interactions per quarter with more than 1000 enterprise customers across the world who have primarily adopted us to automate customer service. Our product strategy is evolving to include an AI-centric end-to-end customer support suite that has automation and agent workflows unified into a single system so that there is continuous learning that [in turn] helps drive higher automation rates.

NJ: Is the Yellow.ai solution similar to an IVR or virtual agent where it is deployed in front of the contact center so that customers interact with it first and then get passed through to an agent?

RR: That is correct, but the biggest difference between [us and an] IVR is that we’re not a decision tree – we’re purely conversational. Our virtual assistant helps route calls but also automates anywhere from 30% to 80% [of interactions] depending upon the use cases and the kinds of customers. [It] then transfers customers to humans whenever the virtual assistant thinks it’s necessary.

NJ: How is that virtual agent trained up?

RR: It is trained in three different ways. The first is by configuring the workflows or conversations the [virtual] agent needs to have [with customers]. That could be from a series of PDF documents or websites or other information. The second way is by using conversations the company had before – call center logs, chat logs, etc. The third, and probably the most important one, is integrating [the virtual agent] with the core systems of the company so that it can take actions and not just respond to user queries.

NJ: How does the enterprise or the call center operation create that flow?

RR: For the contact center, the first step is to identify the knowledge bases that serve and automate significant parts of their workflows. That eliminates a lot of informational queries because they’re handled using generative AI. We use pre-trained models and retrieval augmented generation to answer queries that are grounded in certain documents to drive responses for those [informational queries].
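Yellow.ai’s implementation is proprietary, but as a rough illustration of the retrieval-augmented-generation pattern Ravinutala describes – retrieve the most relevant documents, then ground the model’s answer in them – here is a minimal Python sketch. The word-overlap scoring is a toy stand-in for a real embedding model, and build_grounded_prompt is a hypothetical helper, not a Yellow.ai API:

```python
# Minimal RAG sketch: pick the most relevant documents for a query,
# then build a prompt that grounds the LLM's answer in them.

def overlap(query: str, doc: str) -> float:
    # Toy relevance score; a production system would use learned embeddings.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0

def build_grounded_prompt(query: str, docs: list[str], top_k: int = 2) -> str:
    ranked = sorted(docs, key=lambda d: overlap(query, d), reverse=True)
    context = "\n".join(ranked[:top_k])
    return (
        "Answer using ONLY the context below. If the answer is not "
        "there, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

docs = [
    "Warranty claims can be filed online within 24 months of purchase.",
    "Our headquarters are located in San Mateo, California.",
]
# The resulting grounded prompt would then be sent to the LLM of choice.
print(build_grounded_prompt("How do I file a warranty claim?", docs))
```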

The second step is to identify the transactional use cases where [the virtual agent can] complete the conversation or transaction – this is where [the virtual agent] needs to initiate an action or a workflow that involves multiple internal systems. For this, we have a UI-based builder that takes the API and builds the workflows that need to be triggered or the specific actions to be taken.
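The builder itself is a UI, but conceptually each workflow pairs a recognized customer goal with an internal API call and a response template. The sketch below is only a guess at that shape; the endpoint, token placeholder, and field names are all hypothetical, not Yellow.ai’s builder output:

```python
# Hypothetical sketch of a transactional workflow definition: an action
# backed by an internal API, plus a template for the agent's reply.
import json
import urllib.request

def check_order_status(order_id: str) -> dict:
    """One 'action' step: call an internal system and return its payload."""
    req = urllib.request.Request(
        f"https://internal.example.com/orders/{order_id}",  # hypothetical endpoint
        headers={"Authorization": "Bearer <token>"},        # placeholder credential
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# A workflow maps a recognized goal to the action and the reply to render.
WORKFLOWS = {
    "order_status": {
        "action": check_order_status,
        "reply_template": "Your order {order_id} is currently: {status}.",
    }
}
```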

Third, we have an intent-based system, but that really is the third priority. This is for when the system is not able to answer using the generative AI front end. The intent-based system [returns] structured responses that need to [be delivered] to the customer. It’s also possible to combine the generative AI-based strategy with this intent-based system.
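Putting the three tiers together, a dispatcher might try a transactional workflow first, fall back to a grounded generative answer, and finally fall back to intent-based canned responses. The stubs below are purely illustrative and do not reflect Yellow.ai’s actual routing logic:

```python
# Illustrative three-tier routing: transactional workflow, then a
# grounded generative answer, then an intent-based canned response.

CANNED = {"reset_password": "To reset your password, visit Settings > Security."}

def match_workflow(query: str):
    return "order_status" if "order" in query.lower() else None

def generate_grounded_answer(query: str):
    return None  # stand-in for the RAG call sketched earlier

def classify_intent(query: str) -> str:
    return "reset_password" if "password" in query.lower() else "unknown"

def handle(query: str) -> str:
    if match_workflow(query):                        # transactional tier
        return "Running the order-status workflow..."
    if (answer := generate_grounded_answer(query)):  # generative tier
        return answer
    intent = classify_intent(query)                  # intent-based fallback
    return CANNED.get(intent, "Let me connect you with a human agent.")

print(handle("I forgot my password"))
```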

NJ: Is your solution completely Gen AI?

RR: The core is generative AI [and] we supplement the generative AI strategies with machine learning-based predictions.

NJ: What impact does the Yellow.ai solution have on the contact center?

RR: One of the things we’ve found is that our customers are most concerned about the average handle time for their end customer and not for the agent. Using generative AI and [our] virtual assistant, we’re seeing average handle times for the end customer drop very significantly, even up to 80 to 90%. It takes human agents a lot of time to figure out the various options and provide those to end customers, whereas a virtual assistant can make the API calls, create the necessary options, and drive the conversation much faster.

Even when the conversation is shifted to a human agent, many companies are adopting something called agent assist, which is essentially a virtual assistant that is agent facing. It can help the agent get the information they need to solve the query very quickly. And with [that information], the agent processes that response and relays it back to the end customer. [This is another area where] we are seeing the average handle times come down.

NJ: Many calls into the contact center are on the same or similar issues. Is there a way to create a database, so to speak, of responses that are generated by the LLM so that rather than continually hitting the model to generate basically identical responses over and over, the system instead pulls from that database of known answers to known questions?

RR: Our earlier AI technology used to do that – [it had] preset responses and you could fill in the values based on the user interaction, the intents and entities and then provide responses based on those.

[But] the whole purpose of generative AI is to further personalize and further contextualize the responses to the customer need. So even if the query from one customer is similar to another customer’s, the context, the history, and the persona of the customer are different. Generative AI enables us to dynamically create prompts that generate a reasonably personalized and unique response for each customer based not just on the query but on other historical data and their persona. That’s the whole advantage of generative AI.
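As a purely illustrative sketch of that idea – assembling a dynamic prompt from the query plus the customer’s history and persona – consider something like the following; the field names and demo data are hypothetical, not Yellow.ai’s prompt format:

```python
# Illustrative prompt assembly: the same question yields a different
# prompt (and thus a different generated reply) per customer.

def build_prompt(query: str, history: list[str], persona: str) -> str:
    recent = "\n".join(history[-3:])  # last few interactions for context
    return (
        f"Customer persona: {persona}\n"
        f"Recent interactions:\n{recent}\n\n"
        f"Current question: {query}\n"
        "Write a personalized, helpful reply consistent with the persona "
        "and history above."
    )

print(build_prompt(
    "Can I change my delivery date?",
    ["Asked about order #1042 yesterday", "Prefers email follow-ups"],
    "long-time premium customer",
))
```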

If a company specifically determines that it wants to be very prescriptive in responses to certain kinds of questions or doesn’t want to take a risk with generated content, that's when we use a hybrid approach and we use an intent-based model to answer those queries. But the majority of the responses are dynamically generated.

NJ: How does your pricing model work?

RR: Our pricing is based on automated minutes and on the volumes that the customer can commit to. For example, let’s say a company receives a million calls per annum and, on average, it’s three minutes per call, which is three million minutes. Out of those minutes, if we automate a million minutes and we charge, let’s say, $0.50 per minute, then you’re potentially looking at $500,000 per year. That’s an example of how our pricing works.
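Worked out in code, that example (using the illustrative rates Ravinutala quotes) looks like this; it also shows the one-third cost reduction he returns to later in the conversation:

```python
# The pricing example above, worked as arithmetic.
calls_per_year = 1_000_000
minutes_per_call = 3
total_minutes = calls_per_year * minutes_per_call   # 3,000,000 minutes
automated_minutes = 1_000_000                       # minutes handled by the bot
price_per_minute = 0.50                             # illustrative rate

annual_fee = automated_minutes * price_per_minute   # $500,000 per year
automated_share = automated_minutes / total_minutes # one-third of all minutes
print(f"${annual_fee:,.0f} per year; {automated_share:.0%} of minutes automated")
```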

NJ: The term guardrails is often used in the CX context as a shorthand for making sure the generative model does not hallucinate, provide fake information, or produce toxic responses. What systems do you have in place on your end to prevent that from happening?

RR: That is the core requirement for how a generative AI application is evaluated at an enterprise. For an enterprise use case, I believe you need a high-fidelity model that is 99.99% [accurate]. The way we have achieved that – and we put out a public paper on this – is by using a multi-LLM architecture where we have certain small language models that are fine-tuned and trained for the use cases of an enterprise and are exclusively trained on their data. Then we use retrieval augmented generation on top of it so that we are grounding the responses and interactions in the [enterprise] data that is exposed to these LLMs.

For example, say there is a banking query around the outstanding balance of a loan. You want that to be high fidelity. There is no leeway for creativity in that use case.

But for the same banking customer, if you’re trying to convert them to a loan offer by saying something like “this is an attractive loan offer that will help you retire your existing loans,” the language that can be generated could be based on the persona and [might be] prone to a little creativity.

The [enterprise must] decide which of the various ways they want to use language models – the highly trained, narrow models versus the use cases that would need the big foundation models to be triggered. So that broadly covers the guardrails.
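Conceptually, that decision is a routing problem: send high-fidelity queries (like the loan-balance example) to a small fine-tuned model constrained by RAG, and open-ended queries to a larger foundation model. The keyword check below is a deliberately crude stand-in for that routing; the topic list and model labels are hypothetical:

```python
# Conceptual multi-LLM routing sketch. A real router would itself be a
# trained classifier, not a keyword check.

HIGH_FIDELITY_TOPICS = {"balance", "payment", "statement", "outstanding"}

def route(query: str) -> str:
    if HIGH_FIDELITY_TOPICS & set(query.lower().split()):
        return "fine-tuned small model + RAG grounding (no creative leeway)"
    return "foundation model (more creative latitude)"

print(route("What is the outstanding balance on my loan?"))
print(route("Tell me about your refinancing offers"))
```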

The second aspect of managing this system is our audit framework. We have a completely independent model that audits the responses [generated by the other models]. That gives the enterprise a mechanism to evaluate the quality of the interactions and flag them, so that the enterprise or someone on our team who’s working with them can look at those responses and make the necessary corrections to the core model so it no longer produces [those incorrect responses].
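The shape of such an audit pass might look something like the sketch below: a second, independent model scores each response and low scores are flagged for human review. The audit_model heuristic here is a toy placeholder, not Yellow.ai’s audit framework:

```python
# Sketch of an independent audit pass over generated responses.

def audit_model(query: str, context: str, response: str) -> float:
    """Placeholder for an independent LLM returning a 0-1 groundedness score."""
    return 1.0 if response in context else 0.5  # toy heuristic for the sketch

def review_needed(query: str, context: str, response: str,
                  threshold: float = 0.8) -> bool:
    # Flag low-scoring responses for the enterprise/vendor team to inspect.
    return audit_model(query, context, response) < threshold

print(review_needed(
    "What is my balance?",
    "Your current balance is $1,250.",
    "Your current balance is $1,250.",
))  # False: the response is grounded verbatim, so no review is needed
```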

So this combination – a multi-LLM architecture with small, focused language models, plus an independent model for auditing – is how we implement the guardrails around generative AI.

NJ: What LLMs are you using?

RR: The fine-tuned models are proprietary, but the underlying [foundation] models for fine-tuning [those proprietary models] are open source. [More specifically], we use a lot of Llama models and then fine-tune for use cases on top of [them]. [We also use] OpenAI models for certain specific use cases that actually need them, where [we] want a lever for more creativity. But for truly high-fidelity use cases, we exclusively use models fine-tuned from the open source Llama models.

NJ: How do enterprises evaluate the impact of your solutions?

RR: The main way they’re measuring the impact of using AI is how it improves the first time to response. Typically, when a call center is staffed by human agents, there are times when customers are on hold, or there are no agents available on nights or weekends. A [virtual agent] directly improves first time to response because it can handle millions of calls simultaneously without any lag.

The second measure is the automation rate. That is, how many of the calls or interactions need a human agent involved to drive the resolution? That [can] range from 30% to 80%, so if a company was getting 100 calls, for example, automation would handle anywhere from 30 to 80 of them.

The CSAT of the response and interaction is the third measure [and that’s where] companies are ensuring that while they drive these automation rates, they make a positive impact on the customer experience rather than degrade [it].

Those three [metrics] form the core of how a virtual agent is primarily evaluated. But there are secondary measures, such as the repeat rate – when customers interact with a virtual agent, do they come back to it or do they go on to a different channel? Another is how contextual and accurate the responses are.

There are other measures, some are leading or lagging indicators, but CSAT, automation rate, and first time to response are the core of how a contact center evaluates virtual assistants.
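In code form, those three core metrics might be computed along these lines (a simple sketch; the field names and the 1-5 CSAT scale are assumptions, not Yellow.ai definitions):

```python
# The three core virtual-agent metrics, computed from raw counts.

def automation_rate(total_interactions: int, escalated_to_human: int) -> float:
    return (total_interactions - escalated_to_human) / total_interactions

def avg_first_response_seconds(response_delays: list[float]) -> float:
    return sum(response_delays) / len(response_delays)

def csat(scores: list[int]) -> float:
    return sum(scores) / len(scores)  # e.g., 1-5 survey scores

print(f"{automation_rate(1000, 350):.0%} automated")  # 65%, within the 30-80% range cited
```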

NJ: How are enterprises evaluating the ROI of deploying this type of technology into their contact center?

RR: They’re looking at three ROI parameters. One is outright cost reduction – that’s so important in the current macroeconomic climate. They’re looking at the millions of dollars that are essentially automated away in terms of their contact center spend. So in that earlier example of three million minutes that are handled by humans, [if] automation can “automate away” one million of those minutes, they are looking at a one-third cost reduction in their contact center spend.

The second area is increasing revenue, especially when these contact centers are handling orders or taking leads for their products, etc., where they were missing orders because they had inelasticity in the contact center [during] peak times. In those cases, they are looking at incremental revenue generation through intelligent virtual assistants.

Third, ROI is around their ability to provide high quality support for a larger number of their users. Customer service and customer engagement are sometimes buried deep within a website, [but with automation] they can open those [pages up] and engage with more of their customer base and [on many] more use cases. Here they are looking at improved customer experience measured in terms of their overall CSAT for the end customer.

An example here is Logitech, which is one of our [virtual assistant] customers. Their knowledge base provides information to end customers and helps them resolve the problems they’re having. This directly affects the contact center because [the self-service] reduces the volume of calls they receive – end consumers are resolving queries around the product by themselves.

Logitech knows it – there is a direct correlation between fully completed knowledge-base queries and the [lower] number of contacts they receive through the contact center. For many customers, a core metric is how many users visiting the website end up with a satisfied resolution [because of] the virtual assistant [deployed there]. That directly correlates to the number of calls that they would get in their contact center.

So [I’d say] those three are the top ROI parameters customers are looking at, but cost reduction is absolutely the [most] prominent one.

Want to know more?

Yellow.ai referenced this blog post about bi-encoder-based detectors (BEDs), which are used for out-of-distribution (OOD) detection. Yellow.ai then provided this commentary to explain how BEDs are relevant to the multi-LLM architecture referenced above:

  • OOD detection is critical as it prevents unreliable predictions or erroneous decisions when machine learning models encounter data points that fall outside their training data.
  • OOD detection also helps filter out irrelevant or potentially harmful input, enhancing the reliability and safety of conversational AI systems.
  • Additionally, OOD detection helps in the generalization and adaptation of natural language processing (NLP) models. By recognizing when a new input does not belong to its training distribution, a model can adapt its behavior based on this detection. This allows for better handling of unseen or novel data, improving the model’s ability to handle diverse inputs effectively.
  • All of this applies to LLMs in general and BED helps to achieve <1% hallucinations. It allows the system to flag inputs that fall outside the distribution of its training data, thereby preventing the generation of unreliable or fabricated responses.
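To make the OOD idea concrete, here is a minimal, generic sketch of embedding-similarity-based flagging: compare an incoming query against embeddings of in-distribution training examples and flag it when even the best match is too weak. This is an illustration of the general technique, not Yellow.ai’s BED implementation; embed() is a toy stand-in for a trained bi-encoder, and the threshold is arbitrary:

```python
# Generic embedding-similarity OOD flagging sketch.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # toy stand-in for a bi-encoder

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def is_out_of_distribution(query: str, train_examples: list[str],
                           threshold: float = 0.3) -> bool:
    best = max(cosine(embed(query), embed(t)) for t in train_examples)
    return best < threshold  # low similarity to everything seen in training

train = ["what is my loan balance", "how do I make a payment"]
print(is_out_of_distribution("what is my outstanding loan balance", train))  # False
print(is_out_of_distribution("write me a poem about pirates", train))        # True
```

A flagged input would then be withheld from the generative path (routed to the intent-based fallback or a human agent) rather than risk a fabricated response.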

These are links to Yellow.ai’s terms of service, privacy policy, and security compliances.

For more on retrieval augmented generation (RAG), check out these articles by AWS, Nvidia and IBM. In short, RAG is a technique to enhance the accuracy and reliability of generative AI models by referencing an authoritative source (e.g., the enterprise’s own databases) outside the data it was trained on before generating a response. The IBM article makes an important point: “LLMs know how words relate statistically, but not what they mean.”