Artificial intelligence was the tentpole technology at Enterprise Connect 2023, leaving vendors with generative AI capabilities to wrestle for the spotlight with stylish demos, sweeping claims and the usual dose of marketing hyperbole.
But even as Amazon Web Services, Microsoft, Zoom and others tried to claim the AI edge, Google Cloud offered up the broadest vision of what an AI-powered call center could look like. In his keynote Wednesday morning, Behshad Behzadi, VP of engineering and conversational AI for Google Cloud, showcased call center capabilities that ran the gamut, from building custom apps to deploying remarkably human-sounding text-to-speech and other generative AI features.
Behzadi pointed to Vertex AI, which consolidates existing Google Cloud services under a single, unified platform. Application software models using machine learning (ML) can be deployed to the same endpoints; from there, elements like vision, video, language translation and natural language ML can be embedded in new or existing applications. He demonstrated the generative side by playing two audio files: the first of someone reading aloud, the second of a piano being played. After just a few seconds of audio, the generative AI took over, filling in with its own words or melodies and harmonies while still sounding thematically consistent and accurate.
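The "same endpoints" point is what makes the consolidation concrete: whatever the model does, a deployed Vertex AI model is reached through one REST shape. As a minimal sketch, assuming the standard `:predict` route documented for Vertex AI endpoints (the project, region and endpoint IDs below are hypothetical placeholders), the request an application would build looks like this:

```python
import json

# Hypothetical identifiers -- substitute real project/endpoint values.
PROJECT_ID = "my-gcp-project"
LOCATION = "us-central1"
ENDPOINT_ID = "1234567890"

def build_predict_request(instances):
    """Assemble the URL and JSON body for a Vertex AI :predict call.

    Vertex AI exposes deployed models behind a single REST shape:
    POST https://{location}-aiplatform.googleapis.com/v1/projects/
         {project}/locations/{location}/endpoints/{endpoint}:predict
    with a body of the form {"instances": [...]}.
    """
    url = (
        f"https://{LOCATION}-aiplatform.googleapis.com/v1/"
        f"projects/{PROJECT_ID}/locations/{LOCATION}/"
        f"endpoints/{ENDPOINT_ID}:predict"
    )
    body = json.dumps({"instances": instances})
    return url, body

# A vision model and a language model deployed this way take the same
# request shape; only the instance payload differs per model type.
url, body = build_predict_request([{"content": "Hello, world"}])
print(url)
print(body)
```

Because every deployed model shares this calling convention, swapping a translation model for a vision model changes the payload, not the plumbing.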
In tandem, Behzadi described generative AI app development using large language models (LLMs) that can create images, captions and image classifications from a variety of inputs, including plain text, audio and images. “This takes problem solving to the next level, unlocking a plethora of use cases for the enterprise,” Behzadi told the Enterprise Connect audience. “More is coming.”
Case in point: Vastly improved text-to-speech capabilities. In another demo, Behzadi played two audio files from the stage and asked the audience to guess which was the real human voice and which was synthesized. The majority of audience members got it wrong (including your humble reporter).
He also showed how AI-generated voice is getting more natural as emotion – and translation – start to enter the picture. AI-generated speech can be made to sound casual (“Um, are you there?”), given a more active manner (“Take advantage of this price today!”), and even sound apologetic (“Sorry, but you’re out of credits.”)
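Expressiveness like this is typically requested through SSML, the W3C markup that Google Cloud Text-to-Speech (among other engines) accepts alongside plain text. As a minimal sketch (the helper function and the specific rate/pitch values are illustrative, not from the keynote), the casual and active deliveries above could be encoded like so:

```python
from xml.sax.saxutils import escape

def ssml_prosody(text, rate="medium", pitch="+0st", pause_ms=0):
    """Wrap text in SSML so a TTS engine renders it with a given
    speaking rate and pitch, optionally preceded by a pause.

    <speak>, <prosody> and <break> are standard SSML elements;
    the attribute values used here are illustrative.
    """
    pause = f'<break time="{pause_ms}ms"/>' if pause_ms else ""
    return (
        f"<speak>{pause}"
        f'<prosody rate="{rate}" pitch="{pitch}">{escape(text)}</prosody>'
        f"</speak>"
    )

# Casual, hesitant delivery: slower rate with a leading pause.
print(ssml_prosody("Um, are you there?", rate="slow", pause_ms=300))

# Active, urgent delivery: faster rate, slightly raised pitch.
print(ssml_prosody("Take advantage of this price today!",
                   rate="fast", pitch="+2st"))
```

The markup, not the underlying voice model, carries the emotional cue, which is why the same synthesized voice can shift from apologetic to upbeat on demand.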
Behzadi then blended these capabilities with language translation in something he called “polyglot custom voice,” using Google’s recently announced universal speech model, which reportedly recognizes 1,000 languages. That in turn enables text-to-speech that can be rendered in multiple languages for the same group or conference call, for example.
“Conversational AI is actually happening now, with multimodal sensing, intelligence, thinking and understanding with these generative capabilities,” Behzadi said.
What does all this mean for contact centers? “We’re trying to bring generative AI across all the pillars of [Google’s] Cloud Contact AI,” he added. With virtual agents and agent assist, a smart chatbot shadows live agents, gathering insights and identifying questions and topics to add to the virtual agent’s behavior. “That’s right out of the box, no extra integrations,” Behzadi claimed.
He also cautioned that while these generative AI capabilities are well suited to call center requirements, they are not yet ready for enterprise use. “To use LLMs for the enterprise, you have to take into account data privacy and compliance,” he said. The platform needs to be securely connected to backend systems, and enterprises need to be able to make immediate improvements and fix any problems that arise quickly. Such platforms must also be grounded in the organization’s data and able to follow its business logic, not invent their own, he added.