Google Ramps Up Speech Capabilities for Enterprises


Google today announced new features, support for more voices and languages, and lower prices for its Cloud Speech-to-Text and Text-to-Speech products.

Google has been steadily improving its speech technology offerings over the last several years. After opening up its Cloud Speech API to the masses in April 2017, the company upgraded the API with new features and language support in August 2017. Then in March 2018, it debuted Cloud Text-to-Speech -- technology that it had been using in its own products for years.

With the enhancements announced today, Google is aiming to make speech technologies more accessible for enterprises across the globe. To start, it has improved the accuracy of its Cloud Speech-to-Text capabilities. “Unfortunately, many companies build speech applications that need to run on phone lines and that produce noisy results, and that data has historically been hard for AI-based speech technologies to interpret,” Dan Aharon, product manager for Cloud AI at Google, wrote in a Google blog post.

To address situations that produce “less than pristine” data, as Aharon wrote, Google has been working with beta customers’ usage data to refine the accuracy of its models. As a result of these beta tests, Google now offers an enhanced phone model that it claims has 62% fewer transcription errors and a video model with 64% fewer errors compared to its previously available technology.

Google is now allowing any enterprise to use the enhanced phone and video models without mandating that they also opt in to data logging, which had been a requirement during the beta testing. Customers that do opt in, however, will pay a lower rate.
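For readers curious what selecting the enhanced phone model might look like in practice, here is a minimal sketch of a request body for the Speech-to-Text REST API. The field names (`model`, `useEnhanced`, and so on) follow the v1 `RecognitionConfig` as best understood at the time of writing and are not taken from Aharon's post; the bucket path, the 8 kHz sample rate, and the helper name are hypothetical.

```python
import json


def build_enhanced_phone_request(audio_uri: str) -> dict:
    """Build a JSON body for a recognize call that asks for the enhanced phone model.

    Assumption: field names mirror the v1 REST RecognitionConfig.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 8000,   # assumed telephony-grade audio
            "languageCode": "en-US",
            "model": "phone_call",     # model tuned for phone audio
            "useEnhanced": True,       # request the enhanced variant of that model
        },
        "audio": {"uri": audio_uri},
    }


# Hypothetical Cloud Storage path, for illustration only.
request_body = build_enhanced_phone_request("gs://example-bucket/support-call.wav")
print(json.dumps(request_body, indent=2))
```

The sketch only builds the payload; sending it would still require credentials and an HTTP client or Google's client library.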

Further, multi-channel recognition, which helps the Cloud Speech-to-Text API distinguish between multiple audio channels, is now generally available. This functionality is useful for call and meeting analytics. This and other new features qualify for a service-level agreement and other enterprise-level guarantees, Aharon said.
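As a rough illustration of how multi-channel recognition could be used for call analytics, the sketch below builds a configuration that asks for each channel to be transcribed separately and then groups results by channel. The `audioChannelCount`, `enableSeparateRecognitionPerChannel`, and `channelTag` field names are assumed from the v1 REST API; the sample response and both helper functions are made up for the example.

```python
from collections import defaultdict


def build_multichannel_config(channel_count: int) -> dict:
    """Build a recognition config that transcribes each audio channel on its own."""
    if channel_count < 1:
        raise ValueError("channel_count must be at least 1")
    return {
        "encoding": "LINEAR16",
        "languageCode": "en-US",
        "audioChannelCount": channel_count,
        "enableSeparateRecognitionPerChannel": True,
    }


def transcripts_by_channel(response: dict) -> dict:
    """Group the top transcript of each result under its channel tag."""
    grouped = defaultdict(list)
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # skip results that carry no transcript
        grouped[result.get("channelTag", 0)].append(alternatives[0]["transcript"])
    return dict(grouped)


# Hypothetical response for a two-channel call (agent on 1, caller on 2).
sample_response = {
    "results": [
        {"channelTag": 1, "alternatives": [{"transcript": "Thanks for calling."}]},
        {"channelTag": 2, "alternatives": [{"transcript": "Hi, I need help."}]},
        {"channelTag": 1, "alternatives": [{"transcript": "Happy to assist."}]},
    ]
}

for channel, lines in sorted(transcripts_by_channel(sample_response).items()):
    print(f"channel {channel}: {' '.join(lines)}")
```

Keeping each speaker on a separate channel lets an analytics pipeline attribute speech to the agent or the caller without a separate diarization step.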

Reducing Costs

Another way Google aims to make its speech technologies more accessible to enterprises is with lower pricing. As mentioned above, customers that opt in to Google’s data logging program will receive a 33% discount on all standard models and the premium video model, Aharon wrote. Further, Google has cut pricing for its premium video model by 25%, which, when combined with the discounted rate for opting in to the data logging program, brings the total savings to 50%, he said.
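The 50% figure is not simply 25% plus 33%. A short sketch of the arithmetic, assuming the two discounts compound (the logging discount applies to the already reduced price) and that "33%" is a rounded one-third; both are assumptions, not details confirmed in the article:

```python
def combined_savings(*discounts: float) -> float:
    """Return the total fractional savings when discounts are applied one after another."""
    remaining = 1.0
    for discount in discounts:
        if not 0.0 <= discount <= 1.0:
            raise ValueError("each discount must be between 0 and 1")
        remaining *= 1.0 - discount  # each discount applies to what is left
    return 1.0 - remaining


# 25% price cut on the premium video model, then the data logging discount.
as_quoted = combined_savings(0.25, 0.33)      # using the rounded 33% figure
as_one_third = combined_savings(0.25, 1 / 3)  # if the discount is exactly one third

print(f"with 33%: {as_quoted:.4f}")
print(f"with one third: {as_one_third:.4f}")
```

Either way the result lands at roughly half the original price, which matches the total savings Aharon cites.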

Text-to-Speech: Expanding the Scope

On the speech synthesis front, Google announced that it has doubled the number of overall voices, WaveNet voices, and WaveNet languages for its Cloud Text-to-Speech offering. For those who may need a reminder, WaveNet is Google’s generative model for raw audio.

Cloud Text-to-Speech now includes support for seven additional languages or variants -- Danish, Portuguese/Portugal, Russian, Polish, Slovakian, Ukrainian, and Norwegian Bokmål -- all of which are currently in beta. Once generally available, this will bring the number of supported languages to 21. Google has also unveiled 31 new WaveNet voices and 24 new standard voices across the newly supported languages, Aharon wrote.

Finally, Google has announced general availability of a Cloud Text-to-Speech Device Profiles feature, which optimizes audio playback on different types of hardware, Aharon said. “For example, some customers with call center applications optimize for interactive voice response (IVR), whereas others that focus on content and media (e.g. podcasts) optimize for headphones,” he explained.
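To make the device profile idea concrete, here is a minimal sketch of a synthesis request body that names a playback profile. The `effectsProfileId` field and the two profile identifiers are assumed from the Text-to-Speech v1 REST API rather than quoted from the announcement, and the helper function and sample text are hypothetical.

```python
import json

# Assumed profile identifiers for the two use cases Aharon mentions.
IVR_PROFILE = "telephony-class-application"
HEADPHONE_PROFILE = "headphone-class-device"


def build_tts_request(text: str, profile_id: str) -> dict:
    """Build a JSON body for a text synthesis call tuned to one playback profile."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": "en-US"},
        "audioConfig": {
            "audioEncoding": "MP3",
            "effectsProfileId": [profile_id],  # the API takes a list of profiles
        },
    }


ivr_request = build_tts_request("Your call is important to us.", IVR_PROFILE)
podcast_request = build_tts_request("Welcome back to the show.", HEADPHONE_PROFILE)
print(json.dumps(ivr_request, indent=2))
```

The same text can be synthesized once per target device, with only the profile changing between requests.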

Enterprises can view demos and give Google’s Cloud Speech products a try today. Google is offering the first 60 minutes of audio processing each month for Cloud Speech-to-Text for free, as well as a $300 credit to start testing.


