This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.
Google Ramps Up Speech Capabilities for Enterprises
Google today announced new features, support for more voices and languages, and lower prices for its Cloud Speech-to-Text and Text-to-Speech products.
Google has been steadily improving its speech technology offerings over the last several years. After opening up its Cloud Speech API to the masses in April 2017, the company upgraded the API with new features and language support in August 2017. Then in March 2018, it debuted Cloud Text-to-Speech -- technology that it had been using in its own products for years.
With the enhancements announced today, Google is aiming to make speech technologies more accessible for enterprises across the globe. To start, it has improved the accuracy of its Cloud Speech-to-Text capabilities. “Unfortunately, many companies build speech applications that need to run on phone lines and that produce noisy results, and that data has historically been hard for AI-based speech technologies to interpret,” Dan Aharon, product manager for Cloud AI at Google, wrote in a Google blog post.
To address situations that produce “less than pristine” data, as Aharon wrote, Google has been working with beta customers’ usage data to refine the accuracy of its models. As a result of these beta tests, Google now offers an enhanced phone model that it claims has 62% fewer transcription errors and a video model with 64% fewer errors compared to its previously available technology.
Google is now allowing any enterprise to use the enhanced phone and video models without mandating that they also opt in to data logging, which had been a requirement during the beta testing. Customers that do opt in, however, will pay a lower rate.
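For developers, choosing one of these models is a per-request setting. The sketch below builds a JSON body for the Speech-to-Text v1 REST `speech:recognize` method with `useEnhanced` and `model` set. The bucket path, encoding, and sample rate are illustrative assumptions, and the data-logging opt-in is not part of this request body.

```python
import json


def build_enhanced_recognize_request(gcs_uri: str, model: str = "phone_call") -> dict:
    """Return a JSON body for the Speech-to-Text v1 `speech:recognize` REST method
    that asks for the enhanced variant of `model` ("phone_call" or "video")."""
    return {
        "config": {
            "encoding": "LINEAR16",   # assumed: uncompressed 16-bit PCM audio
            "sampleRateHertz": 8000,  # assumed: narrowband telephone audio
            "languageCode": "en-US",
            "useEnhanced": True,      # request the enhanced model instead of the standard one
            "model": model,
        },
        "audio": {"uri": gcs_uri},
    }


if __name__ == "__main__":
    # Hypothetical Cloud Storage object holding a recorded support call
    body = build_enhanced_recognize_request("gs://example-bucket/support-call.wav")
    print(json.dumps(body, indent=2))
```

Swapping `model="video"` requests the enhanced video model instead; everything else in the body stays the same.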
Further, multi-channel recognition, which helps the Cloud Speech-to-Text API distinguish between multiple audio channels, is now generally available. This functionality is useful for call and meeting analytics. This and other new features qualify for a service-level agreement and other enterprise-level guarantees, Aharon said.
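As a rough illustration of multi-channel recognition, the sketch below builds a v1 `RecognitionConfig` that declares a two-channel recording (for example, agent and caller on separate tracks) and asks for each channel to be transcribed separately. It then groups results by their `channelTag`. The sample response is hypothetical and trimmed to the fields the helper reads.

```python
def build_multichannel_config(channel_count: int = 2) -> dict:
    """Speech-to-Text v1 RecognitionConfig asking for per-channel transcription."""
    return {
        "encoding": "LINEAR16",     # assumed audio encoding
        "sampleRateHertz": 16000,   # assumed sample rate
        "languageCode": "en-US",
        "audioChannelCount": channel_count,
        "enableSeparateRecognitionPerChannel": True,
    }


def group_by_channel(response: dict) -> dict:
    """Map each channelTag to the top transcript of every result on that channel."""
    transcripts: dict = {}
    for result in response.get("results", []):
        tag = result.get("channelTag", 1)
        best = result["alternatives"][0]["transcript"]
        transcripts.setdefault(tag, []).append(best)
    return transcripts


if __name__ == "__main__":
    # Hypothetical, trimmed response for a two-channel call recording
    sample = {
        "results": [
            {"alternatives": [{"transcript": "thanks for calling"}], "channelTag": 1},
            {"alternatives": [{"transcript": "I have a billing question"}], "channelTag": 2},
        ]
    }
    print(group_by_channel(sample))
```

Keeping each speaker on a labeled channel is what makes downstream call and meeting analytics (talk time, who said what) straightforward.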
Another way Google aims to make its speech technologies more accessible to enterprises is with lower pricing. As mentioned above, customers that opt in to Google’s data logging program will receive a discount -- 33% less -- for use of all standard models and the premium video model, Aharon wrote. Further, Google has cut pricing for its premium video model by 25%, which, combined with the discounted rate for opting in to the data logging program, brings the total savings to 50%, he said.
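The 50% figure follows if the two discounts stack multiplicatively, which the announcement implies but does not spell out: paying 75% of the old price and then 67% of that leaves roughly half. A minimal check, assuming that stacking:

```python
def combined_discount(*discounts: float) -> float:
    """Combine successive fractional discounts multiplicatively (order does not matter)."""
    remaining = 1.0
    for d in discounts:
        remaining *= 1.0 - d  # pay only the undiscounted share at each step
    return 1.0 - remaining


if __name__ == "__main__":
    # 25% video-model price cut, then the 33% data-logging discount
    total = combined_discount(0.25, 0.33)
    print(f"total savings: {total:.4f}")  # about 0.4975, i.e. roughly 50%
```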
Text-to-Speech: Expanding the Scope
On the speech synthesis front, Google announced that it has doubled the number of overall voices, WaveNet voices, and WaveNet languages for its Cloud Text-to-Speech offering. For those who may need a reminder, WaveNet is Google’s generative model for raw audio.
Google Text-to-Speech now includes support for seven additional languages or variants -- Danish, Portuguese/Portugal, Russian, Polish, Slovakian, Ukrainian, and Norwegian Bokmål -- all of which are currently in beta. Once they are generally available, the number of supported languages will reach 21. Across the newly supported languages, Google has unveiled 31 new WaveNet voices and 24 new standard voices, Aharon wrote.
Finally, Google has announced general availability of a Cloud Text-to-Speech Device Profiles feature, which optimizes audio playback on different types of hardware, Aharon said. “For example, some customers with call center applications optimize for interactive voice response (IVR), whereas others that focus on content and media (e.g. podcasts) optimize for headphones,” he explained.
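In the Cloud Text-to-Speech API, a device profile is selected through the `effectsProfileId` list in the request's `audioConfig`. The sketch below builds a v1 REST `text:synthesize` body for the two cases Aharon mentions, IVR and headphones. The voice name, encoding, and prompt text are illustrative assumptions; check the audio-profiles documentation for the current list of profile IDs.

```python
import json

# Profile IDs for the two scenarios in the quote above
IVR_PROFILE = "telephony-class-application"
HEADPHONE_PROFILE = "headphone-class-device"


def build_synthesize_request(text: str, profile_id: str, encoding: str = "MP3") -> dict:
    """Return a JSON body for the Text-to-Speech v1 `text:synthesize` REST method."""
    return {
        "input": {"text": text},
        # Assumed voice: a US English WaveNet voice
        "voice": {"languageCode": "en-US", "name": "en-US-Wavenet-D"},
        "audioConfig": {
            "audioEncoding": encoding,
            "effectsProfileId": [profile_id],  # tune output for the target playback hardware
        },
    }


if __name__ == "__main__":
    ivr = build_synthesize_request("Press 1 for billing.", IVR_PROFILE)
    podcast = build_synthesize_request("Welcome to this week's episode.", HEADPHONE_PROFILE)
    print(json.dumps(ivr, indent=2))
    print(json.dumps(podcast, indent=2))
```

The same text and voice can be rendered for different hardware simply by changing the profile ID, which is the point of the feature.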
Enterprises can view demos and give Google’s Cloud Speech products a try today. It’s offering the first 60 minutes of audio processing each month for Cloud Speech-to-Text for free, as well as a $300 credit to start testing.
Learn more about AI and Speech Technologies at Enterprise Connect 2019, coming to Orlando the week of March 18. Check out our AI & Speech Technologies track and all the sessions we are offering that week. If you haven’t yet signed up for your Enterprise Connect pass, register now to take advantage of our Early Bird Rate -- ending tomorrow! As a No Jitter reader, you can save an extra $200 off your pass by using the code NJPOSTS at checkout!