Google has been busy upgrading its Cloud Speech API to meet growing demand for converting speech to text at cloud scale in support of improved user experiences.
Google previously used the speech-to-text technology exclusively in its own applications, including Google Now and search. But last spring the company opened up its speech technology to the masses, beginning with alpha testing of the Cloud Speech API in April 2016 and moving to general availability (GA) this past April.
"Through the cloud, we're democratizing speech for businesses, from small startups to large enterprises," said Dan Aharon, Google Cloud product manager.
Thousands of businesses are already using the Cloud Speech API, and the number is growing every month, Aharon said. Based on customer feedback collected over the past year through various channels, Google today announced a few enhancements meant to address the most common requests.
The enhancements come in three parts. First, Google has added timestamp information for each word in a transcript -- the most requested feature, Aharon said. Word-level timestamps let users jump to the moment in the audio where the text was spoken, or display the relevant text while the audio is playing, he explained.
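For developers, requesting these timestamps comes down to a single flag on the recognition request. Here is a minimal sketch using the google-cloud-speech Python client; the Cloud Storage URI, encoding, and sample rate are placeholder assumptions, not values from Google's announcement.

```python
from google.cloud import speech

client = speech.SpeechClient()

# Placeholder audio source; any Cloud Storage URI or inline bytes works.
audio = speech.RecognitionAudio(uri="gs://my-bucket/my-audio.raw")

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_word_time_offsets=True,  # request per-word timestamps
)

response = client.recognize(config=config, audio=audio)

for result in response.results:
    alternative = result.alternatives[0]
    print("Transcript:", alternative.transcript)
    for word_info in alternative.words:
        # start_time/end_time are offsets from the beginning of the audio
        print(f"  {word_info.word}: "
              f"{word_info.start_time.total_seconds():.2f}s - "
              f"{word_info.end_time.total_seconds():.2f}s")
```

With offsets like these in hand, an application can map any word in the transcript back to its position in the recording, which is what makes the jump-to-audio and karaoke-style display use cases Aharon described possible.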
A couple of businesses already making use of this functionality are Happy Scribe, which uses the API to provide a voice-to-text transcription service, and Voximplant, which uses the API to help other companies build voice and video applications, Google announced. In prepared statements, both companies said the feature saves them considerable time in working with speech-to-text transcriptions, savings they can pass along to their business customers.
In addition, Google has extended the maximum length of supported audio files. The Cloud Speech API previously worked with audio files that ran no longer than 80 minutes, but now supports files of up to three hours in length. Further, customers with even longer files can apply for a quota extension through Cloud Support, with approval on a case-by-case basis.
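Files of that length go through the API's asynchronous endpoint rather than the synchronous one, with the audio staged in Cloud Storage. A rough sketch with the google-cloud-speech Python client follows; the bucket path, FLAC encoding, and timeout are assumptions for illustration.

```python
from google.cloud import speech

client = speech.SpeechClient()

# Long recordings must live in Cloud Storage; this URI is a placeholder.
audio = speech.RecognitionAudio(uri="gs://my-bucket/long-recording.flac")

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=44100,
    language_code="en-US",
)

# long_running_recognize returns an operation that can be polled;
# the timeout governs how long we wait for the result, not the audio.
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=3600)

for result in response.results:
    print(result.alternatives[0].transcript)
```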
Lastly, Google has added support for 30 additional languages, including Bengali, Latvian, and Swahili, bringing the total to 119 and representing a billion additional potential users, Aharon said.
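Using one of the new languages is just a matter of the language code passed with the request, as in the sketch below. The BCP-47 tag shown ("sw-KE" for Swahili as spoken in Kenya) and the audio URI are illustrative assumptions; exact codes should be confirmed against the API's supported-languages list.

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,
    language_code="sw-KE",  # Swahili (Kenya); verify against the supported list
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/swahili-clip.flac")  # placeholder

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```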
A motivating factor for the language extension is to aid in the economic development of more countries, Aharon said. By exposing an additional billion people to the product, Google is hopeful it can help existing customers reach new audiences and expose people in those countries to technology previously unavailable to them, he added.
In a demo of the technology Aharon gave, the Cloud Speech API transcribed speech in real time with 100% accuracy.
Google will have additional speech technology-related news, including market innovations, later this year, Aharon hinted.
Overall, Google is working on "strengthening our cloud portfolio with an eye on bringing AI to customers," Aharon said. He explained that there are two main spaces Google sees for the use of AI through speech recognition: human-computer interaction and speech analytics on human-to-human interactions.
Regarding AI, Google sees a wide range of enterprise readiness, often dependent on how regulated an industry is. Highly regulated businesses, like those in healthcare and financial services, tend to be conservative or late adopters, while startups and those in less regulated spaces are more likely to be first to market with AI-enabled solutions. In particular, Google sees high demand for AI and speech technology in the contact center, where it is being leveraged to improve the customer experience.
Follow Michelle Burbick and No Jitter on Twitter: @MBurbick and @nojitter.