Mike Bergelson | July 25, 2010

Mike Bergelson is responsible for developing new product and business model strategies for Cisco's Unified Communications portfolio.
The State of Transcription for UC: Part 2.3: Areas of Innovation

Medical transcriptions and closed captioning are likely areas for growing benefits as technology and human processes improve.

This is a continuation of my blog post from last week, part of a series of posts on the application of transcription in unified communications.

In this and the previous two posts, I discuss the state of transcription today. In the final post in the series, I'll address where I believe the market may be going and some key areas of innovation that can help us derive more benefit from recorded audio and video content.

Medical Transcription
The medical transcription market is enormous--around $20–25B globally--and is expected to grow 15–20% per year for the foreseeable future.

As most people know (or could guess), the traditional model of live-agent transcription is slowly giving way to a semi-automated one in which system-generated transcriptions are edited by humans. Research suggests that agent productivity increases 30–50% (some vendors claim it doubles) when an automatic speech recognition (ASR) engine makes the first pass, further underscoring the economic benefit of this approach.
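One way to see where that productivity gain comes from is to measure how much editing a first-pass draft still needs: the word error rate of the ASR draft against the final edited transcript. Here is a minimal dynamic-programming sketch (the example sentences are invented, not from any vendor's data):

```python
def word_error_rate(draft, final):
    """Levenshtein distance over words, normalized by reference length."""
    ref, hyp = final.split(), draft.split()
    # d[i][j] = edits to turn the first j draft words into the first i final words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[-1][-1] / len(ref)

draft = "the patient presents with acute chest paint"
final = "the patient presents with acute chest pain"
print(round(word_error_rate(draft, final), 3))  # -> 0.143 (one word in seven)
```

The lower the draft's error rate, the less time the human editor spends per minute of audio, which is exactly where the claimed 30–50% gain shows up.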

Clinicians are also using speaker-dependent ASR engines to obviate the need for outsourcing altogether, although the growth of this approach has been relatively slow as clinicians perceive a high set-up cost (both the user and language models must be trained) and may not be willing to commit to the necessary behavior changes.

To wit, a Nuance Communications employee concedes that it is "often more important to train users how to use speech than it is to train speech systems how to recognize users" in his thorough response to a thought-provoking blog post by Robert Fortner cleverly titled Rest in Peas: The Unrecognized Death of Speech Recognition.

In large part, the dollars attached to creating efficiencies in the medical transcription market will inure to the benefit of UC transcription solutions, since cost, turnaround time, accuracy and privacy are all major issues for medical transcription. Examples of applicable innovations include passive training of speaker-dependent language models, the use of multiple speech engines to increase accuracy, and improved workflows for human editing.
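The multiple-engine idea is worth a quick illustration. Production systems (NIST's ROVER is the classic example) align the engines' hypotheses into a word transition network before voting; the sketch below assumes, for brevity, that the engines already agree on word boundaries and simply takes a word-level majority vote. The engine outputs are invented:

```python
from collections import Counter

def combine_hypotheses(hypotheses):
    """Naive word-level majority vote across equal-length ASR hypotheses."""
    combined = []
    # Walk the hypotheses position by position and keep the most common word
    for candidates in zip(*(h.split() for h in hypotheses)):
        word, _count = Counter(candidates).most_common(1)[0]
        combined.append(word)
    return " ".join(combined)

# Three hypothetical engine outputs for the same utterance:
outputs = [
    "please send the patient chart",
    "please mend the patient chart",
    "please send the patient cart",
]
print(combine_hypotheses(outputs))  # -> "please send the patient chart"
```

Each engine makes a different mistake, but no single mistake survives the vote--the intuition behind why combining engines can beat the best individual one.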

Closed Captioning
Recorded video is quickly becoming a common medium for intra- (e.g., training) and inter-enterprise (e.g., marketing) communications. The same benefits--speed of consumption, improved retention, searchability, etc.--that users experience with transcriptions for recordings of live events can be found with "canned" or made-for-video content.

There isn't much demand for real-time transcription in the enterprise context (though there are use cases, such as company-wide meetings requiring real-time translation), but the approaches used there help guide some thinking that I'll revisit in my final post in this series.

Traditionally, real-time closed captioning is created in a two-part process. First, the dialog is converted by a stenographer (think of the person who's asked to read back testimony in your favorite courtroom drama) into a phonetic representation of what's been said. Stenographers and closed captioners routinely record dialog at approximately 200 words per minute (as one would expect, given that this is the upper bound on typical speech rates).

The output of the stenotype machine is then fed into a system that converts the phonemes into actual words. Inaccuracies in this process account for the odd words that we see in the closed captioning ribbons from time to time on TV screens in public places (or at home if we rely on closed captioning).
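That second step is, at its core, a dictionary lookup from phonetic strokes to words, and the odd on-screen words happen when a stroke has no entry (or an ambiguous one). A toy sketch--the strokes and dictionary here are invented for illustration; real steno theories and dictionaries are far larger and context-sensitive:

```python
# Hypothetical stroke-to-word dictionary; real ones hold tens of thousands
# of entries and disambiguate using surrounding strokes.
STENO_DICT = {
    "TKPWOPBG": "going",
    "TO": "to",
    "THE": "the",
    "PHAOETG": "meeting",
}

def translate(strokes):
    """Map each stroke to a word, passing unknown strokes through raw."""
    return " ".join(STENO_DICT.get(s, s) for s in strokes)

print(translate(["TKPWOPBG", "TO", "THE", "PHAOETG"]))  # -> "going to the meeting"
# A stroke missing from the dictionary comes through untranslated -- one
# source of the odd words that appear in live captions:
print(translate(["TKPWOPBG", "TO", "THE", "PHAOET"]))   # -> "going to the PHAOET"
```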

In some cases, as with the BBC, agents with crisp enunciation actually re-speak what's being said into stenomasks (specially designed masks with an embedded microphone that cover one's mouth to block outside noise). This parallel dictation is then fed into a speaker-dependent ASR engine to produce a near-real-time transcription with high accuracy.

Interestingly, closed captioning in the US and UK appears to be used most often (by a factor of four to one!) by viewers for whom English is a second language rather than by the intended audience--those with hearing impairments. As we start to transcribe video and audio in the enterprise context, I believe we can count on similar examples of "unintended benefits."

Google made big news in the closed captioning world (as with voicemail transcription) by announcing an Automatic Caption Feature for YouTube videos in November 2009. In a clever (or just honest) move, much of the messaging around this feature anticipates the inaccuracies of the machine transcription and refocuses attention on the important benefit of making video content accessible to hearing-impaired viewers around the world.

The clever use of speaker-dependent speech engines and crowd-sourcing creates some interesting possibilities in other areas, as we'll explore in my next post.


