It got me to wondering why, with so much content produced inside and outside the corporate walls in conversations, presentations, speeches and the like, we haven't seen more commercial implementations of innovative transcription solutions. In anticipation of the comments and emails highlighting the great work already being done in this area, I will point out that there have definitely been some very interesting innovations in this area, but, in my mind, not nearly enough.
I'll speak to the question of transcription in three blog posts. This, the first, addresses why transcription even matters in the corporate context. The second and third posts will outline the state of transcription today, some areas where I believe we need to take the technology and how it can be meaningfully applied to UC.
On why transcription matters...certainly there's no shortage of valuable corporate audio content. I don't think I need to expand on this point too much; simply consider what would happen if every call, presentation and video conference could be easily accessed after the fact with appropriate security and retention policies applied to the recorded media files (think Enterprise DVR).
The case for reading the transcripts of this content vs. simply listening to it seems to be pretty straightforward: consider how much more effectively we'd collaborate if we could consume this recorded audio and video content nearly twice as fast, understand it better and find it more easily (or at all).
Speed
The math is simple--we read nearly twice as fast as we listen, assuming we want to read every word (an author's dream but not necessarily reality).
The average American adult reads prose at 275 to 350 words per minute while most of us speak at around 150-165 wpm. Slide presentations and speeches are often at 100-120 words per minute. Auctioneers and most true Bostonians speak at 200-250 wpm.
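To make the arithmetic concrete, here is a minimal Python sketch. It uses the midpoints of the reading and speaking ranges quoted above, which is my own simplifying assumption. The function names and the 60-minute call are hypothetical and chosen only for illustration.

```python
# Rough reading-vs-listening arithmetic.
# Assumption: use the midpoints of the ranges quoted in the text.
READING_WPM = (275 + 350) / 2    # 312.5 words per minute
SPEAKING_WPM = (150 + 165) / 2   # 157.5 words per minute


def transcript_word_count(duration_minutes, speaking_wpm=SPEAKING_WPM):
    """Estimate how many words a recording of the given length contains."""
    return duration_minutes * speaking_wpm


def minutes_to_consume(word_count, words_per_minute):
    """Time needed to get through word_count words at a given rate."""
    return word_count / words_per_minute


def reading_speedup(reading_wpm=READING_WPM, speaking_wpm=SPEAKING_WPM):
    """How many times faster reading is than listening at these rates."""
    return reading_wpm / speaking_wpm


if __name__ == "__main__":
    call_minutes = 60  # hypothetical one-hour conference call
    words = transcript_word_count(call_minutes)
    read_minutes = minutes_to_consume(words, READING_WPM)
    print(f"Estimated words spoken: {words:.0f}")
    print(f"Minutes to read the transcript: {read_minutes:.1f}")
    print(f"Speedup from reading: {reading_speedup():.2f}x")
```

With these assumed midpoints the speedup comes out just under 2x, which is where "nearly twice as fast" comes from. Swapping in the slower presentation rates or the faster auctioneer rates changes the ratio accordingly.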
Exhibit A: consider the unscientific example of the book-on-tape. It recently took me about 20 hours to read The Fountainhead vs. the 32 hours I'd have needed to spend listening to the complete audio-book. As an aside, if you're interested in the ultra-nerdy video book report I created (yes, with a Flip camera), email me.
While recorded audio can, of course, be time-compressed, listeners start to fail (as measured by comprehension and retention) at around 250 wpm. Similarly, reading rates can be improved through various techniques such as Rapid Serial Visual Presentation.*
Naturally, most of us will also skim and skip ahead when reading, a process that's much more difficult with video and audio content (when was the last time you used the well-intended five-second-advance feature in your voicemail system?). This offers even more opportunities to increase consumption speed with text vs. recorded audio.
Recall
While there doesn't appear to be conclusive evidence in the academic literature, most researchers seem to suggest that reading produces higher comprehension and recall rates than listening. I propose two simple ideas to support this hypothesis: re-reading and fewer disruptions.
With text, we frequently slow down or go back and re-read text that's complex or that we might have missed due to a distraction. With audio streams, the pace for the listener is set by the speaker.
While not impossible, reviewing, slowing down and skipping ahead are more time consuming with audio and video streams and, with most media playing applications, require some kind of manual control (mouse, touch screen, remote control or telephone keypad).
I would also argue, though somewhat less convincingly, that it's easier to get distracted when watching a video or listening to an audio stream. This is based on my own experience; I find that, despite my often vehement assertions to the contrary, I sometimes toggle over to another application (most often email) in the middle of a video or--gasp--a slow conference call.
I wouldn't do this when reading a blog or article. If I did switch away from the article, however, I would return to the same spot from which I'd departed (or maybe a bit earlier), not try to convince myself that I'd actually continued reading during my break (as is often the case when we switch away from recorded or live audio and attempt to background-process).
Why do we multi-task more when listening vs. reading? We feel inefficient. Because we're consuming content more slowly, we delude ourselves into thinking that we can take on more without impacting our understanding of the primary track. Although few admit it and many are trying to stop it, many of us multi-task this way (consider how taxing this is on conference call efficiency).
In addition to being faster, reading edges out listening for retention and recall due to the ability to quickly and easily review content missed and the higher concentration given to reading vs. listening in real world use cases.
Access
There are other benefits to text-based representations of audio content. Most notably, the content can be indexed, searched and easily linked to. While this is technically possible with audio and video recordings as well (think keyword spotting), text-based search and folksonomies based on linking and tagging are becoming nearly ubiquitous in the modern enterprise.
In fact, my web searches in support of this post led me to this 1984 TED Talk by futurist and MIT Media Lab co-founder Nicholas Negroponte where, with a brilliantly recursive or perhaps Darwinian twist, he anticipates the interactive transcript feature that allowed me to discover his speech. To find the pertinent segment of his talk (assuming you're time constrained), simply click on the red "Open interactive transcript" link on the right side of the page and search for and click on the words "text-synch." As it did for me, I'd guess that this experience will instantly illustrate for you why interactive transcription is necessary in the enterprise.
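For readers who want to see the mechanics, the idea behind an interactive transcript can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how TED or any vendor implements it: the `Segment` type, the function names and the sample sentences are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed chunk of a transcript (hypothetical structure)."""
    start_seconds: float
    text: str


def search_transcript(segments, query):
    """Return the segments whose text contains the query, ignoring case."""
    needle = query.lower()
    return [seg for seg in segments if needle in seg.text.lower()]


def format_timestamp(seconds):
    """Render a playback offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Invented sample data, for illustration only.
SAMPLE_TRANSCRIPT = [
    Segment(0.0, "Welcome to the quarterly review."),
    Segment(754.0, "The service outage affected two regions."),
    Segment(1210.0, "Next steps for the migration."),
]

if __name__ == "__main__":
    for hit in search_transcript(SAMPLE_TRANSCRIPT, "outage"):
        # A player would seek to hit.start_seconds when the text is clicked.
        print(format_timestamp(hit.start_seconds), hit.text)
```

The point of the design is that each piece of text carries its playback offset, so a text search doubles as a way to jump to the right moment in the recording.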
Analysis
Just as we're seeing a burgeoning world of real-time search and analytics on the public Internet (e.g., all things Twitter), timely access to transcripts of presentations and conversations (bearing certain challenges in mind, of course) can also help define and associate conversations going on throughout the organization. One obvious example is in the contact center, where keyword spotting applications promise to identify the zeitgeist, helping management better identify challenges (scope of service outages) and opportunities (slip-ups by competitors).
I'll expand more on this notion in the third part of this post.
* * * * * *
In my mind, the case for having transcripts for recorded conversations, meetings (audio and video), presentations and speeches is clear. In the next installment of this post, I'll address some key challenges with realizing this vision and some capabilities that are available today. In the third part, I'll outline some areas of innovation that I believe would directly benefit users of UC.
If you are aware of examples of the effective use of transcription technologies today or ideas of where we should be pushing, please email me or comment on this post. I'm sure there's lots of creative work and innovative use cases that I haven't yet come across or considered.
* As an aside, I'm determined to increase my reading rate so I can consume more in the same amount of time (as my mother helpfully points out, "If you buy things on sale, you get more stuff for the same amount of money"). I've been constructing a few experiments to measure the results of various new methods I'm trying and will report on this separately.
Great points
Comment by ANON1241625196739 May 24, 2010, 09:31 AM EDT
Mike: This was a really interesting post. Really glad to have you back.
Running a website, we struggle with the question of what kinds of media people want, and it does always seem that things like podcasts and videos don't draw as well as we tend to expect they will. I think we assume that whatever the newest medium is will be the one that people will prefer, and text is older than audio is older than video. But TV news never did put newspapers out of business, even though it's a newer medium; what's putting newspapers out of business is the Internet, which is still largely text-based when it comes to news. TV did cut into newspapers, and now the Internet is killing them, because of immediacy, not medium: You can only get one edition of the newspaper a day, at the same time every day, which fatally limits what the newspaper can report, and therefore how important it can be to you. When once a day (or twice, for some papers) was more often than people were used to, newspapers were a vital medium. Now once a day seems hopelessly sluggish.
So why did you do a video book report instead of a written one?
Thanks again. --Eric
Dead on........
Comment by D.M.Jones May 23, 2010, 15:53 PM EDT
I think the value of transcription is immense. If you look at what Google is doing with the GrandCentral acquisition, in addition to all of the YouTube efforts, they have proven that there is value in natural language interpretation of the spoken word. The first time it hit me was when a friend of mine left me a message about an upcoming sporting event and I began to see ads related to the sport and the actual event returned in the paid search area of my Google searches. The tipping point will be when enterprises, or more importantly the technology partners that develop these capabilities, provide intuitive applications with clear use cases and direct benefits. Great to see a well-written, non-product-centric article; it is just what I needed to read going into the short work week....Bravo Mr. Bergelson, I look forward to the remaining installments. No Jitter....more articles like this, less vendor splat sheets.