No Jitter is part of the Informa Tech Division of Informa PLC

Voice as Spice

One more note on the WebRTC conversations at the Illinois Institute of Technology conference this week. I heard a nice phrase that captures what WebRTC, and the voice-enabling of browsers generally, could do to the whole idea of voice as part of our overall communications. It came up in a conversation with Anant Narayanan of Mozilla Labs, who credited fellow WebRTC guru Tim Panton of Voxeo Labs with originating the phrase: "Voice as spice."

What he meant is that once voice is simply there in every browser, and thus potentially in every (or at least any) experience a user has on the Web, communicating by voice no longer has to be intentional, something you dedicate discrete effort to. Instead, it can crop up anywhere, anytime you're using the Web.

I remember talking to Phil Edholm, back when Phil was the CTO at Nortel Enterprise, about a related idea. (Phil, of course, is now one of the enterprise world's leading WebRTC gurus and has written for us on the subject.) Phil said that, in the then-emerging world of VoIP, what he'd really like to see was voice connections that were nailed up and persistent, so that you'd have, in essence, an intercom system that doesn't just reach your admin's desktop, but can reach across the Internet to anybody you need to maintain that sort of connection with.

That, in essence, seems to me to be one of the things that WebRTC could do, if you wanted.

Clearly, this vision of WebRTC and "voice as spice" has implications for the underlying network(s): How would this change the mix of traffic on the Web between real-time, near-real-time and non-real-time, and what would the implications be for prioritization of different classes of real-time traffic?

We've always put all voice traffic into the top tier when it comes to quality of service, but can the Internet support that for all the voice traffic that it could be asked to carry? And could some voice traffic be given a lower priority--for example, could the aforementioned "intercom" voice be considered a kind of "near-real-time" class of traffic? Obviously you'd want the voice quality as played out at either end to be acceptable, but maybe delaying that message by a second--a long time in packet terms--wouldn't be so bad. It'd be more like a spoken IM.
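
To make the idea of tiers concrete, here is a minimal sketch of a strict-priority queue in which conversational voice is always served before "intercom"-style voice, which in turn goes ahead of ordinary Web traffic. The class names and the three-tier split are my own illustration, not anything defined by WebRTC or by any particular router.

```python
import heapq
from enum import IntEnum
from itertools import count


class TrafficClass(IntEnum):
    # Hypothetical tiers for illustration; a lower number is served first.
    CONVERSATIONAL_VOICE = 0
    NEAR_REAL_TIME_VOICE = 1   # the "intercom" / spoken-IM style of voice
    BEST_EFFORT = 2            # ordinary non-real-time Web traffic


class StrictPriorityQueue:
    """Always dequeues the highest-priority packet; FIFO within a class."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def enqueue(self, traffic_class, packet):
        heapq.heappush(self._heap, (traffic_class, next(self._arrival), packet))

    def dequeue(self):
        _, _, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self):
        return len(self._heap)


def drain(queue):
    """Return every queued packet in the order it would be sent."""
    sent = []
    while len(queue):
        sent.append(queue.dequeue())
    return sent


q = StrictPriorityQueue()
q.enqueue(TrafficClass.BEST_EFFORT, "web page")
q.enqueue(TrafficClass.NEAR_REAL_TIME_VOICE, "intercom clip 1")
q.enqueue(TrafficClass.CONVERSATIONAL_VOICE, "live call frame")
q.enqueue(TrafficClass.NEAR_REAL_TIME_VOICE, "intercom clip 2")
order = drain(q)
print(order)
# ['live call frame', 'intercom clip 1', 'intercom clip 2', 'web page']
```

Strict priority is only the simplest possible policy; a real network would more likely use some weighted scheme so that the lower tiers aren't starved.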

Or maybe you wouldn't even need to worry about whether the Internet can provide this high-quality/moderate-delay delivery. You'd just allow for bigger jitter buffers on either end--of a size that would be unacceptable in true conversational voice, but OK for staccato messaging.
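
As a rough illustration of the jitter-buffer tradeoff, here is a minimal sketch of a fixed-delay playout buffer. Everything in it is assumed for the sake of the example: the 20 ms frame spacing, the made-up network delays, and the two buffer sizes (60 ms as a stand-in for conversational voice, 1,000 ms for the "intercom" case). Real WebRTC stacks use adaptive buffers that are considerably more sophisticated.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Packet:
    seq: int                                    # sender sequence number
    send_time_ms: int = field(compare=False)    # when the sender emitted it
    payload: bytes = field(compare=False, default=b"")


class JitterBuffer:
    """Plays each packet at send time + a fixed playout delay."""

    def __init__(self, playout_delay_ms):
        self.playout_delay_ms = playout_delay_ms
        self._heap = []   # ordered by sequence number
        self.late = 0     # packets that missed their playout deadline

    def push(self, packet, arrival_ms):
        deadline = packet.send_time_ms + self.playout_delay_ms
        if arrival_ms > deadline:
            self.late += 1          # too late to play, so it is dropped
            return False
        heapq.heappush(self._heap, packet)
        return True

    def pop_due(self, now_ms):
        """Return, in sequence order, the packets whose playout time has come."""
        due = []
        while (self._heap and
               self._heap[0].send_time_ms + self.playout_delay_ms <= now_ms):
            due.append(heapq.heappop(self._heap))
        return due


def count_late(playout_delay_ms, network_delays_ms, frame_ms=20):
    """Send one frame every frame_ms and count how many arrive too late."""
    buf = JitterBuffer(playout_delay_ms)
    for seq, delay in enumerate(network_delays_ms):
        send = seq * frame_ms
        buf.push(Packet(seq, send), arrival_ms=send + delay)
    return buf.late


# Made-up one-way network delays in milliseconds, with a few bad spikes.
delays = [30, 45, 250, 40, 600, 35, 900, 50]
print(count_late(60, delays))     # 3 frames lost with a conversational-size buffer
print(count_late(1000, delays))   # 0 frames lost with a one-second buffer
```

The bigger buffer trades a full second of delay for a complete message, which is exactly the bargain that might be acceptable for a spoken IM but not for a live conversation.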

One other thing Anant mentioned is that WebRTC architects hope to support not just voice and video, but data traffic as well. That, theoretically, could present a whole new challenge to the desktop-sharing Web conferencing model.

In any event, Anant was definitely right when he said that you just can't predict what's going to emerge once WebRTC gets released: "The thing that's going to blow us away is something we can't anticipate" today.