No Jitter is part of the Informa Tech Division of Informa PLC


VoIPmageddon: Is Quality Leading to a Telephony Meltdown?

All of the talk about Snowmageddon 2015 has me thinking about the Armageddon we could be heading toward in enterprise communications over degrading voice quality in the post-VoIP conversion world.

VoIP-driven quality issues, be those related to unacceptable latency, line noise, garbled speech, or dropped calls, have popped into recent conversations I've had with cloud telephony and conferencing providers, a few enterprises, and a number of other industry participants. And although my sampling may be unscientific, to me it seems to show a significant increase not only in the number of issues but also the percentage of calls having those issues.


So I've been thinking: Is VoIP adoption beginning to create a larger percentage of low-quality/dropped calls? Are we approaching VoIPmageddon?

While most of the VoIP community sees VoIP as a basic service, the reality is that delivering quality voice over an IP-based implementation is challenging and can be fraught with issues. Many of the quality issues that arose during VoIP's 10-year boom (2000 to 2010) were resolved in the implementations of the period. That's led to the common thinking that using some form of quality-of-service (QoS) mechanism solves all problems. However, many other factors can cause voice quality issues.

Are You Still There?
A good understanding of VoIP quality starts with knowing how various factors affect voice calls. The first is latency, which impacts how we interact.

When we talk over distance, we pause to let the other person speak. If latency is too high, then the first speaker starts talking again before the second speaker's voice gets back. This results in a "collision," which feels like an interruption (for more information on this process, please read the PKE Consulting white paper, Making IP Networks Voice Enabled). In a traditional voice network, the TDM process ensures low latency within the local loop as well as across long distances.

In the TDM infrastructure, the voice stream is sampled once every 125 microseconds, and every 125 microseconds each sample is moved to the next stage in the path. This results in very low latency. In the local loop, we typically see round-trip latency of less than 10 to 15 milliseconds (msec) on a TDM voice call. Long-haul latency also includes sampling (a few samples at 125 microseconds) plus fiber propagation and echo cancellation. So, the round-trip latency for a cross-country TDM call in the U.S. has four components: 30 msec maximum for the local loops (one local-loop crossing at each end), maybe another 5 msec for sampling the forward timing in the long-distance path, a couple of milliseconds for echo cancellation there and back, and propagation delay of about 16 msec in each direction (assuming a propagation speed in fiber of 85% of the speed of light and a 2,500-mile distance). In sum, total round-trip cross-country latency is about 84 msec.
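As a rough check on the propagation component alone, the sketch below computes one-way fiber delay for the two distances used in this article. It assumes, as the text does, a straight 2,500- or 12,500-mile route at 85% of the speed of light; `fiber_delay_ms` and its `velocity_factor` parameter are illustrative names, not from any library. Silica fiber is often quoted nearer two-thirds of the speed of light, so the factor is left adjustable.

```python
# Minimal sketch: one-way propagation delay over fiber.
# Assumptions: straight-line route, signal speed = velocity_factor x c
# (the article uses 85%).
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
KM_PER_MILE = 1.609344

def fiber_delay_ms(miles: float, velocity_factor: float = 0.85) -> float:
    """Return the one-way propagation delay in milliseconds."""
    km = miles * KM_PER_MILE
    return km / (SPEED_OF_LIGHT_KM_PER_S * velocity_factor) * 1000.0

for label, miles in [("cross-country", 2_500), ("halfway around the world", 12_500)]:
    one_way = fiber_delay_ms(miles)
    print(f"{label}: {one_way:.1f} msec one way, {2 * one_way:.1f} msec round trip")
```

The cross-country case comes out at roughly 15.8 msec each way, matching the "about 16 msec" above; the other latency components (local loops, sampling, echo cancellation) are added on top of this.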

Considering the same factors, placing a TDM voice call halfway around the world -- 12,500 miles -- would result in latency of about 220 msec. This is well below the latency threshold of 275 to 300 msec we accept in natural conversation. Even adding in a conferencing bridge delay of an additional 5 msec in each direction results in acceptable overall delay.

In VoIP, delay is typically much higher because of the way the voice stream gets broken into packets. In a typical VoIP connection, at least three packets flow in each direction, resulting in at least 60 msec but more likely 80 msec or more of packet delay in each direction. In addition, protocol delays, router forwarding, codec time, and other network processes affect latency.
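The packet arithmetic here is simple multiplication: each 20-msec voice frame held in the path adds one frame time of delay. This minimal sketch assumes every buffered packet contributes exactly one frame (real jitter buffers, codecs, and routers add more); `packet_delay_ms` is a hypothetical helper.

```python
# Minimal sketch: one-way delay contributed by buffered voice packets.
# Assumption: each packet held in the path adds exactly one frame time.
FRAME_MS = 20  # the common 20-msec voice frame discussed in the article

def packet_delay_ms(packets_in_flight: int, frame_ms: int = FRAME_MS) -> int:
    """Return one-way packet delay in msec for the given packet count."""
    return packets_in_flight * frame_ms

for packets in (3, 4):
    print(f"{packets} packets x {FRAME_MS} msec = {packet_delay_ms(packets)} msec one way")
```

Three and four packets give the 60 and 80 msec figures above; passing a smaller `frame_ms` shows why shorter frames shrink this component.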

So, on a LAN with essentially zero propagation delay, the typical round-trip latency for an end-to-end VoIP call is around 130 to 170 msec (see the white paper for more details). Consider our 2,500-mile, cross-country scenario, and total round-trip latency increases to 160 to 200 msec due to the transmission time. For our halfway around the world call, due to the transmission time increase, the total is now 270 to 300 msec, right at the edge of our ability to perceive it in conversation. Of course, latency will increase even more if the IP path is indirect, looping back based on peering locations.

Now let's look at what happens when the call is not VoIP end to end but instead crosses the TDM-based PSTN as it moves from one VoIP network to another. This happens when two enterprises have deployed VoIP, but one or both are using a TDM trunk to the PSTN. In this case, packetizing the voice for each VoIP segment, there and back, adds latency. The result is that the round-trip latency, even when the PSTN connection is only a local loop and both PBXes are on high-speed LANs, increases to 250 to 320 msec. As the distance increases, this number grows. For the 2,500-mile call (30 msec additional round trip plus 5 msec for router hops), the latency is now over the 275 to 300 msec threshold of perceived latency. The same is true of our international scenario, in which latency would reach well over 300 msec.

Conferencing in the VoIP domain has the same sort of results. When we join two IP networks by the PSTN or a conference bridge, even in the best cases, we approach the threshold of perception. If we add an additional packet into each of the jitter buffers, the resulting 80 msec (one 20-msec sample in each of the four IP domains round trip) takes the latency up closer to 400 msec, which makes natural interactive speech very challenging and will significantly degrade a provider's mean opinion score, or MOS.

This is all to say that VoIP calling works well when the call is IP end to end or, if not end-to-end VoIP, when only one end uses the PSTN. The potential for perceptible latency goes way up when the PSTN sits between two VoIP networks or during a conference call. A conference bridge can mitigate this somewhat with jitter buffer management and error correction, but in the PSTN/TDM case, gateways must use larger jitter buffers to minimize underruns and must apply their own error correction. Using SIP trunking between two IP domains can eliminate the TDM-added latency, of course, but SIP trunks are still a relatively low percentage of all trunks.

For an enterprise, this latency can create real challenges. When using a conference bridge with multiple VoIP domains and paths that often loop back on each other, for example, two speakers may be 400 to 500 msec apart. Participants will have a relatively poor and inconsistent experience, with latency varying based on the type of network each has used to dial in to the bridge.


Did Somebody Say Something?
Echo and noise are additional factors affecting call quality. As we all know, echo can happen for a number of reasons, including when an endpoint device has inadequate echo control. This is starting to crop up more and more as the number of PCs and low-cost Bluetooth speakers increases. One challenge with endpoint echo is that the user who has the device creating the echo does not hear the echo while everybody else on the call does. Another potential issue is the difficulty of echo cancellation in the VoIP domain.

In TDM environments, echo cancellation is relatively easy, handled by echo cancellers that sit on the end of each line (coming out of the long-distance path). The time domain is essentially fixed, so echo is easily removed. Echo cancellation in a VoIP environment is much more complicated because echo cancellers can be applied at multiple points, not just at the end of a line. The delay the canceller must handle will vary based on where the echo cancellation takes place and on packet timing, which is often variable. Noise is similar in that it can come from multiple places in a complex path and is very challenging to remove. In conferences (meshed peers or server based), noise bursts sometimes get injected into the call because noise is mistaken for active speech.

The result is that quality issues often come and go on VoIP calls, seemingly at random, but these complex interactions are often at their root. They are hard to troubleshoot because the sources change and may depend on the configuration of a given call, the IP path, the speaker's environment, the speakers' devices, and so on, and can even vary between listeners on a single call.

The transitions between IP networks can introduce issues, as well. On many calls today, it's not uncommon to see three, four, five, or more carrier SIP peering transitions. Depending on how each carrier has implemented its codecs and echo cancellers, these transitions can cause significant degradation.

As an example, let's look at a call between Enterprise 1 and Enterprise 2 wherein Enterprise 1 connects to Service Provider A, and Enterprise 2 to Service Provider Z. Service Provider A uses Interexchange Carrier B while Service Provider Z uses Interexchange Carrier X. Interexchange Carrier X peers to AT&T and Interexchange Carrier B to Verizon. We now have this sequence between the enterprises: Enterprise 1 to Service Provider A to Interexchange Carrier B to Verizon to AT&T to Interexchange Carrier X to Service Provider Z to Enterprise 2 -- a path with seven interfaces.

If two or three of those interfaces have codecs and are transcoding speech, sometimes multiple times, voice quality can significantly degrade (remember what happened to quality after copying a videotape for a third or fourth time). And, depending on peering locations, the transit times can explode as the peering points are often not in a direct path but cause significant loops, increasing latency.

Last, but not necessarily least, more VoIP calls are using the open Internet, where latency, indirect paths, dropped packets, and jitter can further exacerbate all of the issues above.

As I've said, I have seen indications that degradation can be perceived on a growing percentage of VoIP-to-VoIP calls. The data is not absolute, but I believe it shows that quality issues are impacting between 1 and 10% of paired calls. For purposes of this analysis, I am going to use 5% as the "poor" factor. In other words, of 100 random VoIP-to-VoIP calls on which any of the above issues could crop up, five on average will have perceivable quality issues. Based on a number of discussions with vendors, end users, and others, I think this is a reasonable, even conservative, measure. In fact, it may be as high as 10%.


What's the Big Deal?
If you have read to this point, your reaction is probably that this all sounds bad, but you aren't sure why it deserves your attention now, since VoIP has been around for years. Or perhaps you're thinking, "I have been using Skype and it works well for me -- so what's the big deal?"

The reason this deserves attention now is that the number of VoIP endpoints is reaching critical mass in the installed base, especially for business telephone systems. The enterprise telephony installed base churns at about 5% per year. It has been churning to VoIP for the last 10 years, with increasing velocity. This means that we are now approaching 40 to 50% of enterprise endpoints using VoIP. However, TDM trunks still connect the majority of those business VoIP systems. Figure 1 below shows what this means in terms of TDM-to-TDM, TDM-to-VoIP, and VoIP-to-VoIP calls, as well as the percentage of all calls having VoIP-attributable call quality issues (with a 5% call quality impact estimate).


Figure 1. Source: PKE Consulting
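The 40 to 50% range cited above is consistent with two simple ways of applying a constant 5% annual churn over 10 years, assuming every replacement system is VoIP. If replacements are drawn from the whole installed base (so some replace existing VoIP gear), the share grows as 1 - 0.95^n; if replacements always retire a TDM endpoint, it grows linearly. This is a simplified sketch: the text notes churn to VoIP has been accelerating, which a constant rate ignores, and the function names are hypothetical.

```python
# Minimal sketch of two installed-base models, assuming a constant 5% annual
# churn and that every replacement endpoint is VoIP.
CHURN = 0.05

def voip_share_random(years: int, churn: float = CHURN) -> float:
    """Replacements drawn from the whole base, so some replace VoIP gear again."""
    return 1.0 - (1.0 - churn) ** years

def voip_share_tdm_first(years: int, churn: float = CHURN) -> float:
    """Replacements always retire a TDM endpoint first (capped at 100%)."""
    return min(1.0, churn * years)

print(f"after 10 years: {voip_share_random(10):.0%} to {voip_share_tdm_first(10):.0%}")
```

The two models bracket the low and high ends of the 40 to 50% estimate after 10 years.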

The real trend is even more evident by charting the results, as shown in Figure 2 below.


Figure 2. Source: PKE Consulting

The infographic shows that we are entering a period where the percentage of VoIP-to-VoIP calls is increasing rapidly. If enterprises are adopting VoIP endpoints at a rate of about 5% per year, and we are currently at 50% adoption, then 25% of all business-to-business calls are VoIP to VoIP, which is where the 5% issue rate applies. This means about 0.6% of all calls (including TDM) are having voice quality issues because of VoIP. But in three years, when VoIP adoption is closer to 70%, almost 50% of calls will be VoIP to VoIP, resulting in 1.2% of all calls having unacceptable quality. So in two or three years, we can expect the number of bad calls to double. And as many consumers move to VoIP, either with a VoIP provider or because their carrier is transitioning their analog lines to IP within its network, the same data applies to business-to-consumer calls and even consumer-to-consumer calls. The key question is whether it will be acceptable for 5% of all calls to have real quality issues, which is where we will be at 100% VoIP endpoint adoption if the current issues persist.
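The VoIP-to-VoIP shares quoted above follow from squaring the adoption rate, if both ends of a call are assumed to be VoIP independently of each other. This sketch makes that assumption explicit; the function names are hypothetical, and the 5% factor is the estimate chosen earlier in this article, not measured data.

```python
# Minimal sketch of the pairing model, assuming each end of a call is
# independently VoIP with probability equal to the adoption rate, and that
# only VoIP-to-VoIP calls carry the assumed 5% "poor" factor.
POOR_FACTOR = 0.05

def voip_to_voip_share(adoption: float) -> float:
    """Fraction of calls on which both ends are VoIP."""
    return adoption ** 2

def poor_call_share(adoption: float, poor_factor: float = POOR_FACTOR) -> float:
    """Fraction of all calls with VoIP-attributable quality issues."""
    return voip_to_voip_share(adoption) * poor_factor

for adoption in (0.5, 0.7):
    print(f"adoption {adoption:.0%}: {voip_to_voip_share(adoption):.0%} VoIP to VoIP")
print(f"full adoption: {poor_call_share(1.0):.0%} of all calls poor")
```

At full adoption every call is VoIP to VoIP, so the share of poor calls converges on the poor factor itself.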

The problem is significantly accelerated in conferencing because of the multiple participants. In a conference call with five participants, the probability of having at least two participants on VoIP is now effectively 100%. The reason is that in a five-party conference there are 20 point-to-point paths (four for each participant). So, if 10% of endpoints are VoIP, there is (20 x (0.1 x 0.1)), or a 20%, probability of at least one VoIP-to-VoIP path (this is an average, not the figure for any single conference). So, with more than 30% VoIP endpoints in the North American installed base, we have probably already reached the point where virtually all conferences with five or more parties have at least two VoIP endpoints (see Figure 3 for the percentages of three-, four-, and five-party conferences with at least two VoIP participants, based on VoIP endpoint adoption).


Figure 3. Source: PKE Consulting
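The 20 x (0.1 x 0.1) figure is an averaged count of directed VoIP-to-VoIP paths per conference, which is why it can exceed 1 as adoption grows. The sketch below, assuming each participant is independently VoIP with probability p, computes that heuristic alongside the exact binomial probability of at least two VoIP participants, a different calculation included for comparison. All function names are hypothetical.

```python
# Minimal sketch comparing the averaged path-count heuristic with an exact
# binomial probability, assuming each participant is independently VoIP
# with probability p (the adoption rate).

def directed_paths(parties: int) -> int:
    """Point-to-point paths in an n-party conference (n - 1 per participant)."""
    return parties * (parties - 1)

def expected_voip_paths(parties: int, p: float) -> float:
    """Heuristic from the text: paths x p^2, an average count, not a probability."""
    return directed_paths(parties) * p * p

def prob_at_least_two_voip(parties: int, p: float) -> float:
    """Exact probability that two or more participants are on VoIP."""
    none = (1 - p) ** parties
    exactly_one = parties * p * (1 - p) ** (parties - 1)
    return 1 - none - exactly_one

for p in (0.1, 0.3):
    print(f"p={p:.0%}: heuristic {expected_voip_paths(5, p):.2f}, "
          f"exact P(>=2 VoIP) {prob_at_least_two_voip(5, p):.1%}")
```

The heuristic is easy to compute by hand; the exact form is useful when a true per-conference probability is needed.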

Conferencing has probably already reached the point where many conferences have multiple VoIP endpoints participating. The question is the probability of issues. If we assume that 5% of VoIP-to-VoIP connections have issues, Figure 4 below shows the percentage of conferences that will have issues due to the VoIP pairings and that 5% VoIP-to-VoIP perceived quality level.


Figure 4. Source: PKE Consulting

As you can see, for a five-party conference, we are approaching the point where 10% of all conferences will have perceived voice quality issues. Again, I've discerned this to be an issue through my conversations with conferencing providers. The challenge is that at 100% VoIP endpoint adoption, 25% of five-party conference calls will have quality issues!

After going through this analysis, I have concluded that unless we can dramatically reduce the causes of VoIP quality degradation, we are going to have an explosion of bad calls. As VoIP endpoint adoption continues in the business and consumer markets, overall quality will suffer, and the number of bad calls will grow at an even higher rate than adoption.

What Are We To Do?
To mitigate these issues we need to make fixes at both the individual company and industry level. Overall, the industry could address the latency issue by universally reducing the packet voice sample size from the current 20-msec "standard" to 10 msec or even 5 msec. However, because the per-packet header size is fixed, halving the sample size doubles the number of packets and therefore doubles the header overhead. Going to 5 msec quadruples the overhead, significantly increasing the overall bandwidth used.
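To put numbers on the overhead argument, the sketch below assumes the G.711 codec (64 kbit/s of voice payload) and 40 bytes of IPv4, UDP, and RTP headers per packet, ignoring link-layer framing and header compression. Under those assumptions, header overhead doubles at 10 msec and quadruples at 5 msec, while total IP bandwidth rises by about 1.6x; low-bit-rate codecs such as G.729 would see a larger multiple because headers dominate their packets. `ip_bandwidth_kbps` is a hypothetical helper.

```python
# Minimal sketch of one-direction IP bandwidth versus voice frame size.
# Assumptions: G.711 (64 kbit/s payload), IPv4 (20) + UDP (8) + RTP (12)
# header bytes per packet, no link-layer framing or header compression.
CODEC_BPS = 64_000
HEADER_BYTES = 20 + 8 + 12

def ip_bandwidth_kbps(frame_ms: int) -> tuple[float, float]:
    """Return (header overhead, total IP bandwidth) in kbit/s."""
    packets_per_s = 1000 / frame_ms
    payload_bytes = CODEC_BPS / 8 * frame_ms / 1000
    overhead = HEADER_BYTES * 8 * packets_per_s / 1000
    total = (payload_bytes + HEADER_BYTES) * 8 * packets_per_s / 1000
    return overhead, total

for frame_ms in (20, 10, 5):
    overhead, total = ip_bandwidth_kbps(frame_ms)
    print(f"{frame_ms:>2} msec frames: {overhead:.0f} kbit/s headers, {total:.0f} kbit/s total")
```

With these assumptions the loop reports 16, 32, and 64 kbit/s of header overhead against totals of 80, 96, and 128 kbit/s.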

For an enterprise or other VoIP-adopting organization, eliminating TDM trunks and choosing service providers based on minimal peering paths will reduce the impacts on quality. In fact, SIP trunk providers should begin publicizing their peering quality in terms of average latency and number of hops. Picking a conferencing platform that is designed and deployed to reduce latency, noise, and other quality factors will make conferencing more effective. Some conference vendors are actively managing these issues. An alternative is to use a cloud VoIP service in lieu of a premises-based offer, though this may introduce last-mile IP issues.

For individual service providers, choosing technologies and deployments to reduce latency, transcoding, and other factors is critical, as is enabling TDM trunk contracts to be migrated to SIP without penalties for customers who have deployed IP PBXes. This will accelerate the transition to SIP trunking. Service providers as a group need to: choose a single end-to-end encoding standard to eliminate all transcoding; do echo cancellation on the incoming streams at all SIP trunk points instead of at the outgoing endpoint; and work together to minimize the latency impact of peering locations and paths.

In the end, the best way to have good VoIP quality is to eliminate all of the intermediate points that cause the issues. Point-to-point networks like Skype have proven that acceptable quality can be achieved between peer-connected endpoints with an all-IP path. Open peer-to-peer protocols like WebRTC can have an impact and improve the overall operation of the system by minimizing transitions, using standard codecs, and minimizing other factors.

I believe it is time that the entire VoIP industry started a dialog about where we are and where we are going. We need to have clear statistics on the current level of unacceptable call quality that we can, in turn, use to project future challenges. We need to work together to define the standards discussed above. If we do not, I fear that VoIPmageddon is coming, and it will impact us all, as users will blame their vendors, their carriers, their consultants, everybody. Comments?

Join Phil Edholm at Enterprise Connect Orlando in EC Summit: Managing in a Software-Intensive World.