Testing VoIP on SD-WANs

Happiness is expectations minus reality. So, what should one expect when testing VoIP on a software-defined WAN (SD-WAN) solution? The expectation in the branch is to get even better performance and security than MPLS, while also avoiding the high cost and complexity of MPLS. Since VoIP is the most sensitive application on most IP networks, getting it to work well will ensure the SD-WAN solution will be successful.

Here are some things to test with VoIP on an SD-WAN solution:

  1. Failover -- An SD-WAN uses multiple links to protect application performance. If one link fails, or has a brown out where dropped packets or jitter go above a certain threshold, the voice conversation should fail over to a better path in less than 2 seconds. This requires real-time link monitoring on a sub-second basis. Traditional IP routing protocols take 5-30 seconds to converge, and in most cases they only fail over if the link is down, not if UDP packets are being dropped. Failing back to the original path once it is healthy again should also be tested. Especially test this with LTE as the backup path, since most WAN outages are in the last mile.

  2. Load Testing -- If a branch office needs to support 20 concurrent calls, for example, this would traditionally take 1,200 Kbps, assuming a G.729 codec. In an SD-WAN world, depending on the SD-WAN vendor's overhead, the same 20 calls would take 2,720 Kbps. Yes, SD-WAN overhead is significant! DSL, cable, and LTE provide asymmetric bandwidth, meaning download speeds are significantly higher than upload speeds. While this works well for Web apps and downloads, voice is symmetric and requires this amount of bandwidth in both directions, so the upload speed is the limit that matters (see related No Jitter article).

    Figure 1. Example of SD-WAN Overhead

  3. Peer-to-Peer -- When calling from an office across town to the warehouse, the VoIP packets should not have to hairpin through a data center or remote cloud provider. Many early SD-WAN deployments are hub-and-spoke based, which adds latency. With large implementations, peer-to-peer has trouble scaling, and it's like going back to the frame relay days. We moved to MPLS because of the peer-to-peer capability, and we do not want to move backwards.

  4. MOS Alerting & Reporting -- Measuring packet loss and jitter in BOTH directions, and turning those measurements into real-time alerts and longer-term SLA reports, is critical. As a network manager, when call quality is poor, you are guilty until proven innocent. So having the data, and being able to correlate it to specific calls, is essential.
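
To make items 1 and 4 concrete, here is a minimal sketch of sub-second link monitoring that scores each path with an estimated MOS and decides when to fail over. It is not any vendor's implementation. The `Sample`, `PathMonitor`, and `pick_path` names are hypothetical, the MOS estimate is a commonly used simplification of the E-model rather than a full ITU-T G.107 calculation, and the 3.6 MOS threshold and 10-probe window are assumptions you would tune in your own test plan. With probes every 100 ms, a 10-probe window covers about one second, which leaves room to meet a 2-second failover target.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    """One probe result for a path (field names are illustrative)."""
    latency_ms: float
    jitter_ms: float
    lost: bool


def estimate_mos(latency_ms: float, jitter_ms: float, loss_pct: float) -> float:
    """Rough MOS from a simplified E-model approximation (an assumption,
    not a full G.107 computation)."""
    effective = latency_ms + 2 * jitter_ms + 10
    if effective < 160:
        r = 93.2 - effective / 40
    else:
        r = 93.2 - (effective - 120) / 10
    r -= 2.5 * loss_pct
    r = max(0.0, min(100.0, r))
    # Standard mapping from R-factor to MOS.
    return 1 + 0.035 * r + 0.000007 * r * (r - 60) * (100 - r)


class PathMonitor:
    """Sliding window of recent probes for one WAN path."""

    def __init__(self, window: int = 10, min_mos: float = 3.6):
        self.samples = deque(maxlen=window)
        self.min_mos = min_mos  # assumed alert threshold

    def record(self, sample: Sample) -> None:
        self.samples.append(sample)

    def mos(self) -> float:
        received = [s for s in self.samples if not s.lost]
        if not received:
            return 1.0  # every probe in the window was lost
        loss_pct = 100 * (len(self.samples) - len(received)) / len(self.samples)
        latency = sum(s.latency_ms for s in received) / len(received)
        jitter = sum(s.jitter_ms for s in received) / len(received)
        return estimate_mos(latency, jitter, loss_pct)

    def is_degraded(self) -> bool:
        # With no data yet, do not trigger a failover.
        return bool(self.samples) and self.mos() < self.min_mos


def pick_path(monitors: dict, active: str) -> str:
    """Stay on the active path unless it is degraded and another is better."""
    if not monitors[active].is_degraded():
        return active
    best = max(monitors, key=lambda name: monitors[name].mos())
    return best if monitors[best].mos() > monitors[active].mos() else active
```

In a test, you would feed each monitor from probes in both directions, induce loss or jitter on the active link, and confirm that `pick_path` changes its answer within your failover budget. Running it again after the impairment clears lets you measure failback the same way.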

What Not to Test

SD-WAN forward error correction (FEC) is something that should not be tested. In theory, FEC involves duplicating the voice packets and sending them across two different paths, then re-assembling them at the far end. Standard Web apps can rely on TCP to resend lost packets (unless the path is international or satellite, where latencies are really long). Real-time apps that use UDP, such as voice, can benefit from FEC, but the theory is better than the reality.

In reality, the different paths have different latencies, so significant buffering must be done, and that comes on top of the significant overhead mentioned above. In networks, packet loss comes most often in large bursts (brown outs), and it's better to fail over fast than to duplicate packets. Duplicating packets and sending them over LTE is cost prohibitive.
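
To see why duplication over LTE adds up, here is a rough back-of-the-envelope sketch of the data volume involved. It reuses the 2,720 Kbps figure from the load-testing example for 20 calls and assumes, as a worst case, that all 20 calls run continuously. The function name and the 176 business hours per month (8 hours on 22 days) are my assumptions for illustration. No pricing is included, since that depends entirely on your carrier plan.

```python
def data_volume_gb(rate_kbps: float, hours: float) -> float:
    """Data sent in one direction at a constant rate, in decimal gigabytes."""
    bits = rate_kbps * 1000 * hours * 3600
    return bits / 8 / 1e9


# Worst case: 20 concurrent calls duplicated onto the LTE path.
SDWAN_RATE_KBPS = 2720          # figure from the load-testing example
BUSINESS_HOURS_PER_MONTH = 176  # assumption: 8 hours x 22 days

per_hour_gb = data_volume_gb(SDWAN_RATE_KBPS, 1)
per_month_gb = data_volume_gb(SDWAN_RATE_KBPS, BUSINESS_HOURS_PER_MONTH)
```

Under those assumptions, the duplicate stream is a bit over 1.2 GB per hour in each direction, and a metered LTE plan typically counts upload and download together. With fast failover, the LTE link only carries voice while the primary path is actually impaired.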

Funny thing is that I used to be a fan of SD-WAN FEC. But after talking to enough people who have done real-world implementations, and since link monitoring and quick failover from some of the vendors have really improved, I'm no longer a fan.

As a 30-year network veteran, I'm shocked that most people do not understand how much overhead the SD-WAN overlay adds and how much complexity IPsec tunnels create. For most of my career, bandwidth was at a premium, and this is why low-bandwidth codecs were so common. The overhead for going to IPv6 and adding IPv6 tunnels gets even crazier.
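
To put numbers on that overhead, here is a small sketch that works backward from the load-testing figures above (1,200 Kbps traditionally and 2,720 Kbps over the SD-WAN overlay for 20 calls) to a per-call budget, then checks it against an asymmetric link. The function names and the 50 Mbps down / 2 Mbps up broadband link are hypothetical, chosen only to illustrate the check. Your own per-call figure will depend on the codec and the vendor's encapsulation.

```python
CALLS = 20
TRADITIONAL_KBPS = 1200  # figure from the load-testing example
SDWAN_KBPS = 2720        # figure from the load-testing example


def per_call_kbps(total_kbps: float, calls: int) -> float:
    """Bandwidth budget per concurrent call."""
    return total_kbps / calls


def overhead_factor(overlay_kbps: float, native_kbps: float) -> float:
    """How many times more bandwidth the overlay needs than native VoIP."""
    return overlay_kbps / native_kbps


def link_supports(required_kbps: float, down_kbps: float, up_kbps: float) -> bool:
    """Voice is symmetric, so the slower direction is the real limit."""
    return min(down_kbps, up_kbps) >= required_kbps


def max_calls(up_kbps: float, down_kbps: float, call_kbps: float) -> int:
    """Concurrent calls that fit in the slower direction of the link."""
    return int(min(up_kbps, down_kbps) // call_kbps)


sdwan_per_call = per_call_kbps(SDWAN_KBPS, CALLS)            # 136.0 Kbps
native_per_call = per_call_kbps(TRADITIONAL_KBPS, CALLS)     # 60.0 Kbps
factor = overhead_factor(SDWAN_KBPS, TRADITIONAL_KBPS)       # roughly 2.27x

# Hypothetical broadband link: 50 Mbps down, 2 Mbps up.
fits = link_supports(SDWAN_KBPS, down_kbps=50_000, up_kbps=2_000)
calls_that_fit = max_calls(2_000, 50_000, sdwan_per_call)
```

On that hypothetical link the 20-call target does not fit, even though the download speed looks generous, because the 2 Mbps uplink only leaves room for 14 calls at the SD-WAN rate.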
