Network Visibility: Using Active Path Testing
You've updated your voice calling system. It uses SIP trunking, you've saved money, and everyone is happy. But a few nagging problems let you know it's not working smoothly all the time. How do you tell whether the network is at fault?
By now, you should know the factors that affect voice quality: latency, jitter, and packet loss. Except in some cases like satellite circuits, latency should be low, perhaps 100 to 200 milliseconds at most. Jitter should likewise be low, unless congestion is driving big changes in queue depths at multiple points in the path. Packet loss may be a significant factor as congestion fills buffers in the networking equipment along the path.
How do you know the network is running smoothly? How do you know a voice problem is external to the parts of the network path you administer? You can't answer these questions if you don't have good network monitoring instrumentation.
To start, the network monitoring system should monitor every interface in the paths over which voice may travel. That means the system needs to be inexpensive enough to deploy everywhere and correctly configured to cover all interfaces. I've seen too many implementations in which cost has limited the use of the monitoring system such that some network interfaces go unmonitored. The result is a lack of visibility into potential causes of packet loss.
I like to instrument a network to record interface errors and drops. Errors indicate a network interface or media problem, like a dirty optical connection or a noisy WAN circuit. Drops occur when a network interface's buffers are full and another packet needs to transit that interface. Congestion on egress is significantly more common than congestion on ingress. Investigate interfaces that have more than 0.0001% packet loss (that's one packet in a million, or 1 x 10^-6), and fix those with errors. Drops require other measures, which I'll cover below.
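As a rough illustration of that triage, here is a minimal Python sketch that flags interfaces whose loss exceeds 0.0001% and notes whether errors or drops dominate. The counter names and the InterfaceCounters structure are hypothetical; a real monitoring system would supply its own per-interval counters, typically gathered by SNMP polling or streaming telemetry.

```python
from dataclasses import dataclass

# 0.0001% written as a fraction of packets (one packet in a million).
LOSS_THRESHOLD = 1e-6


@dataclass
class InterfaceCounters:
    """Hypothetical per-polling-interval counters for one interface."""
    name: str
    packets: int  # packets handled during the interval
    errors: int   # packets lost to interface or media errors
    drops: int    # packets discarded because buffers were full


def loss_ratio(c: InterfaceCounters) -> float:
    """Fraction of packets lost to errors and drops combined."""
    if c.packets == 0:
        return 0.0
    return (c.errors + c.drops) / c.packets


def flag_interfaces(counters):
    """Return (name, loss ratio, dominant cause) for interfaces over the threshold."""
    flagged = []
    for c in counters:
        ratio = loss_ratio(c)
        if ratio > LOSS_THRESHOLD:
            cause = "errors" if c.errors >= c.drops else "drops"
            flagged.append((c.name, ratio, cause))
    # Worst offenders first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Interfaces flagged with "errors" point to a physical or media fix, while those flagged with "drops" point to congestion.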
I also like to get reports on call data records and call maintenance records from the call controller, and look for calls that have low mean opinion scores (MOS). A calculated MOS uses latency, jitter, and packet loss to arrive mathematically at a score for the call, so a breakdown of each factor helps identify the type of problem that needs investigation.
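To make the idea of a calculated MOS concrete, here is a small Python sketch. The R-to-MOS conversion follows the formula in ITU-T G.107. The R-factor estimate itself is a widely circulated rule-of-thumb simplification, not the full E-model and not necessarily what any particular call controller computes. The function names and the treatment of latency as a one-way figure in milliseconds are assumptions for illustration.

```python
def r_factor(latency_ms: float, jitter_ms: float, loss_pct: float) -> float:
    """Rule-of-thumb R-factor estimate (an approximation, not the full E-model)."""
    # Jitter is weighted double, plus a nominal 10 ms for codec processing.
    effective_latency = latency_ms + 2.0 * jitter_ms + 10.0
    if effective_latency < 160.0:
        r = 93.2 - effective_latency / 40.0
    else:
        r = 93.2 - (effective_latency - 120.0) / 10.0
    # Each percent of packet loss costs roughly 2.5 R points in this approximation.
    r -= 2.5 * loss_pct
    return max(0.0, min(100.0, r))


def mos_from_r(r: float) -> float:
    """Convert an R-factor to an estimated MOS (ITU-T G.107 conversion)."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


def estimate_mos(latency_ms: float, jitter_ms: float, loss_pct: float) -> float:
    """Estimated MOS for a call from its latency, jitter, and loss."""
    return mos_from_r(r_factor(latency_ms, jitter_ms, loss_pct))
```

Running the estimate with one factor at a time set to zero shows which of the three is costing the most, which is the breakdown that guides the investigation.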
Finally, I've found tracking call paths useful. I have firsthand examples where figuring out the call paths for poor video quality took a long time, as I discussed in my initial No Jitter post, "Know The Path Your Media Sessions Take." As we worked on figuring out what was causing performance issues in the two examples I'd shared in that post, being able to drop a test system into the two video facilities to measure the real network paths would have been nice. It would have directed us to make sure that the testing went via the same media concentrators as the video paths, allowing us to quickly identify the core problems.
Networks are too big to look at individual elements. I like to use the network management product's reporting system to sort the test results in Top-N reports that identify the worst offenders. I can then focus on the worst problems. I also look at the Top-N report to determine commonalities. Is common infrastructure involved? Does it always involve a certain site? Does a Top-N problem call report correlate with an interface in a Top-N drops report? Using the network management reporting tools to identify problem spots is very efficient.
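The Top-N correlation can be sketched in a few lines of Python. This assumes, purely for illustration, that call records carry a "mos" value and a "path" list of interface names, and that interface records carry "name" and "drops" fields. Real reporting tools will have their own schemas.

```python
import heapq
from collections import Counter


def worst_calls(calls, n=10):
    """The n calls with the lowest MOS."""
    return heapq.nsmallest(n, calls, key=lambda c: c["mos"])


def worst_interfaces(interfaces, n=10):
    """The n interfaces with the most drops."""
    return heapq.nlargest(n, interfaces, key=lambda i: i["drops"])


def common_interfaces(calls, interfaces, n=10):
    """Count how often a Top-N drops interface appears in a Top-N bad call path."""
    bad_names = {i["name"] for i in worst_interfaces(interfaces, n)}
    hits = Counter()
    for call in worst_calls(calls, n):
        for hop in call["path"]:
            if hop in bad_names:
                hits[hop] += 1
    # Interfaces shared by the most problem calls come first.
    return hits.most_common()
```

An interface that shows up in several of the worst calls and also in the worst drops report is a strong candidate for the first investigation.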
Network Controls for Voice Quality
We have several ways to help assure voice quality on the network. Quality of service (QoS) is the most common mechanism. I like to create QoS bandwidth definitions by identifying the number of concurrent calls, then converting that to a bandwidth figure to use for QoS configuration purposes. The problem with this approach arises when the requirements change after the initial configuration, perhaps months or years later, and nobody updates the QoS bandwidth reservation. It is best if the facilities staff understands that changes in facilities or staffing will drive changes in network bandwidth allocation, and communicates any potential staffing changes to the network and UC teams.
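The conversion from concurrent calls to a bandwidth figure is simple arithmetic. The Python sketch below assumes G.711 (64 kbps) with 20 ms packetization, 40 bytes of IP, UDP, and RTP headers per packet, and 18 bytes of Ethernet framing. Those defaults are illustrative assumptions; substitute your own codec, packetization interval, and Layer 2 overhead.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers


def per_call_bps(codec_bps=64_000, ptime_ms=20, l2_overhead_bytes=18):
    """Bandwidth of one voice stream in one direction, in bits per second."""
    # Codec payload carried in each packet.
    payload_bytes = codec_bps * ptime_ms // 8000
    packets_per_second = 1000 / ptime_ms
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES + l2_overhead_bytes
    return round(packet_bytes * 8 * packets_per_second)


def qos_bandwidth_bps(concurrent_calls, **codec_options):
    """Voice-queue bandwidth needed for the given number of concurrent calls."""
    return concurrent_calls * per_call_bps(**codec_options)
```

With these defaults a single G.711 call needs 87,200 bps in each direction, so 30 concurrent calls need about 2.6 Mbps. Keeping the calculation in a script makes it easy to rerun when staffing or facilities change.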
The second control we have is call admission control (CAC), for those times when call volume is heavier than the allocated bandwidth can handle. It is important to have the call controller create an alert when this occurs. Even better, use the call controller to monitor the number of concurrent calls and track trends. You'll want to look for reductions and increases in call volume that indicate changes in the business requirements over time.
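A trend check of this kind might look like the following Python sketch. It assumes the call controller can export peak concurrent calls per period. The function name and the 80% and 40% thresholds are hypothetical values chosen only for illustration.

```python
def cac_headroom(peak_calls_by_period, cac_limit, high=0.8, low=0.4):
    """Classify each period's peak concurrent calls against the CAC limit."""
    findings = []
    for period, peak in peak_calls_by_period.items():
        utilization = peak / cac_limit
        if utilization >= 1.0:
            status = "at-limit"    # CAC is likely rejecting or rerouting calls
        elif utilization >= high:
            status = "near-limit"  # consider raising the allocation
        elif utilization <= low:
            status = "underused"   # the allocation may be larger than needed
        else:
            status = "ok"
        findings.append((period, round(utilization, 2), status))
    return findings
```

A run of "near-limit" or "underused" periods is the cue to revisit the QoS and CAC allocations with the business.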
Verifying Quality of Experience
I am a proponent of active path testing, which provides end-to-end verification of the quality of experience (QoE). The best form of active path testing will include a measurement of the performance of the call setup mechanism, which involves the call controller and all its internal steps.
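To show what an active test measures on the media path, here is a minimal Python sketch that computes packet loss and interarrival jitter from a series of probe packets. The jitter estimator follows the running calculation defined in RFC 3550 for RTP. The function names and the input format are assumptions for illustration, and this is not how any particular product implements its tests.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running interarrival jitter estimate in the style of RFC 3550.

    Both lists hold timestamps for the same received packets, in order.
    """
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        # Difference between receive spacing and send spacing for this pair.
        d = (recv_times_ms[i] - recv_times_ms[i - 1]) - (
            send_times_ms[i] - send_times_ms[i - 1]
        )
        # Smooth with a gain of 1/16, as RFC 3550 specifies.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def loss_percent(packets_sent, received_sequence_numbers):
    """Percentage of probe packets that never arrived (duplicates ignored)."""
    if packets_sent == 0:
        return 0.0
    received = len(set(received_sequence_numbers))
    return 100.0 * (packets_sent - received) / packets_sent
```

A full active test would also time the call setup through the call controller, which this sketch does not attempt.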
ThousandEyes is the latest vendor to measure UC system performance by creating real calls and reporting variances from accepted norms. I recently attended an online demonstration of its product, and it does a nice job of measuring QoE. You can install the ThousandEyes software-based testing endpoints on a variety of virtualization and container platforms, including Docker, ESXi, Hyper-V, KVM, and VirtualBox. Or, you can deploy the system on an Intel NUC if you don't have an existing supported platform.
ThousandEyes is targeting the SIP-based UCaaS customer. The offering shows the path the media traffic is taking, with information that highlights problems along the path. With agents deployed around the Internet, ThousandEyes allows testing from an organization's sites to various Internet locations. Why is this important? Well, you can then configure tests across branches and from various points in the Internet, allowing you to identify problems within a branch as well as problems in the Internet that are not directly in your control.
ThousandEyes also tests Border Gateway Protocol (BGP) routing and the Domain Name System (DNS), two common but sometimes overlooked sources of problems that affect voice calls.
The product comes with a few caveats, however. It doesn't work with Microsoft Skype for Business, which has authentication requirements and data channel encryption that the company has not yet addressed. And it doesn't support the Cisco Skinny Client Control Protocol (SCCP). That said, you can still use the agents to run basic network connectivity tests using Real-time Transport Protocol (RTP) to Internet test sites, so the remaining functionality is still significant.
The test agent software needs to run on devices that are in relatively fixed locations, because part of the setup is configuring the IP addresses of the agents. This means that the agents are not suitable for use on laptop endpoints. ThousandEyes tells me it has no problems working with session border controllers or SIP proxies.
My summary is that you now have another tool available for monitoring a UCaaS voice system and providing a more complete view of QoE.