How do you know that your UC systems are operating smoothly and that there are no problems? If you're simply relying on calls and complaints from the user community, you're not being very proactive. The complexity of the UC system, plus the network on which it runs, is more than any single tool is designed to handle. Therefore, to get the best coverage of the entire system, you will need more than one tool.
If you're heading to Enterprise Connect this year, you'll find me leading a panel to talk about network test tools. I'll have several vendors there who will provide technical descriptions of what their tools do. You can also find these vendors in the exhibit hall, where they can provide you with more in-depth descriptions of what their products do and how they can help you find, isolate, and correct problems that affect your UC systems.
Monitor The Network
The first tool for UC management is a solid network management architecture. Since UC uses the network as its transport, proactively finding and correcting network problems is a great first step. You'll find that you typically need a set of tools, or multiple modules from some of the larger network management infrastructure vendors. I described the network management architecture that has worked well for me in numerous customer engagements in "A Network Management Architecture, Part 1"; it is also shown in Figure 1 below. A good network management system should reduce the amount of effort that is required to run the network, allowing you to spend time on upgrade designs and on looking after things like the UC systems.
Figure 1. Network Management Architecture
Active Path Testing
Next up is Active Path Testing, which I have found to be useful for both data and UC. A good active path testing system uses inexpensive probes that are automatically maintained with current versions of software, such as those from AppNeta. An alternative to individual probes is to use the installed network equipment as the probes, as Cisco's IP SLA function does.
The tests for UC would include jitter, latency, packet loss, and out-of-order packets. It is also useful to have a record of the Layer 3 hops that the packets traverse between the test source and destination, with alerting when a significant path change occurs.
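To make those four measurements concrete, here is a minimal Python sketch of how a probe pair might summarize one test run. The `Received` record and the `summarize` function are hypothetical names for illustration, not any vendor's API. The jitter estimate uses the smoothed interarrival formula from RFC 3550, and the one-way latency figure assumes the sender and receiver clocks are synchronized.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Received:
    """One test packet that reached the far-end probe (hypothetical record)."""
    seq: int        # sequence number assigned by the sender
    sent_ms: float  # send timestamp, sender clock
    recv_ms: float  # receive timestamp, receiver clock (assumed synchronized)


def summarize(sent_count: int, arrivals: List[Received]) -> Dict[str, Optional[float]]:
    """Summarize one test run; `arrivals` must be listed in arrival order."""
    if sent_count <= 0:
        raise ValueError("sent_count must be positive")

    # Packet loss: packets sent that never arrived.
    loss_pct = 100.0 * (sent_count - len(arrivals)) / sent_count

    # One-way latency: only meaningful when both clocks are synchronized.
    latencies = [p.recv_ms - p.sent_ms for p in arrivals]
    avg_latency = sum(latencies) / len(latencies) if latencies else None

    # Interarrival jitter as in RFC 3550: J = J + (|D| - J) / 16, where D is
    # the change in transit time between consecutive arrivals.
    jitter = 0.0
    for prev, cur in zip(arrivals, arrivals[1:]):
        d = (cur.recv_ms - prev.recv_ms) - (cur.sent_ms - prev.sent_ms)
        jitter += (abs(d) - jitter) / 16.0

    # Out-of-order: a packet whose sequence number is lower than the highest
    # sequence number already seen.
    out_of_order = 0
    highest = -1
    for p in arrivals:
        if p.seq < highest:
            out_of_order += 1
        else:
            highest = p.seq

    return {
        "loss_pct": loss_pct,
        "avg_latency_ms": avg_latency,
        "jitter_ms": jitter,
        "out_of_order": float(out_of_order),
    }


if __name__ == "__main__":
    # Five packets sent; packet 4 is lost and packet 2 arrives after packet 3.
    run = [
        Received(0, 0.0, 20.0),
        Received(1, 20.0, 40.0),
        Received(3, 60.0, 80.0),
        Received(2, 40.0, 85.0),
    ]
    print(summarize(5, run))
```

The sketch ignores duplicate packets and sequence-number wraparound, both of which a production tool would need to handle.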
Manually maintaining the configuration of test paths is a hurdle to a good deployment. Don't fall into the trap of manually configuring a complex set of test paths and then having to maintain it forever. The better systems will make the process of configuring the active test paths easy. Automated configuration systems are extremely useful for reducing the effort that is required to set up and maintain the tests. If you decide that you want to change a parameter for a certain type of test, you want the automation system to make that change across all the tests of that type.
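One way to picture this kind of automation is to generate the tests from a single profile per test type rather than configuring each path by hand. The following Python sketch is only an illustration: `ProbeProfile`, `PathTest`, `build_tests`, the site names, and the parameter values are all assumptions, not the configuration model of any particular product.

```python
from dataclasses import dataclass, replace
from itertools import permutations
from typing import List


@dataclass(frozen=True)
class ProbeProfile:
    """Parameters shared by every test of one type (hypothetical fields)."""
    name: str
    interval_s: int    # seconds between test runs
    dscp: int          # DSCP marking applied to test packets
    packet_bytes: int  # payload size of each test packet


@dataclass(frozen=True)
class PathTest:
    source: str
    target: str
    profile: ProbeProfile


def build_tests(sites: List[str], profile: ProbeProfile) -> List[PathTest]:
    """Generate a full mesh of directional tests between the given sites."""
    return [PathTest(src, dst, profile) for src, dst in permutations(sites, 2)]


if __name__ == "__main__":
    sites = ["hq", "branch-1", "branch-2"]  # illustrative site names
    voice = ProbeProfile(name="voice", interval_s=60, dscp=46, packet_bytes=160)
    tests = build_tests(sites, voice)
    print(len(tests), "tests generated")

    # Changing one parameter of the profile and regenerating updates every
    # test of that type in a single step.
    faster_voice = replace(voice, interval_s=30)
    tests = build_tests(sites, faster_voice)
    print({t.profile.interval_s for t in tests})
```

Because each test refers back to its profile, nobody has to touch the individual paths when a parameter changes, which is the property to look for in a commercial tool.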
Finally, you need a good reporting system. My philosophy for network management is to configure the systems to report anomalies and to otherwise be silent. I don't need to know about all the paths that are working correctly. There are simply too many of these in a network to individually track each one. A list of paths that exhibit problems, nicely sorted by severity of the problem, provides a "punch list" to direct the daily work of the network and UC administrators.
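The "report anomalies, otherwise be silent" idea can be sketched in a few lines of Python. The thresholds below are illustrative assumptions rather than recommendations, and `PathResult`, `severity`, and `punch_list` are hypothetical names. Severity here is simply the worst ratio of a measured value to its threshold, which is one of many reasonable ways to rank problems.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative thresholds only; tune these for your own network.
THRESHOLDS: Dict[str, float] = {
    "loss_pct": 1.0,
    "jitter_ms": 30.0,
    "latency_ms": 150.0,
}


@dataclass
class PathResult:
    path: str
    metrics: Dict[str, float]


def severity(result: PathResult) -> float:
    """Worst ratio of a metric to its threshold; above 1.0 means a violation."""
    return max(
        result.metrics.get(name, 0.0) / limit
        for name, limit in THRESHOLDS.items()
    )


def punch_list(results: List[PathResult]) -> List[PathResult]:
    """Keep only paths that violate a threshold, worst first."""
    problems = [r for r in results if severity(r) > 1.0]
    return sorted(problems, key=severity, reverse=True)


if __name__ == "__main__":
    results = [
        PathResult("hq -> branch-1", {"loss_pct": 0.1, "jitter_ms": 5.0, "latency_ms": 40.0}),
        PathResult("branch-1 -> branch-2", {"loss_pct": 0.2, "jitter_ms": 45.0, "latency_ms": 60.0}),
        PathResult("hq -> branch-2", {"loss_pct": 3.0, "jitter_ms": 10.0, "latency_ms": 70.0}),
    ]
    for item in punch_list(results):
        print(f"{item.path}: severity {severity(item):.1f}")
```

Healthy paths never appear in the output, so the list stays short enough to drive the day's work.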
Packet Capture
Sometimes, there is no good substitute for the analysis that is possible with a packet capture. Several tools exist for capturing packets in transit across the network, from companies like NetScout, Opnet, and WildPackets. These tools often store the captured packets for a period of time, depending on the amount of storage allocated and the bandwidth of the monitored links. The stored data allows you to perform detailed traffic flow analysis to determine what happened. Expert analysis modes in many tools make the analysis easier to perform than simply displaying decoded packets.
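The relationship between storage, link bandwidth, and how far back you can look is simple arithmetic. The Python sketch below is a back-of-the-envelope estimate under stated assumptions: the function name and the example figures are hypothetical, traffic is counted in one direction only, and per-packet capture overhead, compression, and packet slicing are ignored.

```python
def retention_hours(storage_tb: float, link_gbps: float, avg_utilization: float) -> float:
    """Estimate how many hours of full packet capture fit in the given storage.

    storage_tb      -- capture storage in terabytes (10**12 bytes)
    link_gbps       -- monitored link speed in gigabits per second
    avg_utilization -- average link utilization as a fraction (0 < u <= 1)
    """
    if not 0 < avg_utilization <= 1:
        raise ValueError("avg_utilization must be in the range (0, 1]")
    storage_bits = storage_tb * 1e12 * 8
    bits_per_second = link_gbps * 1e9 * avg_utilization
    return storage_bits / bits_per_second / 3600


if __name__ == "__main__":
    hours = retention_hours(storage_tb=100, link_gbps=10, avg_utilization=0.3)
    print(f"{hours:.0f} hours of capture")
```

With these example numbers, 100 TB behind a 10 Gbps link that averages 30 percent utilization holds roughly 74 hours of traffic, or about three days. A problem reported after a long weekend may already have aged out of the buffer.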
In the UC environment, analysis of the sessions can identify packet loss, high jitter, and out-of-order packet flows that are caused by per-packet load balancing mechanisms. The out-of-order packet flows can be difficult or impossible to diagnose with less capable tools.
Of course, you need the data collection system located at the right points in the network in order to capture the packets you need to see. In our consulting engagements, we look for network choke points, where most of the traffic is aggregated onto a few links. Placing a large-capacity packet capture and analysis system at these choke points allows us to monitor, troubleshoot, and provide alerts for both UC and data traffic flows.
An advantage of this type of tool is the ability to analyze and report on signaling problems. Signaling consists of TCP-based transactions between the UC endpoint and the UC controller, so packet loss on these control paths increases the time to provide dial tone or to place a call. Particularly bad cases of packet loss may result in session setup failures, which look to the end user like a network outage.
UC System Analysis
The network is sometimes not the source of a UC problem, which brings us to the last major tool. The UC analysis tool monitors the UC system, including the controller, endpoints, and other components such as border controllers.
In a VM world, it is possible that the UC controller is running on a VM instance that is competing with other heavy-load VM instances for hardware resources. A non-VM server may have hardware problems, a process with a memory leak, disk I/O problems, or a NIC that has high errors. A UC vendor's system may be advertised as multi-core capable, but it may have one process that's not been converted and it happens to be a critical bottleneck in your implementation. And of course, many UC systems depend on external services such as LDAP, which may itself be a source of slow operation or failure. Will your UC system continue to operate correctly if it can't communicate with an LDAP server?
A particular advantage of UC System Analysis tools is that they also gather and report the statistics that the voice and video endpoints collect, so they can report on the quality and performance of every voice and video call. No additional network traffic is generated and no expensive network taps are needed. Using the call details, it is easy to identify a phone or video system that consistently experiences high packet loss.
Similarly, a call that shows high packet loss or high jitter only between two specific endpoints would be an indication of a problem in the path between the two endpoints. This tool may not tell you the exact location of the problem, but it alerts you to the existence of the problem. You can then use packet capture tools or active path analysis tools to trace the path and pinpoint the source of the problem.
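The distinction between a bad endpoint and a bad path can be sketched from call detail records. The Python below is a simplified illustration, not how any UC analysis product works: `CallRecord`, `find_suspects`, the endpoint names, and the threshold values are all assumptions. An endpoint is flagged when most of its calls show high loss; a pair of endpoints is flagged as a suspect path when their call is bad but neither endpoint is flagged on its own.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class CallRecord:
    """Per-call statistics as reported by the endpoints (hypothetical record)."""
    caller: str
    callee: str
    loss_pct: float


def find_suspects(
    calls: List[CallRecord],
    loss_threshold: float = 1.0,  # illustrative: loss above this marks a bad call
    min_calls: int = 3,           # ignore endpoints with too few calls to judge
    bad_ratio: float = 0.8,       # share of bad calls that counts as "consistent"
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    total: Dict[str, int] = defaultdict(int)
    bad: Dict[str, int] = defaultdict(int)
    for call in calls:
        is_bad = call.loss_pct > loss_threshold
        for endpoint in (call.caller, call.callee):
            total[endpoint] += 1
            bad[endpoint] += int(is_bad)

    # Endpoints whose calls are consistently bad, regardless of the far end.
    endpoints = {
        ep for ep in total
        if total[ep] >= min_calls and bad[ep] / total[ep] >= bad_ratio
    }

    # Bad calls between two otherwise healthy endpoints point at the path.
    paths = {
        tuple(sorted((call.caller, call.callee)))
        for call in calls
        if call.loss_pct > loss_threshold
        and call.caller not in endpoints
        and call.callee not in endpoints
    }
    return endpoints, paths


if __name__ == "__main__":
    calls = [
        CallRecord("phone-a", "phone-b", 4.0),
        CallRecord("phone-a", "phone-c", 5.0),
        CallRecord("phone-a", "phone-d", 3.5),
        CallRecord("phone-b", "phone-c", 0.1),
        CallRecord("phone-b", "phone-d", 0.2),
        CallRecord("phone-c", "phone-d", 0.0),
        CallRecord("phone-e", "phone-f", 6.0),
        CallRecord("phone-e", "phone-b", 0.1),
        CallRecord("phone-f", "phone-c", 0.2),
    ]
    suspect_endpoints, suspect_paths = find_suspects(calls)
    print("Endpoints to check:", suspect_endpoints)
    print("Paths to test:", suspect_paths)
```

In this sample data, phone-a is bad on every call and is flagged as an endpoint, while the phone-e to phone-f pair is flagged as a path to investigate with the active path or packet capture tools.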
As with the above tools, I prefer systems that are easy to configure, with good out-of-the-box functionality. Reporting should quickly identify actionable problems to be corrected. Summary reports must also be available that allow trends to be spotted before they become problems. Of course, managers like to have summary reports, so make sure the reporting system provides a summary of all identified problems, ranked by severity.
Summary
One tool is not sufficient to proactively monitor and manage UC systems. For the best staff efficiency, identify tools that don't require a lot of effort to get running and to maintain. The tools should produce actionable reports that identify real problems that should be corrected. Summary reports should be available that show trends and patterns that allow you to identify problems before they begin to affect your UC systems.
Don't forget about staff training for effective use of the tools. I've found that any organization needs at least two people trained on every important tool. This allows one person to go on vacation or sick leave without risk to the organization's UC system.
Join us at Enterprise Connect to learn more about the tools that can help make sure that your UC systems are running smoothly.