UC&C Troubleshooting Tools
A variety of good UC&C troubleshooting tools can help diagnose difficult problems.
Diagnosing UC&C problems is a multi-step process. First, a mechanism is needed to detect that a problem exists. Additional tools may then be needed to identify which subsystem has failed, because one set of symptoms can have several different causes. Finally, it is useful to understand which applications are affected, so that when more customers call about the issue, you can quickly explain the problem to them.
Using tools that can identify when and where a problem originated is very helpful. Since human error is the single largest source of network problems, correlating a configuration change with an outage is often a quick way to identify a potential problem source.
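Correlating changes with outages can start as a simple time filter over the change log. The sketch below is a minimal illustration, assuming change records can be exported with a timestamp, device name, and description; the `ChangeRecord` type, the four-hour default window, and the device names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeRecord:
    when: datetime      # time the change was applied
    device: str         # device or system that was changed
    description: str    # free-text summary from the change log


def suspect_changes(outage_start, changes, window=timedelta(hours=4), devices=None):
    """Return changes applied within `window` before the outage, newest first.

    If `devices` is given, only changes to those devices (for example, the
    ones on the affected call path) are considered.
    """
    earliest = outage_start - window
    suspects = [
        c for c in changes
        if earliest <= c.when <= outage_start
        and (devices is None or c.device in devices)
    ]
    return sorted(suspects, key=lambda c: c.when, reverse=True)


if __name__ == "__main__":
    # Hypothetical change-log entries.
    log = [
        ChangeRecord(datetime(2016, 3, 1, 8, 0), "core-sw1", "VLAN trunk change"),
        ChangeRecord(datetime(2016, 3, 1, 11, 30), "edge-rtr2", "QoS policy edit"),
        ChangeRecord(datetime(2016, 2, 28, 9, 0), "fw1", "rule update"),
    ]
    for change in suspect_changes(datetime(2016, 3, 1, 12, 0), log):
        print(change.when, change.device, change.description)
```

Listing the most recent changes first reflects the idea that the change closest to the outage is usually the first one worth checking; a match is a lead, not proof.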
Top-performing organizations create playbooks of known problems, their symptoms, and their resolutions. These playbooks help organize troubleshooting efforts and provide a way to pass historical knowledge on to new IT team members. The raw data for these playbooks comes from the trouble ticket system. If you're setting up a new playbook, start with the most frequent problems and work toward the least frequent. Don't skip the challenging problems that took a lot of time and effort to understand, diagnose, and correct; being able to recognize them quickly the next time is a simple way of learning from your mistakes.
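A first pass at a new playbook can come straight from trouble ticket data. This minimal sketch assumes tickets can be exported as (category, hours spent) pairs, a hypothetical format; it orders categories by frequency and separately flags categories whose average effort meets an assumed eight-hour threshold, so the hard problems are not skipped.

```python
from collections import Counter


def playbook_plan(tickets, effort_hours=8.0):
    """Plan playbook entries from (category, hours_spent) ticket pairs.

    Returns (categories ordered most to least frequent,
             categories whose average effort is at least `effort_hours`).
    """
    counts, hours = Counter(), Counter()
    for category, spent in tickets:
        counts[category] += 1
        hours[category] += spent
    # Ties in frequency are broken alphabetically so the order is stable.
    by_frequency = sorted(counts, key=lambda c: (-counts[c], c))
    challenging = [c for c in by_frequency if hours[c] / counts[c] >= effort_hours]
    return by_frequency, challenging
```

Keeping the two lists separate means a rare but expensive problem shows up in the second list even when it sits at the bottom of the first.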
Examples of typical problems are:
- Calls don't connect
- Calls drop after connecting
- Poor audio or video quality
- Conference calls don't work or have poor quality
- Collaboration tools work in some locations but not others
UC&C systems consist of multiple components that use the network in specific ways, and how they interact with the network depends on the type of communication. Real-time voice and video use protocols like RTP and WebRTC, whose media normally runs over UDP. Collaboration features also rely on TCP-based, near-real-time protocols.
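To make the media side concrete: RTP packets carried in UDP begin with a 12-byte fixed header (RFC 3550) holding the payload type, a sequence number, a media timestamp, and a stream identifier (SSRC). The sketch below parses only that fixed header and ignores CSRC lists and header extensions. SRTP, as used by WebRTC, encrypts the payload but leaves this header readable, which is what lets monitoring tools track sequence numbers on encrypted calls.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int   # e.g. 0 = PCMU; 96-127 are dynamic types
    sequence: int       # increments by one per packet; gaps indicate loss
    timestamp: int      # media sampling clock
    ssrc: int           # identifies the media stream


def parse_rtp_header(datagram: bytes) -> RtpHeader:
    """Parse the 12-byte RTP fixed header from a UDP payload."""
    if len(datagram) < 12:
        raise ValueError("too short for an RTP fixed header")
    # Network byte order: V/P/X/CC byte, M/PT byte, 16-bit seq, 32-bit ts, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", datagram[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"not RTP version 2 (version field = {version})")
    return RtpHeader(version, bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc)
```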
UC&C system monitoring tools can show how well the overall UC&C system is functioning. When the system isn't working well, some underlying problem must be identified and corrected. Problems with call signaling affect call setup, termination, and transfers. Problems with media traffic affect call quality, producing poor audio and video. If most calls are acceptable and a few aren't, look at the paths used by the poor calls: identify what is different about them and look for what they have in common. The cause may be defective hardware, bugs in endpoint software, or network problems that affect some locations but not others. Don't forget to look for inconsistent configurations.
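The search for commonality among poor calls can be partly automated. The sketch below assumes call records are available as dictionaries with a boolean `poor` flag plus attributes such as site, endpoint firmware, or WAN link (all hypothetical field names). It reports attribute values whose poor-call rate is at least twice the overall rate, a heuristic threshold chosen for illustration.

```python
from collections import defaultdict


def overrepresented(calls, attributes, ratio=2.0, min_calls=5):
    """Find attribute values whose poor-call rate stands out.

    `calls` is a list of dicts, each with a boolean "poor" key plus the
    attributes to examine. Returns (attribute, value, poor_rate, n_calls)
    tuples where the value appears in at least `min_calls` calls and its
    poor-call rate is at least `ratio` times the overall rate, worst first.
    """
    if not calls:
        return []
    overall = sum(c["poor"] for c in calls) / len(calls)
    counts = defaultdict(lambda: [0, 0])  # (attribute, value) -> [poor, total]
    for call in calls:
        for attr in attributes:
            entry = counts[(attr, call[attr])]
            entry[0] += call["poor"]
            entry[1] += 1
    findings = []
    for (attr, value), (poor, total) in counts.items():
        rate = poor / total
        if total >= min_calls and poor > 0 and rate >= ratio * overall:
            findings.append((attr, value, rate, total))
    return sorted(findings, key=lambda f: f[2], reverse=True)
```

A single site or firmware version that dominates the poor calls is a place to start looking, not proof of the cause; the `min_calls` floor keeps one bad call at a quiet site from being flagged.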
Quality of Experience (QoE) measurements, and the somewhat related Quality of Service (QoS) measurements, are essential for identifying problems early. One method of quality monitoring is to use simulated calls, which have the advantage of traversing the same network data paths that real calls use. Some tools place real calls, which also exercises the call controller. Other tools simulate calls by sending and receiving synthetic traffic that tests the data paths for delay, jitter, and packet loss.
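For the synthetic-traffic approach, the core arithmetic is simple. This sketch assumes each probe carries a sender sequence number and timestamp and that the receiver records its own arrival time (the `Probe` type is hypothetical). It computes mean one-way delay, the RFC 3550 interarrival jitter estimate, and packet loss relative to the number of probes sent.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    seq: int         # sequence number stamped by the sender
    sent_ms: float   # sender clock when the probe left
    recv_ms: float   # receiver clock when the probe arrived


def path_metrics(probes, sent_count):
    """Summarize synthetic-traffic results for one path.

    Returns (mean one-way delay ms, interarrival jitter ms, loss fraction),
    or (None, None, 1.0) if nothing arrived. Jitter uses the RFC 3550
    estimator J += (|D| - J) / 16, evaluated in arrival order.
    """
    if not probes:
        return None, None, 1.0
    arrived = sorted(probes, key=lambda p: p.recv_ms)
    jitter = 0.0
    for prev, cur in zip(arrived, arrived[1:]):
        # D = change in transit time between consecutive packets;
        # a constant clock offset between sender and receiver cancels out.
        d = (cur.recv_ms - prev.recv_ms) - (cur.sent_ms - prev.sent_ms)
        jitter += (abs(d) - jitter) / 16
    # One-way delay is only meaningful if both clocks are synchronized.
    delay = sum(p.recv_ms - p.sent_ms for p in probes) / len(probes)
    received = len({p.seq for p in probes})  # duplicates count once
    loss = max(sent_count - received, 0) / sent_count
    return delay, jitter, loss
```

Because jitter is built from differences, it is valid even without synchronized clocks; the delay figure is not, so tools without clock synchronization usually report round-trip time instead.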
Another quality monitoring mechanism is to collect call quality reports from the call controller. This is a simple way to monitor real user calls. Tools can also tap real calls and determine call quality from the raw packet exchanges.
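Call quality reports become useful once they are compared against targets. The sketch assumes the controller can export per-call metrics as CSV with hypothetical columns `call_id`, `latency_ms`, `jitter_ms`, and `loss_pct`; real controllers use their own formats. The thresholds are commonly quoted rules of thumb (150 ms one-way delay, 30 ms jitter, 1% loss), not vendor guidance, and should be tuned for your codecs and network.

```python
import csv
import io

# Assumed limits: common rules of thumb, to be tuned per deployment.
THRESHOLDS = {"latency_ms": 150.0, "jitter_ms": 30.0, "loss_pct": 1.0}


def poor_calls(report_csv, thresholds=THRESHOLDS):
    """Yield (call_id, breached_metrics) for each call exceeding any limit.

    `report_csv` is the text of a CSV export whose column names
    (call_id, latency_ms, jitter_ms, loss_pct) are hypothetical.
    """
    for row in csv.DictReader(io.StringIO(report_csv)):
        breached = [name for name, limit in thresholds.items()
                    if float(row[name]) > limit]
        if breached:
            yield row["call_id"], breached


if __name__ == "__main__":
    sample = (
        "call_id,latency_ms,jitter_ms,loss_pct\n"
        "c1,80,5,0.1\n"
        "c2,210,12,0.4\n"
        "c3,95,45,2.5\n"
    )
    for call_id, metrics in poor_calls(sample):
        print(call_id, "exceeds", ", ".join(metrics))
```

The flagged call IDs are exactly the "poor calls" whose paths and endpoints are worth comparing when looking for a common cause.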
The UC&C system controller should also be monitored for internal software problems and for server hardware problems. Simple things like a full disk are easy to monitor, and you should have processes in place so that easy-to-detect problems are never the cause of an outage.
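Checks like the full-disk example are easy to script. This minimal sketch uses Python's `shutil.disk_usage`; the monitored paths and the 85% limit are assumptions, and in practice the result would feed whatever alerting system is already in place.

```python
import shutil


def disk_alerts(paths, max_used_pct=85.0):
    """Return (path, percent used) for each filesystem at or above the limit."""
    alerts = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        used_pct = 100.0 * usage.used / usage.total
        if used_pct >= max_used_pct:
            alerts.append((path, round(used_pct, 1)))
    return alerts


if __name__ == "__main__":
    # Hypothetical: add the controller's database and log volumes here.
    for path, pct in disk_alerts(["/"]):
        print(f"ALERT: {path} is {pct}% full")
```

Alerting well below 100% leaves time to act before call records or logs stop being written.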
Finally, network infrastructure monitoring should be included. There are a lot of tools in this category. Pick something that you can afford to deploy across the entire infrastructure, including the data center. I've seen organizations that purchased tools that they couldn't afford to deploy everywhere. Sure enough, some of their most pressing problems were in locations that were not monitored because "it became too expensive." Don't forget to monitor key supporting infrastructure services like DHCP and DNS.
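Supporting services can be checked the same way. The sketch below times name lookups through the operating system resolver with `socket.getaddrinfo`; the names to check (for example, the call controller's SIP domain) and the 200 ms threshold are assumptions. Because the OS may cache answers, a probe that queries each DNS server directly gives a truer picture, and DHCP needs its own probe, which this sketch does not cover.

```python
import socket
import time


def check_dns(names, resolve=socket.getaddrinfo, slow_ms=200.0):
    """Resolve each name and classify the result.

    Returns a dict mapping name -> "ok", "slow (N ms)", or "failed: <error>".
    `resolve` is injectable so the check can be tested without a network.
    """
    results = {}
    for name in names:
        start = time.perf_counter()
        try:
            resolve(name, None)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results[name] = f"failed: {exc}"
            continue
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results[name] = "ok" if elapsed_ms < slow_ms else f"slow ({elapsed_ms:.0f} ms)"
    return results


if __name__ == "__main__":
    # Hypothetical names; substitute the controller and SIP domains you depend on.
    print(check_dns(["localhost"]))
```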
Whatever the tool, it should provide actionable information. Just providing data isn't useful; the data must support decision-making. For example, reporting interface errors is OK, but it is better if the system can analyze the data and determine that the real cause is a duplex mismatch.
Enterprise Connect 2016
A good way to learn about all the different tools is to attend Enterprise Connect 2016 in Orlando, Fla., March 7-10. I will be hosting a session, Trends in Management and Troubleshooting of UC Systems, in which a panel of four vendors will present short, technical descriptions of how their systems help their customers manage UC&C systems. These presentations provide technical information instead of the usual marketing hype. We'll continue with a Q&A portion, after which you can meet the presenters and get more detailed information on their products. Leave with an understanding of which products will help with your UC&C implementation. Of course, you will also have the opportunity to meet with the vendors on the exhibition show floor and see their products in action.
I look forward to seeing you at EC2016!
Learn more about management and security trends and technologies at Enterprise Connect 2016, March 7 to 10, in Orlando, Fla. View the Management and Security track sessions; register now using the code NJPOST to receive $200 off the current conference price.