ABOUT THE AUTHOR


Terry Slattery
Terry Slattery is a senior network engineer with decades of experience in the internetworking industry. Prior to joining Chesapeake NetCraftsmen as...
Terry Slattery | September 20, 2012 |

 
   

Want a Five-Nines Network? (Part 3)

Network monitoring is essential for detecting failures in a resilient network as well as providing insight into how well the network is running and where it could be improved.

This is the third post in a series about steps that you can take to have a five-nines network--that is, a network with 99.999% availability. Five-nines is generally considered the goal for converged networks; it is the availability metric that was common in the traditional voice network.
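To put 99.999% in concrete terms, the short sketch below converts an availability percentage into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration; five-nines works out to roughly 5.26 minutes of downtime per year.

```python
# Minutes in a 365-day year (leap years ignored for simplicity).
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


# Compare three-, four-, and five-nines targets.
for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes/year")
```

With a budget that small, a failure that goes unnoticed for weeks leaves no margin for a second one.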

This blog post describes how to use network and configuration management to increase network availability. Network management is one of my specialties and I've created a Network Management Architecture, which is described at http://www.netcraftsmen.net/resources/blogs/nms-architecture-fcaps-or-itil.html.

Manage IT!
How do you know when one component fails in a resilient network? A resilient network will continue to run, perhaps in degraded mode. Network management systems must be used to monitor all parts of a resilient network and must let you know that some part of it has failed so that you can fix it before another component fails, causing an outage.

Having spent time working in financial networks, which have similar requirements, I've seen quite a number of failures where the analysis showed that both parts of a redundant configuration had failed, often weeks or months apart. The first failure went unnoticed because there was no outage. It was only when the second failure occurred that both failures were found.

How do you prevent such failures? Network Management! You have to monitor the network to identify failures. The system should generate alerts when a key device or interface fails. You can also set thresholds to create alerts when the utilization of an interface changes substantially, either to near zero or to very high levels. Big changes may mean that the routing or spanning tree protocols changed paths due to a change in the network. If you are aware of the change, then the alert is validation that the network management system is working correctly. If you're not aware of a change that would create the alert, then there is something to investigate.
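The utilization thresholds described above can be sketched as a small check. This is a minimal illustration, not the behavior of any particular network management product: the function name and the numeric limits (1%, 90%, and a 20-point change) are hypothetical values chosen only to show the idea.

```python
from typing import Optional


def utilization_alert(
    baseline_pct: float,
    current_pct: float,
    low_pct: float = 1.0,          # hypothetical "near zero" limit
    high_pct: float = 90.0,        # hypothetical "very high" limit
    min_change_pct: float = 20.0,  # hypothetical "substantial change" size
) -> Optional[str]:
    """Return an alert label when utilization moved substantially to an extreme."""
    # Ignore small fluctuations around the usual level.
    if abs(current_pct - baseline_pct) < min_change_pct:
        return None
    if current_pct <= low_pct:
        return "near-zero"
    if current_pct >= high_pct:
        return "very-high"
    return None


# A link that normally runs at 40% suddenly carries almost nothing.
print(utilization_alert(baseline_pct=40.0, current_pct=0.2))
```

A link that always runs hot does not trigger this check, because the baseline comparison filters it out; it is the change that hints at a routing or spanning tree path shift.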

Another way to monitor the network is to perform active monitoring by using synthetic tests. I sometimes call these tests "application level pings," because they run at the application layer. For example, if sending an email takes longer than usual or fails to complete, then there's either a network problem or an email server problem. Web page retrieval tests perform the same type of monitoring.
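An "application level ping" can be sketched as a timed wrapper around any application action. The function name, the status labels, and the slow_factor default are assumptions for illustration; the action itself (sending an email, fetching a web page) is supplied by the caller.

```python
import time
from typing import Callable


def timed_probe(
    action: Callable[[], object],
    baseline_seconds: float,
    slow_factor: float = 2.0,  # hypothetical: "slow" means 2x the usual time
) -> str:
    """Run one synthetic test and classify it as "ok", "slow", or "failed"."""
    start = time.monotonic()
    try:
        action()
    except Exception:
        # The action did not complete: network problem or server problem.
        return "failed"
    elapsed = time.monotonic() - start
    if elapsed > baseline_seconds * slow_factor:
        return "slow"
    return "ok"


# Example wiring for a web page retrieval test (URL is a placeholder):
#   import urllib.request
#   fetch = lambda: urllib.request.urlopen("http://intranet.example/", timeout=5).read()
#   print(timed_probe(fetch, baseline_seconds=0.5))
```

Either a "slow" or a "failed" result is worth investigating, although the test alone cannot say whether the network or the server is at fault.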

In converged networks, there are two important monitoring steps to take. The first is to monitor the endpoints for connection quality. What are the typical stats for delay, jitter, and loss? Are calls terminated abnormally? The stats from real calls are a great way to keep an eye on how the network is performing and to highlight trouble areas. Increasing loss and jitter are early indications of congestion somewhere in the path. The path may have changed due to a failure in the original path, leaving the secondary path oversubscribed. Or perhaps the primary path itself is now oversubscribed and congested.

The second step for monitoring converged networks is to generate synthetic voice and video traffic. I refer to this as active testing. It is similar to the "application level pings" that I described above. There are at least two methods for generating voice/video synthetic traffic. One is to add probes to key points in the network, such as at each major site, and run tests between the probes. Another is to create synthetic calls to the endpoints, but this requires that the voice and video endpoints support test calls without someone manually initiating them.
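Whichever method generates the traffic, the receiving side has to turn the probe packets into loss and jitter figures. The sketch below assumes a hypothetical record format of (sequence number, send time, receive time) in seconds, with clocks that are comparable between the probes. The jitter calculation uses the smoothed interarrival estimator from RFC 3550, which is one common choice rather than the only one.

```python
from typing import List, Tuple

# (sequence number, send time, receive time), times in seconds.
ProbeRecord = Tuple[int, float, float]


def summarize_probe(sent_count: int, received: List[ProbeRecord]) -> Tuple[float, float]:
    """Return (loss percent, smoothed jitter in seconds) for one test run.

    Assumes `received` is in arrival order and contains no duplicates.
    """
    if sent_count <= 0:
        raise ValueError("sent_count must be positive")
    loss_pct = 100.0 * (sent_count - len(received)) / sent_count

    jitter = 0.0
    previous_transit = None
    for _seq, sent_at, received_at in received:
        transit = received_at - sent_at
        if previous_transit is not None:
            # RFC 3550 estimator: move 1/16 of the way toward the new difference.
            difference = abs(transit - previous_transit)
            jitter += (difference - jitter) / 16.0
        previous_transit = transit
    return loss_pct, jitter


# Five packets sent at 20 ms intervals; packet 2 was lost.
records = [(0, 0.00, 0.010), (1, 0.02, 0.030), (3, 0.06, 0.075), (4, 0.08, 0.090)]
loss, jitter = summarize_probe(sent_count=5, received=records)
print(f"loss={loss:.1f}% jitter={jitter * 1000:.3f} ms")
```

Tracking these two numbers per probe pair over time makes a rising trend visible before users start reporting poor call quality.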

When a problem is identified, enter it into a trouble ticket system so that failures can be tracked. You can then analyze the tickets to determine which types of failures occur most often.
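As a minimal sketch of that analysis, the snippet below counts tickets by failure type. It assumes the tickets have been exported as dictionaries with a hypothetical "failure_type" field; real ticket systems will use their own field names and export formats.

```python
from collections import Counter
from typing import Dict, List, Tuple


def most_common_failures(tickets: List[Dict[str, str]], top_n: int = 3) -> List[Tuple[str, int]]:
    """Return the top_n failure types with their ticket counts, most frequent first."""
    counts = Counter(ticket["failure_type"] for ticket in tickets)
    return counts.most_common(top_n)


# Made-up sample data, for illustration only.
sample_tickets = [
    {"id": "T1", "failure_type": "duplex mismatch"},
    {"id": "T2", "failure_type": "flapping interface"},
    {"id": "T3", "failure_type": "duplex mismatch"},
    {"id": "T4", "failure_type": "power supply"},
    {"id": "T5", "failure_type": "duplex mismatch"},
    {"id": "T6", "failure_type": "flapping interface"},
]
print(most_common_failures(sample_tickets, top_n=2))
```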

Finally, spend time identifying and fixing common, well-known problems. Duplex mismatch comes to mind as a great example. A lot of people think that duplex mismatch isn't a big problem and that they can let it go. As long as the link is very lightly loaded, they are correct. But a high-volume link with a duplex mismatch will have very poor throughput. Other examples are flapping interfaces, unstable routing and spanning tree protocol instances, high-utilization links (more than 50% average utilization or 70% 95th percentile utilization), and interfaces reporting errors or discards.
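The high-utilization rule above (more than 50% average or more than 70% at the 95th percentile) can be checked with a few lines of code. This sketch assumes a list of utilization samples in percent for one interface and uses the nearest-rank method for the percentile; other percentile definitions will give slightly different values.

```python
from typing import List


def percentile_95(samples: List[float]) -> float:
    """Return the 95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def is_high_utilization(
    samples: List[float],
    avg_limit: float = 50.0,  # average-utilization limit from the text
    p95_limit: float = 70.0,  # 95th-percentile limit from the text
) -> bool:
    """Return True when the link exceeds either utilization limit."""
    average = sum(samples) / len(samples)
    return average > avg_limit or percentile_95(samples) > p95_limit


# Mostly idle link with sustained bursts: low average, high 95th percentile.
bursty = [10.0] * 90 + [80.0] * 10
print(is_high_utilization(bursty))
```

The bursty example shows why both limits matter: its average is only 17%, yet the 95th percentile of 80% still flags the link.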

Taking care of all the small problems makes the network more stable and efficient. You can then focus on bigger problems and you know that two small problems aren't interacting to produce a larger symptom.

Next page: Configuration management




