Terry Slattery
Terry Slattery is a senior network engineer with decades of experience in the internetworking industry. Prior to joining Chesapeake NetCraftsmen as...

Terry Slattery | October 06, 2015


Requirements for Network Management Best Practices


The elements you need for network management, and why it matters.

Maybe you're wondering why we need network management best practices. To start, any complex system can be configured in many different ways, each with nearly the same functional characteristics. However, some of those configurations are easier to create, while others are easier to operate. Finding the optimum trade-off between creation complexity and operational complexity is often challenging, and system experts rely on their experience to make the right trade-offs. Network management is clearly one of those complex systems.

Network management system (NMS) best practices provide guidelines, and they also help us know when we have finished configuring the system. Without guidelines, we could spend a lot of effort on continual optimization and improvement with insignificant results. I've seen NMS implementations in which the management team is in continuous "tweaking" mode. In some cases, the NMS doesn't provide the right type of automation to implement functionality across the entire network, so the GUI must be used for every change. This can be very tiring, even on a small network of 30 to 50 devices. The right automation makes the system a pleasure to use because changes to the NMS and to the network devices are easy to implement.

A set of best practices gives us guidelines that drive what we implement and how we use it. Specifically, it tells us what data to collect, how the data is used, and what actions result from using the data. We can then configure the network management system to collect the data and perform the analysis that turns the raw data into actionable information. Our network's operations policies then tell us what to do when that actionable information is presented to us.

NMS Policies

An organization should develop network management policies that dictate what data is collected and how it is used. Policies are the starting point of a top-down process: by defining what we want to happen, we can then identify the data and analysis required to implement each policy.

For example, we want to monitor interface utilization so that we can predict when traffic levels indicate that we should plan to upgrade the speed of a link. The NMS records interface performance data, and trend analysis can tell us that we will be approaching saturation on a link in two to three months. The operational policy can then dictate that we investigate the link usage and begin planning an upgrade of the link to handle the additional demand.
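
The trend analysis described above can be as simple as a straight-line fit. The sketch below is a minimal illustration, not any particular NMS's method: it assumes utilization samples have already been reduced to (day, percent) pairs, fits an ordinary least-squares line, and projects when the line crosses a hypothetical 80% planning threshold.

```python
def days_until_threshold(samples, threshold_pct=80.0):
    """Estimate days from the last sample until utilization reaches threshold_pct.

    samples: list of (day_index, utilization_pct) pairs (assumed pre-aggregated,
    e.g. the daily or weekly peak for one link).
    threshold_pct: hypothetical planning threshold; pick your own.
    Returns None when there is too little data or the trend is flat/falling.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None
    # Ordinary least-squares slope and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None  # utilization is not growing; nothing to plan yet
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold_pct - intercept) / slope
    return max(0.0, crossing_day - xs[-1])


if __name__ == "__main__":
    # Weekly peaks for one link: 50%, 52%, 54%, 56%.
    weekly = [(0, 50.0), (7, 52.0), (14, 54.0), (21, 56.0)]
    print(round(days_until_threshold(weekly)))  # 84 days, roughly three months
```

A linear fit is deliberately crude; seasonal traffic may call for a better model, but even this is enough to trigger the "investigate and plan an upgrade" step of the policy.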

I like to collect and monitor interface error and discard statistics because they focus on what isn't working correctly. These statistics allow me to identify interfaces that are experiencing problems or that are becoming congested. The time of day of the congestion and the extent of the congestion are important for determining whether further investigation is warranted. This is a case where the operational policy may simply be "investigate further."

The next question about interface performance monitoring is whether network performance data should be collected on all edge interfaces, or only on router and switch interconnections and server interfaces. The organizational policy may limit the budget for NMS tools in a way that prevents the installation of a system large enough to monitor all edge interfaces. In this case, it may be necessary to limit performance data collection to infrastructure interfaces. You then need to decide whether server interfaces should also be monitored, or perhaps only interfaces to specific business-critical servers (UC servers, Media Control Units, business application servers, etc.).

Another policy example is collecting and archiving device configurations. Operational policies can require that we identify and validate configuration changes each day. In practice, we can then identify configuration changes that happened right before a network failure. We can also use the saved configurations to restore a failed device's configuration after the replacement device is installed.
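
A daily change check against the archive can be sketched with Python's standard `difflib`. This assumes the archive is simply a mapping of device name to configuration text for each day; how the configurations are pulled from devices is left out, and the device names are hypothetical.

```python
import difflib


def config_changes(previous: dict, current: dict) -> dict:
    """Compare two config archives (device name -> config text).

    Returns device name -> unified-diff lines for every device whose
    configuration was added, removed, or changed since the previous archive.
    """
    changes = {}
    for device in sorted(set(previous) | set(current)):
        old = previous.get(device, "").splitlines()
        new = current.get(device, "").splitlines()
        if old != new:
            changes[device] = list(difflib.unified_diff(
                old, new,
                fromfile=f"{device} (previous)",
                tofile=f"{device} (current)",
                lineterm="",
            ))
    return changes


if __name__ == "__main__":
    yesterday = {"core1": "hostname core1\ninterface Gi0/1\n shutdown",
                 "edge1": "hostname edge1"}
    today = {"core1": "hostname core1\ninterface Gi0/1\n no shutdown",
             "edge1": "hostname edge1"}
    for device, diff in config_changes(yesterday, today).items():
        print("\n".join(diff))
```

The report lists only devices that changed, which keeps the daily validation step short enough that someone will actually do it.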

The list of what to collect, the analysis to be done, and the actions to take can be quite extensive. Start with a short list, and add to it over time. Include data and factors that affect the operation of the business. Don't collect data without a defined policy on how it will be used. Cisco's Performance Management: Best Practices White Paper makes this point very clearly.
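
One way to enforce "no data without a policy" is to record each policy as data, analysis, and action, and then check the collection list against it. The `Policy` record below is a hypothetical structure for illustration; the metric names are IF-MIB object names used only as examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record: what to collect, how to analyze it, what to do."""
    name: str
    data: frozenset   # metrics this policy consumes
    analysis: str     # how raw data becomes actionable information
    action: str       # what operations does when the policy fires


def metrics_without_policy(collected, policies):
    """Return collected metrics that no policy uses, sorted by name."""
    used = set().union(*(p.data for p in policies))
    return sorted(set(collected) - used)


if __name__ == "__main__":
    policies = [
        Policy("link-upgrade-planning",
               frozenset({"ifHCInOctets", "ifHCOutOctets"}),
               "trend utilization toward a planning threshold",
               "investigate and plan a link upgrade"),
    ]
    collected = ["ifHCInOctets", "ifHCOutOctets", "ifInErrors"]
    print(metrics_without_policy(collected, policies))  # ['ifInErrors']
```

Anything this check reports should either get a policy or stop being collected.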

What Do We Need to Monitor and Manage?

Where do we start? It is best to start with simple policies and work up to increasingly complex policies. This way we use what we learn in the initial steps to our advantage in the more complex policies.

Since we can't monitor what we don't know exists, we must start with network discovery. The preferred policy is to use regular network discovery scans to identify all devices on the network. (An alternative policy is to monitor only the devices that the networking team tells us about. But this creates an opportunity for a device to be added and not monitored, so a failure of that device could affect the organization without warning.) If edge devices are included in discovery, you can also use the discovery data to locate devices, that is, to find where each one connects to the network. A recent customer found that several switch interfaces that connected to servers had high utilization. We were able to use the network inventory information to identify the servers and begin planning to increase the interface speeds to those servers.
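
The gap between the two policies shows up when discovery results are compared with the list of monitored devices. The sketch below assumes the discovery scan itself has already run and produced a list of responding IP addresses; it only reconciles that list against the monitored inventory, within the address ranges the scan covered.

```python
import ipaddress


def reconcile(discovered, monitored, scope):
    """Compare discovery results with the NMS's monitored-device list.

    discovered, monitored: iterables of IP address strings.
    scope: CIDR strings the discovery scan covered; addresses outside it are ignored.
    Returns (unmonitored, not_seen), each sorted by address.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in scope]

    def in_scope(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in networks)

    found = {a for a in discovered if in_scope(a)}
    known = {a for a in monitored if in_scope(a)}
    unmonitored = sorted(found - known, key=ipaddress.ip_address)  # on the network, not monitored
    not_seen = sorted(known - found, key=ipaddress.ip_address)     # monitored, but did not respond
    return unmonitored, not_seen


if __name__ == "__main__":
    print(reconcile(["10.1.1.1", "10.1.1.5"], ["10.1.1.1", "10.1.1.7"], ["10.1.1.0/24"]))
```

The first list is the risk described above: devices that were added but never brought under monitoring.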

The next policy is to detect End-of-Life (EoL) equipment and track maintenance agreements. This policy can be a money saver because it reports what is actually installed in the network, while most vendors track what was sold to the organization. It also allows the organization to make sure that only supported equipment is used in the critical network infrastructure. The data that must be collected to implement this policy includes installed chassis, network cards, and software/firmware versions. Vendors occasionally have to recall specific network cards, and this policy allows an organization to identify them easily. With the EoL information, planning can take place to upgrade hardware and software before risking the organization's operation on old systems.
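
An EoL check reduces to joining the installed inventory against a table of end-of-support dates. In the sketch below, the part numbers, software versions, and dates are hypothetical placeholders; real dates would come from the vendors' EoL notices.

```python
from datetime import date


def eol_report(inventory, end_of_support, today, horizon_days=365):
    """Flag installed items that are past, or approaching, end of support.

    inventory: list of (device, item) pairs, where item is a chassis or card
        part number or a software/firmware version string.
    end_of_support: dict item -> date (hypothetical data loaded from vendor notices).
    Returns (past, upcoming) lists of (device, item, end_date).
    """
    past, upcoming = [], []
    for device, item in inventory:
        end = end_of_support.get(item)
        if end is None:
            continue  # no EoL notice on file for this item
        days_left = (end - today).days
        if days_left < 0:
            past.append((device, item, end))
        elif days_left <= horizon_days:
            upcoming.append((device, item, end))
    return past, upcoming


if __name__ == "__main__":
    notices = {"LC-48P": date(2015, 1, 31), "OS-12.2": date(2016, 6, 30)}
    installed = [("sw1", "LC-48P"), ("sw1", "OS-12.2"), ("sw2", "LC-24P")]
    print(eol_report(installed, notices, today=date(2015, 10, 6)))
```

Because the inventory comes from the network itself, the same join also answers "where are the recalled cards installed?" by swapping in a recall list.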

Once we know what is on the network, we can implement policies that detect basic hardware failures. Fans and power supplies tend to fail more frequently than other components, so they are an easy choice. CPU, memory, and temperature should be included, with a policy to examine and react to any exceptions over the selected thresholds.
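
A basic hardware-health policy is a set of thresholds plus a status check on fans and power supplies. The threshold values and the reading format below are assumptions for illustration; the polling that produces the readings is not shown.

```python
# Hypothetical thresholds; tune them to your equipment and policy.
THRESHOLDS = {"cpu_pct": 80.0, "mem_pct": 85.0, "temp_c": 55.0}


def health_exceptions(readings, thresholds=THRESHOLDS):
    """Return (device, component, value) tuples that need attention.

    readings: dict device -> dict with numeric cpu_pct / mem_pct / temp_c and
    'fans' / 'psus' lists of status strings, where 'ok' means healthy.
    """
    out = []
    for device, r in sorted(readings.items()):
        for metric, limit in thresholds.items():
            value = r.get(metric)
            if value is not None and value > limit:
                out.append((device, metric, value))
        for kind in ("fans", "psus"):
            for idx, status in enumerate(r.get(kind, [])):
                if status != "ok":
                    out.append((device, f"{kind}[{idx}]", status))
    return out


if __name__ == "__main__":
    sample = {"r1": {"cpu_pct": 92.0, "mem_pct": 40.0, "temp_c": 38.0,
                     "fans": ["ok", "failed"], "psus": ["ok", "ok"]}}
    print(health_exceptions(sample))
```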

The policy with the most benefit is monitoring configuration changes, also known as Network Change and Configuration Management (NCCM). Configuration changes are the greatest source of network failures (40 to 80%, by most reports). An operational policy that tracks configuration changes and archives all updated configurations can provide a basis for reducing those failure figures. If a network outage occurs, what was the most recent set of configuration changes? Look for changes that are close by in terms of time and in terms of topology to reduce the time to repair.
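
"Close by in time and topology" can be expressed directly as a filter over the change log. This sketch assumes a change log of (timestamp, device, summary) entries and a simple neighbor map; the four-hour window is a hypothetical default, not a recommendation from the text.

```python
from datetime import datetime, timedelta


def suspect_changes(changes, outage_device, outage_time, neighbors,
                    window=timedelta(hours=4)):
    """List recent changes on the failed device or its direct neighbors.

    changes: list of (timestamp, device, summary) tuples from the NCCM archive.
    neighbors: dict device -> set of directly connected devices.
    window: how far before the outage to look (hypothetical default).
    Returns matching changes, most recent first.
    """
    nearby = {outage_device} | set(neighbors.get(outage_device, ()))
    hits = [c for c in changes
            if c[1] in nearby and outage_time - window <= c[0] <= outage_time]
    return sorted(hits, key=lambda c: c[0], reverse=True)


if __name__ == "__main__":
    outage = datetime(2015, 10, 6, 14, 0)
    log = [(datetime(2015, 10, 6, 13, 30), "dist1", "ACL update"),
           (datetime(2015, 10, 6, 12, 0), "access7", "VLAN added")]
    for change in suspect_changes(log, "dist1", outage, {"dist1": {"core1", "access7"}}):
        print(change)
```

Sorting most recent first puts the likeliest culprit at the top of the list during an outage.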

Creating a policy for interface statistics is pretty straightforward, but I've found that most organizations skip the policy definition phase and focus on interface performance. Performance is easy to understand, but it is more difficult to create a policy around. Instead, it is better to start with basics like up/down status. An up/down policy should require that any router interface or switch trunking interface that is configured in the admin-up state also be operationally up (i.e., up/up). It is then easy to implement a status check and report on any interfaces in the up/down state. An enhancement to this policy is to tag important interfaces (see Device and Interface Tagging) and alert on any important interfaces that are down. This has the advantage of extending the check to important access interfaces.
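
The up/down policy and its tagging enhancement fit in a few lines. The interface record format here is an assumption: in practice the admin and operational states would come from polling (for example, the IF-MIB ifAdminStatus and ifOperStatus objects), and the role and tag from your own inventory.

```python
def updown_violations(interfaces):
    """Apply the up/down policy to a list of interface records.

    Each record is a dict with keys: device, name, role ('router', 'trunk',
    or 'access'), admin and oper ('up' or 'down'), plus an optional boolean
    'important' tag.
    Returns (report, alerts): router/trunk interfaces in up/down state, and
    tagged-important interfaces of any role in up/down state.
    """
    report, alerts = [], []
    for i in interfaces:
        if i["admin"] != "up" or i["oper"] == "up":
            continue  # admin-down, or up/up: no policy violation
        key = (i["device"], i["name"])
        if i["role"] in ("router", "trunk"):
            report.append(key)
        if i.get("important"):
            alerts.append(key)
    return report, alerts
```

Ordinary access ports that are up/down (a user's PC is switched off) are ignored unless they carry the important tag, which keeps the report free of noise.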

The next interface statistics policy would be to determine how to handle interface errors and discards. This policy can state that any errors should be identified and fixed (i.e., the network should run with zero errors). An exception might be half-duplex interfaces, where a collision is counted as an error. Interface discards happen naturally at a low level, so set some thresholds for them, perhaps alerting on more than 500 per hour. The policy would require investigating any exceptions to the defined thresholds. A Top-N report sorts the list of interfaces so that the worst offenders are examined and corrected first. With these reports in place, an interface utilization policy becomes more of a planning tool than a network problem-reporting tool. A utilization policy could be defined to examine a Top-N utilization report on a weekly or monthly basis, with the intent of identifying interfaces that should be upgraded in the coming months.
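
The error and discard policy above, with its Top-N reports, might look like this. The sketch assumes hourly rates have already been computed from counter deltas; the 500-per-hour discard threshold is the example figure from the text.

```python
import heapq

DISCARDS_PER_HOUR_LIMIT = 500  # example threshold from the policy text


def error_discard_top_n(stats, top_n=10, discard_limit=DISCARDS_PER_HOUR_LIMIT):
    """Build Top-N exception lists for errors and for discards.

    stats: list of dicts with keys interface, errors_per_hour,
    discards_per_hour, and half_duplex (bool).
    Any error is an exception except on half-duplex links, where collisions
    are counted as errors; discards are exceptions above discard_limit.
    Returns (top_errors, top_discards) as lists of interface names, worst first.
    """
    error_candidates = [s for s in stats
                        if s["errors_per_hour"] > 0 and not s["half_duplex"]]
    discard_candidates = [s for s in stats
                          if s["discards_per_hour"] > discard_limit]
    top_errors = heapq.nlargest(top_n, error_candidates,
                                key=lambda s: s["errors_per_hour"])
    top_discards = heapq.nlargest(top_n, discard_candidates,
                                  key=lambda s: s["discards_per_hour"])
    return ([s["interface"] for s in top_errors],
            [s["interface"] for s in top_discards])
```

Keeping errors and discards in separate lists avoids having to invent a weighting between them; each list is simply worked from the top down.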

Once the above policies are defined, we can begin to focus on more complex network analysis. These policies would focus on subsystem operation, looking at things like router redundancy protocols (HSRP/VRRP), Spanning Tree Protocol changes, and broadcast storm detection. Policies around QoS functionality, such as making sure that QoS is applied consistently across the organization and is not dropping packets, also become important.

Finally, we begin to look at network virtualization policies. These policies make sure that the virtual topologies are configured and operating correctly. They will depend on what kind of virtual topology implementations are in use (MPLS, GRE tunnels, etc.). Make sure that each policy contains the actions that should be taken if and when the policy is violated.

What's Wrong With This Picture?

Using the above, we see that with a few tweaks, it should be possible for the NMS to automatically configure itself. The main factor is to make the NMS reflect the intent of the policies and to provide the information needed by someone who is implementing the policies. By automating the NMS setup, we can reduce the size of the gigantic NMS puzzle that exists today. But that's a topic for a future post.
