4 Tips to Reduce Network MTTR


Image: audy_indy - stock.adobe.com

We all wish we could reduce the mean time to repair (MTTR) of network outages. Here are four tips for doing so. Each optimizes part of the diagnostic process, and combined they yield substantial benefits.

The People, Process, and Technology Framework

IT operations revolves around a triad of people, process, and technology. Sharp people, working with known and well-rehearsed processes and using current technology, can produce remarkable results. We’ll see how the tips relate to the triad.

Image: Author

Tip 1: Employ Trusted Network Experts

Your network experts, whether they are employees or contractors, are the most important element. If you use contractors, it is best if they are working on your network on a regular basis so that they learn and understand your business and how the network supports the business.

In addition to learning how the network functions, they will learn its idiosyncrasies, which is where failures and slowdowns are most likely. It is this knowledge that enables your sharp people to make intuitive leaps regarding potential causes of problems.

Tip 2: Create Good Network Documentation

You’ll need good system documentation and network baseline data to validate what the network should look like and how it should function. If the necessary documentation doesn’t exist, take the time to create it. This is a critical process. Good documentation comprises:

Network diagrams that show the physical and logical connectivity that is so important to troubleshooting: Creating a single diagram that shows both can be challenging, so you may need multiple diagrams. You should be able to follow a network path between any two points and identify places to gather data or test hypotheses.
Written policies that describe the network’s design, operation, and future growth: Policies should describe things like the network segmentation paradigms, addressing plan, site interconnectivity mechanisms, network management goals, and routing/switching policies.
Documentation for network equipment refresh planning, upgrades to new technologies, and growth plans: Make sure to include diagnostic tools that are specific to any new technologies.
Run-books that describe typical problems and the mechanisms that worked in the past for diagnosing them: A well-written run-book for a single scenario should allow a more junior network engineer to diagnose and remediate common problems successfully.
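
To illustrate why diagrams should let you follow a path between any two points, here is a minimal Python sketch that treats a logical diagram as an adjacency map and lists the devices along one shortest path. The device names and the topology are hypothetical, and the result is a hop-count path through the documented topology, not necessarily the path that routing actually selects.

```python
from collections import deque

# Hypothetical logical topology taken from a network diagram:
# each device maps to its directly connected neighbors.
TOPOLOGY = {
    "branch-sw1": ["branch-rtr1"],
    "branch-rtr1": ["branch-sw1", "wan-core"],
    "wan-core": ["branch-rtr1", "dc-rtr1", "dc-rtr2"],
    "dc-rtr1": ["wan-core", "dc-sw1"],
    "dc-rtr2": ["wan-core", "dc-sw1"],
    "dc-sw1": ["dc-rtr1", "dc-rtr2"],
}


def find_path(topology, source, destination):
    """Return one shortest hop-count path as a list of devices, or None."""
    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        device = queue.popleft()
        if device == destination:
            # Walk back through the recorded predecessors to rebuild the path.
            path = []
            while device is not None:
                path.append(device)
                device = previous[device]
            return path[::-1]
        for neighbor in topology.get(device, []):
            if neighbor not in previous:
                previous[neighbor] = device
                queue.append(neighbor)
    return None


path = find_path(TOPOLOGY, "branch-sw1", "dc-sw1")
print(" -> ".join(path))
```

Each device on the returned path is a candidate place to gather data or test a hypothesis during troubleshooting.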

Tip 3: Develop Consistent Network Building Block Designs

Another process element is the use of consistent network building block designs, which yield significant gains in simplification, documentation, monitoring, and troubleshooting. You should tie the building block designs to equipment refresh cycles. Each cycle may, but doesn’t have to, result in a slightly different design and new equipment with new configurations. Occasionally, you’ll have a significant change that drives an entirely new design paradigm, such as the switch from MPLS to SD-WAN (and the just-starting change to secure access service edge, or SASE). This may be the opportunity to implement a more widespread change if the savings offset any residual value or cost of the old implementations. Note that you’ll need new design and troubleshooting documentation to go with changes in the building block designs you adopt.

Don’t fall for enticements to use shiny new products and features or to switch vendors. Rather, only implement changes with sound reasoning. Standardization means that you sometimes give up on some of these things to make the network easier to monitor, manage, and troubleshoot. The place for new technology is in the lab, during the process of creating new building block designs.
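
One practical payoff of standard building blocks is that deviations become easy to detect. The following Python sketch compares each device against the standard for its block. The setting names, values, and inventory are all hypothetical. In practice the data would come from whatever inventory or configuration source you maintain.

```python
# Hypothetical standard for a "small branch" building block.
BRANCH_BLOCK_STANDARD = {
    "os_version": "1.2.3",
    "uplink_count": 2,
    "mgmt_vlan": 99,
    "snmp_enabled": True,
}


def find_deviations(standard, device_settings):
    """Return {setting: (expected, actual)} for each setting that differs."""
    deviations = {}
    for setting, expected in standard.items():
        actual = device_settings.get(setting)  # None if the setting is missing
        if actual != expected:
            deviations[setting] = (expected, actual)
    return deviations


# Hypothetical inventory data for two devices built from the same block.
inventory = {
    "branch-rtr1": {"os_version": "1.2.3", "uplink_count": 2,
                    "mgmt_vlan": 99, "snmp_enabled": True},
    "branch-rtr2": {"os_version": "1.1.0", "uplink_count": 2,
                    "mgmt_vlan": 99, "snmp_enabled": False},
}

for name, settings in inventory.items():
    deviations = find_deviations(BRANCH_BLOCK_STANDARD, settings)
    status = "conforms" if not deviations else f"deviates: {deviations}"
    print(f"{name}: {status}")
```

A device that deviates from its block is a reasonable first place to look when that site misbehaves.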

Tip 4: Accelerate Diagnosis with Automation

Gone are the days of manually logging into network equipment and collecting troubleshooting information from the command line interface. Network automation (the technology component in the diagram above) is not just for deploying new configurations. Using automation to rapidly collect and correlate the same data that the manual process gathers accelerates diagnosis. Some people object to automation because of the risk it poses to the network, but collecting diagnostic data is a read-only operation, so automating it adds very little risk.
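
As a rough sketch of read-only diagnostic collection, the Python below runs a fixed set of commands against several devices concurrently. Everything here is an assumption for illustration: the command list, the device names, and the `run_command` callable, which stands in for whatever SSH or API client you use. The "show" prefix check is a simplistic guard, not a complete safety mechanism.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed read-only commands; the exact syntax depends on your vendor.
DIAGNOSTIC_COMMANDS = ("show version", "show interfaces", "show ip route")


def collect_from_device(device, run_command, commands=DIAGNOSTIC_COMMANDS):
    """Run each read-only command on one device and return {command: output}."""
    results = {}
    for command in commands:
        # Guard: refuse anything that is not a "show" command.
        if not command.startswith("show "):
            raise ValueError(f"refusing non-read-only command: {command!r}")
        try:
            results[command] = run_command(device, command)
        except Exception as error:
            # Record the failure and continue so one error does not hide the rest.
            results[command] = f"ERROR: {error}"
    return results


def collect_from_all(devices, run_command, max_workers=8):
    """Collect from all devices concurrently and return {device: results}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {d: pool.submit(collect_from_device, d, run_command)
                   for d in devices}
        return {device: future.result() for device, future in futures.items()}


def fake_run_command(device, command):
    """Stand-in transport for this sketch; replace with your own client."""
    if device == "dc-rtr2" and command == "show ip route":
        raise TimeoutError("no response")
    return f"{device} output of '{command}'"


report = collect_from_all(["dc-rtr1", "dc-rtr2"], fake_run_command)
for device, results in report.items():
    for command, output in results.items():
        print(f"[{device}] {command}: {output}")
```

Passing the transport in as a callable keeps the collection logic testable without touching production equipment.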

Coupling automation with trouble-ticketing systems and unified communications (UC) collaboration tools yields a powerful system that’s able to perform diagnostic data collection quickly and push the results into a chat space where the network team, regardless of their location, can view it and collaborate on troubleshooting. This method of operation has a name: ChatOps. This is a true paradigm shift in network troubleshooting that promises to reduce the MTTR.
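
The push into a chat space can be as simple as an HTTP POST to a webhook. The Python sketch below formats collected output as a message and shows how it might be posted. The ticket ID, sample data, and webhook URL are hypothetical, and the `{"text": ...}` payload shape is an assumption, so check your chat tool’s webhook documentation for the format it expects.

```python
import json
import urllib.request


def build_chat_message(ticket_id, report):
    """Format collected diagnostics as one plain-text chat message."""
    lines = [f"Diagnostics for ticket {ticket_id}"]
    for device, results in report.items():
        lines.append(f"{device}:")
        for command, output in results.items():
            lines.append(f"  {command}: {output}")
    return "\n".join(lines)


def post_to_chat(webhook_url, message, timeout=10):
    """POST the message as JSON to a chat webhook and return the HTTP status."""
    # Assumption: the webhook accepts a JSON body of the form {"text": "..."}.
    body = json.dumps({"text": message}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical sample data; real input would come from your collection step.
sample_report = {"dc-rtr1": {"show version": "up 12 days"}}
message = build_chat_message("TICKET-1", sample_report)
print(message)

# Uncomment and supply a real webhook URL to post the message:
# post_to_chat("https://chat.example.com/hooks/your-webhook", message)
```

Triggering this from the ticketing system when a ticket opens means the data is already in the chat space when the team starts looking.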

Summary

Excellence in operations begins with the basic framework: people, process, and technology. The integration of these elements and the depth of their use are what produce gains in troubleshooting. The first three tips have been around for as long as we’ve been doing networking and should be part of basic network operations. The last tip, the use of automation, saw only sporadic use until recently, when the scale of networks mandated the switch to automation. Combining automated troubleshooting data collection with network team collaboration tools has opened a whole new world for reducing the time to diagnose network problems.

