How to Handle Network Performance Data

2E99NF2.jpg

Image: Andrey Suslov - Alamy Stock Vector

Monitoring the right things in your network and making them visible in dashboards can make the difference between proactive and reactive responses to network problems. I recently wrote about using log data in network management, and now it’s time to talk about how to handle network performance data.

What to Monitor

Network monitoring systems must collect the data necessary to detect problems and diagnose failures. Pay attention to the following data:

Interface state changes (up/down),
Errors
Dropped packets
Capacity utilization
Device memory
CPU utilization

In addition to the above, collect data about routing and switching protocols and neighbors. This is beneficial for diagnosing packet forwarding problems. It is best to collect this data on a one-minute or five-minute basis. This helps catch problems before they scale up. Less frequent collection can hide significant events. As I wrote last year, “We frequently have difficulty conceptualizing how problems scale up as our IT systems grow. Events that are supposed to be extremely rare are occurring more frequently than our minds would otherwise indicate.”

A digital experience monitoring platform’s automated test results is another valuable data source. Correlating and reporting of this data drives proactive detection of network problems.

Collecting the Data

There has been a push recently to transition from SNMP’s “pull” model of data collection to telemetry’s “push” model. But they aren’t the only sources of data. There are also the command line interface, Windows management interface, and APIs, to name a few. The reality is that the data collection method doesn’t make a significant difference. The objective is to get the data and store it in a format that makes it useful for analysis.

Data Storage

Network monitoring relies on multiple types of data. Some data is highly relational and is best stored in a relational database. Examples of this relational data include: device types, operating system version, hardware inventory, location, and technical contact. However, performance data is tied to the time it was collected and is best stored in a time-series database.

Matching the data type with its corresponding database type provides the best performing network monitoring system (NMS) platform. Systems that attempt to store performance data in a relational database typically need a large platform architecture with multiple collectors and these systems have a correspondingly high cost of implementation and management. One of the reasons that relational databases don’t work well for performance data is that the database is optimized for reading by maintaining indexes. Updating the table indexes when writing the performance data into the relational database makes it significantly slower than writing the same data to a time-series database.

Performance Monitoring Dashboards

The network management system needs to turn the collected data into actionable information. This is where many systems are deficient. There might be flashy graphics with pie charts and bar charts that show how well parts of the network are performing. But what’s needed are displays of the network elements that need attention, like interfaces with high errors/drops, high utilization, or nearly full stateful firewall tables.

How do you identify the most problematic statistics? One of the most useful methods is through a “Top-10” report that shows the top ten elements, sorted by either percentage or count. Note that packet loss percentage thresholds should be on the order of 0.0001% of total packets to identify interfaces that are impacting TCP performance. Utilization figures should be based on the 95th percentile calculation as noted in the article on the hazards of average network statistics.

Displaying all those statistics in a concise dashboard is quite a challenge and that’s where most network management platforms fail. Some displays need both the current value and a historical graph so that you can determine the trend. Here’s a concise mock-up of an information-packed dashboard. It takes advantage of Top-5 lists and shows separating network and security events into different categories. A more complete dashboard would include additional statistics.

Additional network management data sources, such as alerts from digital experience monitoring can be incorporated by using APIs to collect the data.

Scaling Issues

Networks with more than a few thousand interfaces will start to encounter scaling issues. The main factor is the number of interfaces that must be monitored. It is typical for most mid-scale network equipment to average about fifty network interfaces. Networks that use a significant number of virtual interfaces may have a higher average. Multiply the average interface figure by the number of devices to arrive at an approximate number of managed elements. That’s the figure that most vendors will need to size a network management system (and its price).

Collecting the data, storing, analyzing, and displaying the results all have an impact on the size of the management system. Many network management systems address the scale by rolling up the data on a periodic basis. The typical process averages the raw daily performance data into hourly data every day. The resulting loss of high-fidelity data makes it impossible to perform detailed historic analysis.

The user interface presents another scaling issue. How does the system display the data collected from hundreds or thousands of devices and interfaces? That’s where the user interface must provide easy-to-use display filtering and concise reports that highlight actionable information.

Summary

Network monitoring is a typical big data business application. Large volumes of data contain clues to network performance problems. Applying the right techniques can help you identify the problems so that you can improve the network’s operation.

Tags:

network management

network administration

News & Views

Monitoring, Management and Security

Enterprise Networking

Monitoring & Management

Articles You Might Like

How Verizon Enables Scalable And Seamless Multi-Vendor SASE

Zeus Kerravala

May 10, 2023

Secure access service edge (SASE) deployments have seen strong momentum thanks to increased complexity in managing networks and dealing with security threats.

Protecting Your Business – When the Rubber Hits the Cloud

Scott Murphy

January 23, 2023

Securing technology assets needs to be a priority for every business -- the challenge is determining how. Here are the three primary classes of solutions you should look at.

Improving Network Security Through Segmentation

Terry Slattery

August 30, 2022

While segmentation may seem to increase network complexity due to the additional filtering points, a good implementation will improve and simplify security.

Cisco Combats Network Problems Impacting Hybrid Workers

Dana Casielles

June 08, 2022

Automated Session Testing pinpoints IT issues, while Agent View provides a side-by-side view of all metrics for an individual user. Learn how the technology works in this Q&A.

Search form