
Terry Slattery | June 23, 2013


Listen! The Network Is Talking To You

Your network can tell you a lot about what it is experiencing.

How do you know that the network is having problems? Do you rely on your customers to report problems? There's a better way: have the network tell you when it detects problems.

This is a reactive approach to keeping an eye on the network; when you receive an alert, the network has already detected a problem. However, using alerts is a great way to get started with monitoring a network. It is relatively easy to implement and it can alert you to problems that may go unnoticed. In a converged network that is providing transport to UC application flows, knowing that the network is having a problem is a big win.

There are several ways in which the network can provide alerts. The ideal is that components within the network report problems and all you need to do is to look at what the network is saying and correct the problems. It sounds easy, and while it is straightforward, the sheer volume of alerts can make it a daunting task.

What are Network Alerts?
Networks can provide alerts in more than one way. I always like to start with basic syslog alerting.

Many devices, both network and non-network, can generate syslog messages when they detect a problem. The messages are in text format and can be parsed by machine or read by a person. Free-form text can be awkward to parse by machine, but newer syslog servers make the job of handling messages much easier. I have had success with the open source version of syslog-ng, and a commercially supported version is also available for organizations that need one. It can process messages locally as well as forward them to other network management systems.

Cisco has made the machine parsing job easier by uniquely identifying each syslog message. Below are two sample syslog messages that demonstrate the formatting. The error identifiers are "%CDP-4-DUPLEX_MISMATCH:" and "%PM-SP-4-ERR_DISABLE:", respectively. All identifiers start with '%' and end with ':', making them easy to read as well as easy to parse by machine.

Note that the first error is a duplex mismatch with a Cisco phone, which creates more packet loss as the data rate on the phone increases. A computer attached to the local data port on the phone can create enough traffic to cause voice calls to have problems, in addition to impacting application performance on the computer.

05-01-2013 00:00:05 Local7.Warning 1410780: May 1 00:00:04.384 edt: %CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on GigabitEthernet3/2 (not half duplex), with SEP00070F67A24F Port 1 (half duplex).

05-03-2013 07:15:38 Local7.Warning 22119: May 3 07:15:36.805 edt: %PM-SP-4-ERR_DISABLE: diagnostics error detected on Gi3/39, putting Gi3/39 in err-disable state
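As a minimal sketch of how the '%' ... ':' convention helps machine parsing, the Python fragment below pulls the identifier out of the two sample messages with a regular expression. The pattern, the function name `extract_identifier`, and the returned field names are illustrative assumptions, not part of any Cisco or syslog-ng tooling.

```python
import re

# Assumed layout: '%' + facility (may contain '-') + '-' + severity digit
# + '-' + mnemonic + ':'
IDENT_RE = re.compile(r"%([A-Z0-9_]+(?:-[A-Z0-9_]+)*)-([0-7])-([A-Z0-9_]+):")

SAMPLE_DUPLEX = (
    "05-01-2013 00:00:05 Local7.Warning 1410780: May 1 00:00:04.384 edt: "
    "%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on GigabitEthernet3/2 "
    "(not half duplex), with SEP00070F67A24F Port 1 (half duplex)."
)
SAMPLE_ERR_DISABLE = (
    "05-03-2013 07:15:38 Local7.Warning 22119: May 3 07:15:36.805 edt: "
    "%PM-SP-4-ERR_DISABLE: diagnostics error detected on Gi3/39, "
    "putting Gi3/39 in err-disable state"
)


def extract_identifier(line):
    """Return the identifier parts found in a syslog line, or None."""
    match = IDENT_RE.search(line)
    if match is None:
        return None
    facility, severity, mnemonic = match.groups()
    return {
        "identifier": match.group(0).rstrip(":"),
        "facility": facility,
        "severity": int(severity),
        "mnemonic": mnemonic,
    }


duplex = extract_identifier(SAMPLE_DUPLEX)
err_disable = extract_identifier(SAMPLE_ERR_DISABLE)
```

Splitting the identifier into facility, severity, and mnemonic makes it straightforward to filter or group messages later without interpreting the free-form text that follows the colon.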

SNMP traps are another source of network alerts. These alerts are in a binary format, and decoding them requires a network management application that is loaded with the corresponding MIBs (Management Information Bases). The disadvantage is that, unlike syslog messages, they cannot simply be read as text.

Check your network management system for SNMP Trap processing functionality. Hopefully it will have a way to summarize the events so that you are working with groups of problems instead of individual problems (see the section below about handling large event volume).

A third source of alerts is the network management system itself. It can generate alerts when interfaces are experiencing too many errors or when interface utilization exceeds some threshold. These types of alerts are provided by most network management systems--everyone seems to want to start monitoring performance data. In practice, monitoring errors and drops is more beneficial than monitoring link utilization.
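The threshold idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CounterSample` fields, the `check_interface` function, and the default thresholds are hypothetical, and a real network management system would also have to deal with counter wrap and missed polls.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One poll of an interface's inbound counters (hypothetical field names)."""
    timestamp: float  # seconds since some fixed start
    in_octets: int
    in_packets: int
    in_errors: int


def check_interface(prev, curr, speed_bps,
                    max_error_ratio=0.001, max_utilization=0.8):
    """Return (kind, value) alerts for the interval between two polls.

    Thresholds are illustrative defaults; counter wrap is not handled.
    """
    alerts = []
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        return alerts

    packets = curr.in_packets - prev.in_packets
    errors = curr.in_errors - prev.in_errors
    octets = curr.in_octets - prev.in_octets

    # Errored packets as a fraction of everything seen in the interval.
    seen = packets + errors
    if seen > 0 and errors / seen > max_error_ratio:
        alerts.append(("errors", errors / seen))

    # Average inbound utilization as a fraction of link speed.
    utilization = (octets * 8) / (elapsed * speed_bps)
    if utilization > max_utilization:
        alerts.append(("utilization", utilization))
    return alerts


previous = CounterSample(timestamp=0, in_octets=0, in_packets=0, in_errors=0)
current = CounterSample(timestamp=300, in_octets=1_000_000,
                        in_packets=10_000, in_errors=50)
alerts = check_interface(previous, current, speed_bps=100_000_000)
```

In this made-up interval the link is nearly idle, yet the error check still fires, which is the kind of problem a utilization threshold alone would miss.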

Finally, the UC system servers are a good source of alerts. Many of them can generate syslog messages and/or SNMP traps. There's nothing better than the UC system to tell you that a particular end station is having problems.

Because these alerts typically arrive via syslog or SNMP trap, they can be handled with the same processing methods used for those mechanisms. Some UC systems may require that you log in and examine an error report page. This makes the information less timely, because you learn of a problem only when you log in to the UC system.

Handling Large Event Volume
A large network, or a network that needs cleanup, can generate a large number of events per day. If you look at the total number of events, you can easily get discouraged. How can you fix one problem at a time to reduce a list of 100,000 or more events?

Fortunately, some events are easily classified into groups, which greatly simplifies their handling. For example, the CDP duplex mismatch that is shown in the example above may be generated every minute for a single port, creating 1,440 syslog entries per day (24 hours x 60 minutes). If multiple ports on a switch have this problem, the syslog messages from that one switch may number in the tens or hundreds of thousands.

I prefer to use a syslog summary script to summarize hundreds of thousands of syslog messages into a few pages, sorted by frequency, so I know what to tackle first. It is easy to look through the list of message types to find those that are the most important, then identify the device that is experiencing the problem.
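To illustrate the idea, here is a minimal Python sketch of such a summary. It is not the author's script: the regular expression, the `summarize` and `format_summary` names, and the "(no identifier)" bucket are assumptions made for this example. It counts messages by their Cisco-style identifier and lists the most frequent first.

```python
import re
from collections import Counter

# Assumed identifier layout: %FACILITY-SEVERITY-MNEMONIC followed by ':'
IDENT_RE = re.compile(r"%[A-Z0-9_]+(?:-[A-Z0-9_]+)*-[0-7]-[A-Z0-9_]+(?=:)")


def summarize(lines):
    """Count syslog lines per identifier, most frequent first."""
    counts = Counter()
    for line in lines:
        match = IDENT_RE.search(line)
        counts[match.group(0) if match else "(no identifier)"] += 1
    return counts.most_common()


def format_summary(summary):
    """Render the summary as one 'count  identifier' line per message type."""
    return "\n".join(f"{count:>8}  {ident}" for ident, count in summary)


# The two sample messages from earlier in the article.
DUPLEX = (
    "05-01-2013 00:00:05 Local7.Warning 1410780: May 1 00:00:04.384 edt: "
    "%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on GigabitEthernet3/2 "
    "(not half duplex), with SEP00070F67A24F Port 1 (half duplex)."
)
ERR_DISABLE = (
    "05-03-2013 07:15:38 Local7.Warning 22119: May 3 07:15:36.805 edt: "
    "%PM-SP-4-ERR_DISABLE: diagnostics error detected on Gi3/39, "
    "putting Gi3/39 in err-disable state"
)

summary = summarize([DUPLEX, ERR_DISABLE, DUPLEX, DUPLEX])
report = format_summary(summary)
```

A fuller version would probably also group by the reporting device, so that the busiest message types can be traced to a specific switch; the sample lines here do not carry a hostname, so that step is left out.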

The CDP duplex mismatch problem identified above was due to several misconfigured ports on one switch. The summary showed that it reported a total of 129,130 duplex mismatch events in a single day (see below). Correcting the configuration across all ports on this switch resolved a number of problems that were impacting end station connections into the network.


The summary scripts are available for free at:

Since the summary output is only a few pages, even on a big, busy network, it is easily emailed to the network team or posted on a web page. The operations staff can then look for critical errors and for high volume errors. Over a few weeks, it is possible to make a significant improvement in the network.

Network Events and SDN
Since I've been writing about SDN recently, I thought it might be worthwhile to discuss how events might be used in an SDN environment. There is basically no change. The events will typically come from the underlay network--the physical infrastructure. There will also be a set of new alerts that are specific to SDN, but those are likely to be a small fraction of the volume generated by the physical network infrastructure.

Your network can tell you a lot about what it is experiencing. All that's required is to collect what it has to say and summarize it so that the volume does not overwhelm you. Finding and fixing the problems that it reports will result in a much more smoothly operating network.

