Terry Slattery | February 25, 2014

 
   

Monitoring a Software Defined Network, Part 4


SDN offers an opportunity to redesign network monitoring.

Note: My discussion of SDN monitoring covers several topics. Here are the prior posts:

1. Monitoring the SDN data plane
2. What parameters to monitor in an SDN control system
3. Should the SDN monitoring system be integrated with the controller?

Thoughts on SDN device monitoring
Chris Young contacted me this week to talk about this series on monitoring SDNs. Chris works for HP and is one of the few people who are involved in both networking and network management (most people gravitate to one of the two disciplines). We discussed monitoring and managing an SDN, and Chris mentioned several things that I had not yet addressed.

First, I want to make a distinction between monitoring and managing. I think of monitoring as a mostly passive function. Collecting, recording, and displaying interface statistics falls into this category. Management is an active function, which results in changes to the network. One might use data collected by a monitoring system to drive corrective actions, i.e., "management."

I've been using the word "monitoring" in this series of blog posts. That's because I've been focused primarily on the data to collect. Of course, collecting data and not using it is fairly pointless. I've always advocated using monitoring data to correct minor problems as well as major failures. But most organizations don't seem to make the time to address the minor problems. They tend to work in fire-fighting mode all the time. Perhaps the work with SDN will allow us to make progress in automating the correction of a lot of minor problems that hurt network performance and stability.

Monitoring Non-Flow Data
One of the topics that Chris and I discussed was monitoring device-centric data. His first reference was to power supplies and fans. I had been ignoring these items, but they are just as important as (maybe more important than) flow, interface, and SDN controller data. We can lump this type of data into a "Device Health" category. SDN doesn't (yet) address this type of data because it isn't related to either the control plane or the data plane. (Is it part of the "Management Plane," or something else?)

SDN can make a large collection of switches look like one big switch/router. That's part of the advantage of SDN--it hides implementation details. But when one of the components is experiencing health problems, we need to know about it.

How should an SDN controller report on problems with devices within its infrastructure? Or should there be a separate SDN monitoring system in place that performs this task? These are good questions that I've not seen addressed in anything I've read. If we're not careful, we'll create this new SDN thing and it will be great...until something breaks. If the monitoring system is separate and not well integrated, we'll have organizations running their networks blind, simply because they are unaccustomed to using a network management system effectively.

The counterpoint to integrating the monitoring functionality is that it adds extra burden to the SDN controller. For better scalability, it would be good to run the data collection and monitoring system in a separate compute module. Perhaps a good architecture is to have the SDN user interface be within the monitoring/management platform and have the SDN controller in a separate platform.

Finally, does the SDN controller southbound interface need to include API calls to retrieve non-flow data from the devices? The current OpenFlow definition doesn't include non-flow data. It seems that we need to think about abstractions that apply to device health as well as abstractions for the data plane.
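As a rough illustration of what a device-health abstraction might look like, here is a minimal Python sketch. Every name in it (ComponentStatus, DeviceHealth, fabric_health_problems) is hypothetical; OpenFlow defines nothing like this, and a real southbound interface would need far more detail. The point is only that a controller presenting many switches as "one big switch" could still report which physical device has a failing component.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical structures for illustration only; not part of OpenFlow
# or of any existing controller API.

@dataclass
class ComponentStatus:
    name: str   # e.g. "psu1", "fan2"
    kind: str   # e.g. "power_supply", "fan"
    ok: bool    # True if the component reports healthy


@dataclass
class DeviceHealth:
    device_id: str
    components: List[ComponentStatus] = field(default_factory=list)


def fabric_health_problems(devices: List[DeviceHealth]) -> Dict[str, List[str]]:
    """Map each device that has a failed component to the failed component names."""
    problems: Dict[str, List[str]] = {}
    for dev in devices:
        failed = [c.name for c in dev.components if not c.ok]
        if failed:
            problems[dev.device_id] = failed
    return problems


# Example fabric: two switches, one with a failed fan.
fabric = [
    DeviceHealth("leaf-1", [
        ComponentStatus("psu1", "power_supply", True),
        ComponentStatus("fan2", "fan", False),
    ]),
    DeviceHealth("spine-1", [
        ComponentStatus("psu1", "power_supply", True),
    ]),
]

print(fabric_health_problems(fabric))
```

Healthy devices are left out of the result, so an empty dictionary means the whole fabric is reporting healthy.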

Do We Use SNMP?
Chris then made a very interesting point: Do we use SNMP (Simple Network Management Protocol) to monitor device health data? That's a very good question. Of course, we could take that approach. It minimizes the amount of work to be done.

However, I see this as an opportunity to begin experimenting with something new and improved. Several vendors have been advocating XML-based data collection mechanisms, mostly for configuration tasks. But why not apply XML (or JSON, JavaScript Object Notation) to the monitoring problem as well? I think it would be nice to drop ASN.1 notation from network management. It is too arcane and doesn't hide any of the details. One might argue that low-level troubleshooting and management rely on the details. There is truth to that, but let's use something that's easier for us humans to handle.

There are some things that SNMP has done well. Being able to walk the MIB (Management Information Base) of a device is very handy, even without the original MIB definition file. I've sometimes been able to determine useful information about a device by walking its MIB using SNMP. With good names in the XML or JSON representation, it could become easier to work with a device for which a MIB definition file is not available.
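To make the "walk" idea concrete for JSON, here is a small Python sketch that visits every leaf of a JSON document and prints its full path, roughly the way an SNMP walk lists every variable under a subtree. The document layout and field names (device, health, power_supplies, fans) are invented for this example; no vendor format is implied.

```python
import json
from typing import Any, Iterator, Tuple


def walk(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every leaf in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, f"{path}/{index}")
    else:
        yield path, node


# Invented sample data; the names are assumptions, not a real device schema.
document = json.loads("""
{
  "device": {
    "name": "leaf-1",
    "health": {
      "power_supplies": [{"id": "psu1", "status": "ok"}],
      "fans": [{"id": "fan2", "status": "failed", "rpm": 0}]
    }
  }
}
""")

for leaf_path, leaf_value in walk(document):
    print(leaf_path, "=", leaf_value)
```

Because the names travel with the data, a path such as /device/health/fans/0/status is readable without a separate definition file, which is the advantage described above.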

Could we also take this opportunity to improve on some of the error reporting? For example, the TTL Exceeded variable lets us know that a packet was discarded due to its Time To Live expiring. Was this due to a routing loop or due to a traceroute? Which end systems were involved (i.e., what were their IP addresses)?

In addition to the counter, include variables (e.g., TTL Error Address?) that contain the IP address of the source and destination, or perhaps a bigger variable that's an octet string containing the packet header. When an error is detected, make a copy of these variables from the packet header. The network management system can then query the TTL Error Address variable to determine the end systems that were involved. By having this variable available, it would eliminate the need to perform a packet capture. Other error counters that report routing lookup failures or ARP failures could also benefit from this treatment.
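Here is a minimal Python sketch of that idea: a counter that also remembers the source and destination addresses of the most recent packet that triggered it. The class and field names (TtlErrorRecord, last_source, last_destination) are hypothetical, and the addresses come from the documentation ranges. A real device would do this in its forwarding path, not in Python.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TtlErrorRecord:
    """Hypothetical error variable: a counter plus the last offending addresses."""
    count: int = 0
    last_source: Optional[str] = None
    last_destination: Optional[str] = None

    def record(self, source: str, destination: str) -> None:
        """Called when a packet is discarded because its TTL expired."""
        self.count += 1
        self.last_source = source
        self.last_destination = destination

    def to_json(self) -> str:
        """Render the variable the way a management system might retrieve it."""
        return json.dumps({
            "ttl_exceeded": {
                "count": self.count,
                "last_source": self.last_source,
                "last_destination": self.last_destination,
            }
        })


ttl_errors = TtlErrorRecord()
ttl_errors.record("192.0.2.10", "198.51.100.7")
ttl_errors.record("192.0.2.44", "203.0.113.9")
print(ttl_errors.to_json())
```

This sketch keeps only the most recent pair of addresses. Keeping a short history, or the whole packet header as an octet string, would be a natural extension at the cost of more memory on the device.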

Would switching to XML or JSON cause an increase in the size of the OS image of a network device? Possibly. However, I think that the cost is worth it, if it results in better management.

Summary
Vendors working on SDN will need to think about the whole architecture and not just the separation of control and data planes. SDN holds a lot of promise. I certainly hope that we're not building a system in which monitoring and management will be a "bolt-on" afterthought.

Terry Slattery leads a half-day workshop on SDN at Enterprise Connect Orlando 2014. Check it out!




