Terry Slattery
Terry Slattery is a senior network engineer with decades of experience in the internetworking industry. Prior to joining Chesapeake NetCraftsmen as...

Terry Slattery | January 06, 2014


Monitoring a Software Defined Network, Part 2

Monitoring of an SDN controller needs to be useful to both network operators and software developers.

Monitoring the SDN Control System
The controller contains the smarts of a software-defined network and is next on our list of things to monitor. (See the prior post about monitoring SDN switches.) That post focused primarily on things that affect the network and are therefore of great interest to network operators. The controller is different because of its significant software component. We need to make the monitoring of an SDN useful to both software developers and network operations staff, which will make the specification and development of a monitoring system rather interesting.

Monitoring for Software Developers
Software developers tend to be more interested in failures logged within the software system. A good example is a controller that consumes all of main memory because a table grew too large. Perhaps there was a memory leak in one of the API calls or maybe the developer's code didn't release a data structure after using it. Each call or use would cause a little more memory to be used, and eventually, with enough use, the system would run out of available memory and ultimately crash. Before that happens, though, the system would likely become sluggish as it tries to handle the large table or data structure. Providing tools that allow software developers to clearly see the full sizes of tables and collections of data structures will be important to aid in troubleshooting these types of problems.
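As a rough illustration of that kind of visibility, here is a minimal sketch that samples the sizes of named tables and flags the ones that keep growing. The class name, method names, and the three-sample rule are all assumptions made for this example; they do not come from any real controller.

```python
from typing import Dict, List, Sized


class TableSizeMonitor:
    """Tracks the sizes of registered tables and flags ones that keep growing.

    Hypothetical helper for illustration; not part of any real SDN controller.
    """

    def __init__(self) -> None:
        self._tables: Dict[str, Sized] = {}
        self._history: Dict[str, List[int]] = {}

    def register(self, name: str, table: Sized) -> None:
        """Start tracking a table (anything that supports len())."""
        self._tables[name] = table
        self._history[name] = []

    def sample(self) -> Dict[str, int]:
        """Record and return the current size of every registered table."""
        sizes = {name: len(table) for name, table in self._tables.items()}
        for name, size in sizes.items():
            self._history[name].append(size)
        return sizes

    def growing(self, samples: int = 3) -> List[str]:
        """Names of tables whose size strictly increased across the last `samples` samples."""
        suspects = []
        for name, history in self._history.items():
            recent = history[-samples:]
            if len(recent) == samples and all(a < b for a, b in zip(recent, recent[1:])):
                suspects.append(name)
        return suspects
```

Calling sample() on a timer and reporting growing() alongside the raw sizes would give a developer an early hint of a leak, well before the system becomes sluggish. Steady growth is only a hint, of course: a table can also grow for legitimate reasons.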

There will undoubtedly be some time-critical processes within an SDN controller. An ARP request or a new flow request could require special processing in order to make the network run fast, or at least to keep it from running too slowly. In this sense, an SDN controller is more like an embedded control system. There will need to be methods for monitoring and reporting on the execution times of time-critical processes. It would also be useful if the monitoring of these processes could generate an asynchronous message (maybe syslog) when the processing time exceeds some boundary or when a process doesn't start at its specified time.
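The execution-time check might look something like the following sketch. It uses Python's standard logging module; the logger name "sdn.timing", the function name, and the limits are assumptions made for this example. A logging.handlers.SysLogHandler could be attached to the logger to turn the warning into a syslog message.

```python
import logging
import time
from contextlib import contextmanager

# Hypothetical logger name; attach a SysLogHandler to forward warnings to syslog.
log = logging.getLogger("sdn.timing")


@contextmanager
def timed(name, limit_seconds, clock=time.perf_counter):
    """Log a warning if the wrapped block runs longer than limit_seconds.

    `clock` is injectable so the check can be exercised without real delays.
    """
    start = clock()
    try:
        yield
    finally:
        elapsed = clock() - start
        if elapsed > limit_seconds:
            log.warning("%s took %.3f s (limit %.3f s)", name, elapsed, limit_seconds)
```

A time-critical handler would then be wrapped as `with timed("arp_reply", 0.010): ...`. This covers only the "took too long" case; detecting a process that never started at its scheduled time would need a separate watchdog.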

Speaking of syslog, I recommend a useful addition to messages. The first part of the text message should contain a unique ID string. Cisco uses this technique to allow the identification of the source of each message.

Logged messages will often be about common network problems, such as a CDP duplex mismatch, but other messages will report significant internal problems. The message ID is divided into three parts that identify the subsystem, the severity, and a descriptive word (for example, CDP-4-DUPLEX_MISMATCH). The ID makes it easy to sort and group messages by severity and error type. Some examples appear in a syslog summary script that shows the message IDs, the number of occurrences, and the systems that sent them. Functionality like this is very useful to both developers and network operators.
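A summary along those lines can be sketched in a few lines of Python. This is not the script mentioned above, just a minimal illustration that assumes the simple three-part %SUBSYSTEM-SEVERITY-WORD form and input that has already been split into (host, message) pairs. IDs with additional hyphenated parts would need a looser pattern.

```python
import re
from collections import Counter, defaultdict

# Matches the simple three-part form, e.g. %CDP-4-DUPLEX_MISMATCH
MSG_ID = re.compile(r"%([A-Z0-9_]+)-([0-7])-([A-Z0-9_]+)")


def summarize(lines):
    """Summarize (host, message) pairs as {message_id: (count, sorted hosts)}.

    Messages without a recognizable ID are skipped.
    """
    counts = Counter()
    hosts = defaultdict(set)
    for host, message in lines:
        match = MSG_ID.search(message)
        if match is None:
            continue
        msg_id = match.group(0)[1:]  # drop the leading '%'
        counts[msg_id] += 1
        hosts[msg_id].add(host)
    return {msg_id: (counts[msg_id], sorted(hosts[msg_id])) for msg_id in counts}
```

Because the severity is the middle field of the ID, sorting the resulting keys by that field puts the most serious messages first.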

Software developers who are working on the internals of an SDN controller will have many more diagnostic tools. Those who are developing applications that communicate with an SDN controller via the so-called Northbound interface will also have debugging tools, but those tools will probably not provide visibility into the SDN internals. Both of these development communities will have a lot of good development and diagnostic tools. After all, if they identify a tool that could be useful, they will create it.

Monitoring for Network Operators
Network operators require different types of tools: those that are focused more on network operations and the ability to diagnose or troubleshoot common problems. Network operators typically don't have the software skills or resources to create their own monitoring tools, so the SDN will need to provide either an API for an external monitoring system or direct access to the necessary data.

SDN doesn't mean that we should forget all the lessons that we have learned in the past about network management. Nor does it mean that good network design practices are somehow incorrect. We will need access to data for physical and logical connectivity. We will need to understand the symptoms indicating that an SDN domain needs to be subdivided. And we will need to know when the SDN domain is having internal and external communication problems. Finally, there will need to be a suite of diagnostic tools that show us flow paths and how those paths were created (i.e., which controller(s) created the entries that resulted in the path).

Communication failures between switches and controllers will be fairly common, so that's at the top of the list to monitor and report. If possible, the system should identify the potential cause of the failure. Switches and controllers both need to report problems. Network topologies can fail in many different ways, some of which may allow one component to communicate with a monitoring system while the other component is isolated. Or latency may increase due to a topology change, causing control system timeouts. Finally, as new revisions of the SDN communication and control protocols are rolled out, we should expect to occasionally find a protocol failure (e.g., an OpenFlow protocol incompatibility), which needs to be reported to the network operations staff.

We should learn something from the history of failures in redundant systems. Remember the stories about the failure of one component of a redundant system? The first failure was frequently not detected and corrected before a second failure in the second path caused a network outage. A communication failure notification from one or more switches should alert the network operations staff to the communication problem so that they can take remedial action before a second failure causes part of the network to become segmented, creating an outage.
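One way to catch that first failure is to classify each switch by how many of its controller connections are still up, and to alert on anything short of full redundancy. The sketch below assumes a hypothetical data shape (switch name mapped to per-controller channel status) and made-up status labels.

```python
def redundancy_status(links):
    """Classify one switch from {controller_name: channel_is_up}.

    Returns "OK", "DEGRADED" (some but not all channels up), or "ISOLATED".
    """
    up = sum(1 for is_up in links.values() if is_up)
    if up == 0:
        return "ISOLATED"
    if up < len(links):
        return "DEGRADED"
    return "OK"


def alerts(switches):
    """Return sorted (switch, status) pairs for every switch that is not fully redundant."""
    problems = []
    for name, links in switches.items():
        status = redundancy_status(links)
        if status != "OK":
            problems.append((name, status))
    return sorted(problems)
```

The point of the "DEGRADED" state is that it gets reported while the network is still working, which is exactly when operations staff have time to fix it.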

Not all network problems create hard failures. The monitoring system will need to report packet loss statistics and retransmissions. These statistics will provide an indication of the reliability of network connections between elements.
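As a simple illustration, a loss rate can be derived from two successive samples of sent and received counters on a link. The counter names and the assumption that both ends of the link can be polled are mine, and the sketch ignores counter wraparound.

```python
from typing import NamedTuple


class Counters(NamedTuple):
    """Cumulative packet counters for one link (hypothetical shape)."""
    sent: int      # packets transmitted by the near end
    received: int  # packets received by the far end


def loss_rate(prev: Counters, curr: Counters) -> float:
    """Fraction of packets lost between two samples, from 0.0 to 1.0.

    Returns 0.0 when nothing was sent in the interval. Counter wraparound
    is not handled in this sketch.
    """
    sent = curr.sent - prev.sent
    received = curr.received - prev.received
    if sent <= 0:
        return 0.0
    return max(0.0, (sent - received) / sent)
```

Trending this value over time, rather than alerting on a single sample, gives a better picture of how reliable a connection really is.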

I don't expect SDN to eliminate the major source of network outages: configuration errors. So there will need to be mechanisms in place to monitor and troubleshoot all of the configuration parameters that the SDN needs in order to function. For example, encryption keys will need to match, and there must be mechanisms that allow easy migration from one set of keys to another without requiring a flag-day conversion.

We will also need to track the configuration that creates the switch-controller relationships. And don't forget that each SDN domain will need to communicate with the rest of the network or with external networks. There will certainly be configurations for routing protocols so that the SDN domain can forward packets to the correct next hop for external destinations.

Network operators will also need access to data from protocols that run within SDN switches, such as LLDP (Link Layer Discovery Protocol) and BFD (Bidirectional Forwarding Detection). Note: Some of these protocols will need to run within the switches instead of in the controller, simply because that's a much more logical place for them to be located. The data that these protocols collect will be sent to the controller for use in determining forwarding tables and for diagnostic purposes.

Network diagrams that provide visibility into the SDN control system topology (both logical and physical) will be extremely useful, especially in the early days when we're all learning how these new networks operate (when they are functioning correctly) and fail (when they are not).

The combination of SDN switch monitoring and SDN control system monitoring can give us better visibility into the operation of the SDN system. I just hope that the hooks needed to provide that visibility are created.

