Terry Slattery
Terry Slattery is a senior network engineer with decades of experience in the internetworking industry. Prior to joining Chesapeake NetCraftsmen as...

Terry Slattery | March 25, 2013


Network Redundancy or Resilience?


Designing a resilient network is more than just adding redundancy. You must understand business needs, then incorporate the level of redundancy required to create a resilient network.

I've been doing a session about network resilience in John Bartlett's QoS workshop at Enterprise Connect for the past several years. The network is the transport for voice and video communications and has become critical infrastructure for business operations. John's session discusses QoS, where and when it is needed (everywhere and all the time), and the factors that come into play when considering the deployment of QoS in the network. One of the factors is network resilience and its interaction with network redundancy.

It is useful to start with the definitions of the terms 'redundancy' and 'resilience', so I looked them up.

* re·dun·dan·cy (rĭ-dŭn′dən-sē) n.
6. Electronics. Duplication or repetition of elements in electronic equipment to provide alternative functional channels in case of failure.

* re·sil·ience (rĭ-zĭl′yəns) n.
1. The ability to recover quickly from illness, change, or misfortune; buoyancy.

A redundant system includes multiple channels to provide alternate paths for communications in case of individual failures. Providing one redundant channel can allow the network to continue to function when a single failure occurs, provided that there are no devices or links in common that form a single point of failure.

Resilience, on the other hand, refers to a system's ability to adapt to failures and to resume normal operations when the failure has been resolved.

Designers typically build a resilient network by incorporating redundant components and links. But a network that contains redundant components is not necessarily resilient from the standpoint of business operations.

Maximum Use of Parallel Paths
I've occasionally encountered business managers who want to optimize the Return On Investment (ROI) of all parts of the network. When the network is designed with multiple paths, these managers want to make maximum use of all paths. They tend not to consider how the network will operate when a failure occurs, which puts the business at risk.

Let's say that we have two sites that are connected by two parallel paths of equal bandwidth and latency (i.e., equal cost), as shown in Figure 1 below. For purposes of discussion, let's say that each of the two links has a capacity of 100Mbps.

Figure 1. Two sites, parallel paths, equal cost

The business manager would like both paths to carry significant traffic, so that the business gets more value from both links. Let's say that each path is running at 60Mbps on average, with peaks to 95Mbps (a practical maximum). When a failure occurs, the one remaining path would need to carry an average of 120Mbps with peaks of 190Mbps in order to operate without impacting the business applications. But the remaining link is limited to about 95Mbps peak throughput. In the failure scenario, about 25Mbps of the average load can't be carried, and peak demand exceeds the link's capacity by as much as 95Mbps.
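The arithmetic above can be checked with a short sketch. The function name is illustrative, and treating 95Mbps as the usable ceiling of a 100Mbps link is an assumption taken from the figures in this example; it is not a general sizing tool.

```python
def failover_shortfall(per_link_load_mbps, usable_capacity_mbps, link_count=2):
    """Return (offered_load, shortfall) for the one surviving link.

    Assumes every link carries the same load and that all of it lands
    on a single surviving link, as in the two-link example.
    """
    offered = per_link_load_mbps * link_count
    shortfall = max(0, offered - usable_capacity_mbps)
    return offered, shortfall


USABLE_MBPS = 95  # practical maximum of a 100Mbps link, per the example

avg_offered, avg_short = failover_shortfall(60, USABLE_MBPS)
peak_offered, peak_short = failover_shortfall(95, USABLE_MBPS)

print(f"average: {avg_offered}Mbps offered, {avg_short}Mbps over capacity")
print(f"peak:    {peak_offered}Mbps offered, {peak_short}Mbps over capacity")
```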

The single link will be significantly oversubscribed, causing TCP-based applications to run much slower than expected because of the extra packet loss (see TCP Performance and the Mathis Equation). If some of the traffic between the sites is Real-time Transport Protocol (RTP) packets carried over UDP for voice or video, TCP will experience even more congestion, because those streams typically do not slow down in response to loss. Do you design the voice/video QoS to handle the normal bandwidth case or the case when a failure occurs? The answer depends on how important each QoS traffic type is to the business.
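The effect of packet loss on TCP can be illustrated with the Mathis approximation, which bounds the steady-state throughput of a single TCP flow at roughly (MSS / RTT) × (C / √p), where p is the packet loss probability and C is a constant of about 1.22. The sketch below is only a rough guide: the MSS, round-trip time, and loss rates are assumed values chosen for illustration, not measurements from the network described here.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_seconds, loss_rate):
    """Approximate upper bound on one TCP flow's throughput, in bits/s."""
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    c = math.sqrt(3 / 2)  # constant from the Mathis model, about 1.22
    return (mss_bytes * 8 / rtt_seconds) * (c / math.sqrt(loss_rate))


# Assumed values: 1460-byte MSS and a 50 ms round-trip time
for loss in (0.0001, 0.001, 0.01):
    mbps = mathis_throughput_bps(1460, 0.050, loss) / 1e6
    print(f"loss {loss:.2%}: about {mbps:.1f} Mbps per flow")
```

Because the bound scales with 1/√p, a hundredfold increase in loss cuts the achievable per-flow throughput by a factor of ten.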

In some cases, the business can operate with lower application performance for a short period of time, provided that the failure is quickly repaired. In other cases, the business must monitor the links and make sure that each link is running at less than 47Mbps (95Mbps total) so that application performance is not degraded when a failure occurs. (The total is less than 100Mbps to allow for overhead on the link, such as routing protocol packets, CDP/LLDP packets, inter-frame gaps, etc.) Financial firms typically fall into this category. Retail companies that rely on business applications for revenue generation should also fall into this category.
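The monitoring threshold follows from the same numbers. Here is a minimal sketch, assuming that the traffic from a failed link is spread evenly across the surviving links and that 95Mbps is the usable ceiling of each link (both are assumptions, and the function name is illustrative):

```python
def max_safe_load_mbps(usable_capacity_mbps, link_count, failures_tolerated=1):
    """Highest per-link load that still fits after the given failures."""
    survivors = link_count - failures_tolerated
    if survivors < 1:
        raise ValueError("at least one link must survive")
    # Total load (per-link load * link_count) must fit on the survivors.
    return usable_capacity_mbps * survivors / link_count


threshold = max_safe_load_mbps(95, 2)
print(f"alert when a link exceeds {threshold}Mbps")
```

With two links the result is 47.5Mbps, which matches the "less than 47Mbps" rule of thumb above.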

But the non-technical manager sometimes doesn't understand the need for the reserve capacity and puts the company at risk. The network is redundant, but not resilient, because the level of operation is not sufficient for acceptable business operation when a failure occurs.

Too Much Redundancy
Some organizations go overboard with redundancy. Figure 2 below shows a hub-and-spoke network design with what I'll call "wheel" connections between the spoke sites. The main business data center was located at the hub.

Figure 2. Hub, spoke, and wheel with insufficient bandwidth

Each spoke link was designed to handle the traffic from that remote site. When a failure occurred on one spoke link, the traffic from that site was sent to a neighboring site to be forwarded to the hub site. During busy parts of the day, a link could become overloaded and traffic would be re-routed to a neighbor, which would subsequently become overloaded. The network traffic oscillated as the load varied at each spoke site.

This was not a very resilient network. The organization had asked for a network assessment because of application performance problems. The network management systems had not detected this problem, but once it was discovered, the staff could see the patterns in the historical performance data.

There is another problem with this design: the routing protocol has to process many alternate paths to determine the lowest cost path. More router memory and CPU are consumed by the routing protocol on all routers, unless special configurations are implemented to limit the number of redundant paths that are considered during the routing convergence calculation.

Finally, there are no equal-cost parallel paths. When a failure occurs, a routing recalculation must occur to find an alternate path. This increases the routing protocol convergence time, which impacts the time to route around failures as well as the time to recover from failures, resulting in poor resilience. You would run the risk that a network failure could cause an outage long enough that it results in dropped voice or video sessions.

It is better to design a network with two links to each remote site, and then monitor these links for problems. Supporting two links to each spoke site is more expensive than one link to each such site, so that must be taken into account. An alternative solution would be to eliminate every other "wheel" link (i.e., halving the number of such links in the overall network), and then size the spoke links to handle the load from a pair of sites. There still isn't an equal-cost multipath from a spoke to the hub and back, but it does reduce the number of paths that the routing protocol must handle.
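The reduction in alternate paths can be illustrated by counting the loop-free paths from one spoke site to the hub in each topology. This is a sketch under stated assumptions: the site count of six is made up for illustration, and real routing protocols do not enumerate every loop-free path, so the count is only a proxy for how much path diversity the protocol has to consider.

```python
def count_simple_paths(graph, start, goal, visited=None):
    """Count loop-free paths from start to goal using depth-first search."""
    if start == goal:
        return 1
    visited = (visited or set()) | {start}
    return sum(
        count_simple_paths(graph, nxt, goal, visited)
        for nxt in graph[start]
        if nxt not in visited
    )


def build_topology(spoke_count, wheel_links):
    """Hub-and-spoke graph plus the given spoke-to-spoke "wheel" links."""
    graph = {"hub": set()}
    for site in range(spoke_count):
        graph[site] = {"hub"}
        graph["hub"].add(site)
    for a, b in wheel_links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


SPOKES = 6  # assumed number of spoke sites, for illustration only
full_wheel = [(i, (i + 1) % SPOKES) for i in range(SPOKES)]
half_wheel = [(i, i + 1) for i in range(0, SPOKES, 2)]  # every other link kept

full = count_simple_paths(build_topology(SPOKES, full_wheel), 0, "hub")
half = count_simple_paths(build_topology(SPOKES, half_wheel), 0, "hub")
print(f"full wheel: {full} loop-free paths from site 0 to the hub")
print(f"half wheel: {half} loop-free paths from site 0 to the hub")
```

In the full wheel the count grows with the number of sites, while in the paired design each site has only its own spoke link and the path through its partner.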

Designing a resilient network is more than just adding redundancy. It is critical to understand the business needs, and then incorporate the level of redundancy that is required to create a resilient network. This is a case where too much of a good thing is bad.


