According to a recent Uptime Institute survey, “How to avoid outages: Try harder!”, 70-75% of data center failures are caused by human error, setting off a chain reaction of downtime. In addition, more than 30% of IT service and data center operators have experienced downtime or severe degradation of service, and 10% of survey respondents reported that their most recent incident cost them more than $1 million. But how preventable are these outages?
To learn more about human network errors and intent-based networking (IBN), I interviewed Mansour Karam, founder and president of Apstra, an IBN software company.
GA: How much does human error contribute to failed network operations?
MK: We’ve all heard of catastrophic network outages that make the news and have devastating business consequences. It turns out that each one was caused by human error - a single network operator entering the wrong command on some network device.
That isn’t surprising - networks are complex distributed systems, and network protocols rely on each device being configured in a specific way. Any departure from these configurations causes the protocols to malfunction, which causes traffic to drop and the network to stop operating as intended, with devastating consequences for applications. Software is far smarter and faster than humans are at managing large distributed systems - i.e., the networks that organizations rely on to run their businesses.
GA: Can you define IBN? How does automation play into this?
MK: IBN amounts to software that delivers powerful automation of the network infrastructure. The basic premise is that the network operator describes their intent for the network - e.g., which workload or server connects to which other entity, what security rules must exist, what compliance policies need to be implemented, and what performance metrics need to be met.
The software is then responsible for delivering on this intent. It does so by translating intent into a set of precise configurations that it pushes to the various devices in the network. The software also continuously collects telemetry from the network and runs it through a battery of tests in real time (using intent-based analytics, which Apstra also pioneered) to validate that the network is satisfying the intent.
In an IBN system, all intent and state of the network collected through this telemetry reside in a distributed data store, which constitutes a single source of truth. If the IBN system detects, through these tests, that the network is failing to meet one aspect of intent, it then notifies the network operator and presents them with a workflow to remediate the problem. As operators get more comfortable with the software, they can allow the system to auto-remediate deviations from intent for a true self-driving network experience.
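The loop described above (intent translated into device configuration, telemetry checked against intent, deviations reported) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Apstra's implementation: intent is reduced to a list of links, telemetry to the set of neighbors each device reports, and the names Link, render_configs, and validate are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One intended connection between two devices (hypothetical model)."""
    a: str
    b: str


def render_configs(intent):
    """Translate intent into per-device config: the neighbors each device should have."""
    configs = {}
    for link in intent:
        configs.setdefault(link.a, set()).add(link.b)
        configs.setdefault(link.b, set()).add(link.a)
    return configs


def validate(intent, telemetry):
    """Compare observed neighbors (telemetry) with intent and list every deviation."""
    expected = render_configs(intent)
    anomalies = []
    for device in sorted(expected):
        observed = telemetry.get(device, set())
        for peer in sorted(expected[device] - observed):
            anomalies.append(f"{device}: missing link to {peer}")
        for peer in sorted(observed - expected[device]):
            anomalies.append(f"{device}: unexpected link to {peer}")
    return anomalies


# Intent: both leaves connect to the spine.
intent = [Link("leaf1", "spine1"), Link("leaf2", "spine1")]

# Telemetry: the leaf2 <-> spine1 link is down on both ends.
telemetry = {"leaf1": {"spine1"}, "leaf2": set(), "spine1": {"leaf1"}}

anomalies = validate(intent, telemetry)
for anomaly in anomalies:
    print(anomaly)
```

In a real system the anomaly list would feed the remediation workflow Karam describes; here it is simply printed.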
GA: How can downtime be reduced? Can IBN shorten the mean time to repair (MTTR)?
MK: An IBN system is continuously running tests. When something fails, it detects the error immediately and pinpoints the root cause of the failure for the user. Without an IBN system, the network operator is either running blind or has to sift through a sea of data collected in data lakes and presented over a large number of screens. When failures do occur, the result is a dashboard that is a sea of red, which does little to pinpoint the root cause of the issue.
An IBN system is self-documenting and records every change over its entire history. It lets the network engineer go back in time, to the network intent from two days ago, and fast-forward from there to how it was yesterday. This unique capability of an IBN system is called “Intent Time Voyager.” It enables the network engineer to roll back to a previous state across the entire network, including devices from disparate vendors.
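One way to picture this kind of history is an append-only log of intent snapshots that can be queried by time. The sketch below is a hypothetical illustration of that idea, not how Intent Time Voyager is actually built: the IntentHistory class, its methods, and the integer timestamps are all assumptions made for the example.

```python
import bisect
from copy import deepcopy


class IntentHistory:
    """Append-only record of intent snapshots, ordered by timestamp (hypothetical)."""

    def __init__(self):
        self._times = []
        self._snapshots = []

    def commit(self, timestamp, intent):
        """Record a new version of the intent; history is never rewritten."""
        if self._times and timestamp <= self._times[-1]:
            raise ValueError("timestamps must be strictly increasing")
        self._times.append(timestamp)
        self._snapshots.append(deepcopy(intent))

    def at(self, timestamp):
        """Return the intent that was in effect at the given time."""
        i = bisect.bisect_right(self._times, timestamp)
        if i == 0:
            raise LookupError("no intent recorded at or before this time")
        return deepcopy(self._snapshots[i - 1])

    def rollback(self, to_timestamp, now):
        """Restore an earlier intent by committing it again as the newest version."""
        restored = self.at(to_timestamp)
        self.commit(now, restored)
        return restored


history = IntentHistory()
history.commit(1, {"vlan10": ["leaf1", "leaf2"]})
history.commit(2, {"vlan10": ["leaf1", "leaf2"], "vlan20": ["leaf3"]})
history.commit(3, {"vlan10": ["leaf1"]})  # a faulty change

# Roll back to the intent from time 2; the faulty change stays in the record.
restored = history.rollback(to_timestamp=2, now=4)
print(restored)
```

Recording the rollback as a new version, rather than deleting the faulty one, keeps the audit trail intact, which matches the self-documenting behavior described above.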
Imagine that a network problem occurs. Most likely, the IBN system will detect the problem and auto-remediate it, which dramatically improves MTTR. In the rare case where the IBN system is unable to determine the root cause of the problem, the operator can roll back to a version of the network that doesn’t exhibit it. That may resolve the issue, and it gives the engineer time to debug without causing a business-impacting outage.
GA: Is there a return on investment (ROI)? Will this reduce total cost of ownership (TCO)?
MK: Networking teams spend three to five dollars of work time operating each dollar of networking equipment they’ve purchased. Organizations that have deployed an IBN system have seen operating expense (OPEX) reductions as high as 83%, with a drastic reduction in TCO. An IBN system also frees organizations from vendor lock-in and enables them to use the network equipment of their choice. This flexibility gives their procurement teams leverage to negotiate better hardware pricing, which directly affects capital expense (CAPEX).
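To put those figures in perspective, here is a rough back-of-the-envelope calculation. It assumes that TCO is simply CAPEX plus OPEX, that the best-case 83% reduction applies to all operating cost, and that the cost of the IBN software itself is ignored; none of these assumptions comes from the interview, and the function name is made up for the example.

```python
def tco_per_capex_dollar(opex_per_dollar, opex_reduction=0.0):
    """TCO per dollar of equipment, assuming TCO = CAPEX + OPEX (a simplification)."""
    opex = opex_per_dollar * (1.0 - opex_reduction)
    return round(1.0 + opex, 2)


results = {}
for opex in (3.0, 5.0):  # the "three to five dollars" range quoted above
    before = tco_per_capex_dollar(opex)
    after = tco_per_capex_dollar(opex, opex_reduction=0.83)  # best case quoted
    results[opex] = (before, after)
    print(f"OPEX ${opex:.2f} per $1 of equipment: TCO ${before:.2f} -> ${after:.2f}")
```

Under these assumptions, total cost per dollar of equipment falls from between $4 and $6 to between $1.51 and $1.85.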
GA: How does IBN impact the IT staff?
MK: IBN empowers the networking staff to do more with less. It gives them a solution for operating larger, more critical, and more complex network infrastructures without adding staff.
IBN also enables network architects and operators to work together. An IBN system automates the entire lifecycle of network operations, from day one onward, and is as relevant to architects as it is to operators. With IBN, both use the same approach. Network architects use IBN to build the network the first time. Because of IBN’s closed-loop validation and single source of truth, they can build it with operations in mind.
When the architects hand the network over to the operators, the operators make changes to that same network using the IBN system. They do so by describing their intent and letting the software deliver on it and continuously measure the network to ensure the intent is met. The software audits every change, ensuring that both operators and architects know exactly what outcomes the network delivers at any moment in time.
By delivering this unified toolset for network architects and operators, you avoid the common problem of having a greenfield network become a brownfield network overnight, and then legacy within a few months.
Karam believes that there are several advantages to implementing an IBN system. It encourages network architects and operators to work together, and it allows them to do more with less.