Network engineers who learned the command line interface (CLI) for configuring network devices often prefer to make changes using manual processes, citing the speed with which changes can be applied. Is this really the case?
The Manual Change Process
The manual change process isn’t just about typing commands directly into network devices. The smart engineer creates the changes in a text document and uses cut and paste to apply configuration updates. The text document is needed anyway for the change control board (CCB) to review the proposed change. Making network changes without going through the CCB is not a valid process for the purposes of this comparison with automation. A vast majority of companies use the CCB to try to reduce the number of network outages due to configuration changes, which are the most frequent source of outages (up to 80%, according to some analysts).
Most of the manual change processes I’ve seen have omitted the pre-change validation that the network was functioning correctly prior to implementing the change. And frequently the post-change validation is simply checking the output of the show configuration or show run. Instead, pre-change and post-change validation should be checking subsystems that are expected to be impacted by the change, such as routing protocols, connectivity, and neighbor relationships.
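As a rough illustration of validating subsystems rather than just reading the configuration back, the sketch below compares state captured before and after a change. The snapshot layout (a subsystem name mapped to item states) and the function name compare_snapshots are assumptions made for this example; in practice the data would come from parsed show command output or a device API.

```python
def compare_snapshots(pre, post):
    """Return human-readable differences between two state snapshots.

    Each snapshot maps a subsystem name (for example "bgp_neighbors") to a
    dict of item -> state. This layout is an assumption for the sketch.
    """
    problems = []
    for subsystem in sorted(set(pre) | set(post)):
        before = pre.get(subsystem, {})
        after = post.get(subsystem, {})
        for item in sorted(set(before) | set(after)):
            old = before.get(item, "absent")
            new = after.get(item, "absent")
            if old != new:
                problems.append(f"{subsystem}: {item} changed from {old} to {new}")
    return problems


# Hypothetical state captured before and after a change.
pre_change = {
    "bgp_neighbors": {"192.0.2.1": "Established", "192.0.2.2": "Established"},
    "interfaces": {"Ethernet1": "up", "Ethernet2": "up"},
}
post_change = {
    "bgp_neighbors": {"192.0.2.1": "Established", "192.0.2.2": "Idle"},
    "interfaces": {"Ethernet1": "up", "Ethernet2": "up"},
}

differences = compare_snapshots(pre_change, post_change)
for line in differences:
    print(line)
```

Not every difference is a problem, since the change itself is supposed to alter something. The point is that each difference gets reviewed against what the change was expected to do.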
The manual process may indeed be faster than an automated process for changes that need to be applied to a small set of devices, particularly if shortcuts are taken. But I maintain that the process of creating the data to drive automation doesn’t take that much more time than a thorough manual process, even for changing a few devices.
Automating the Change Process
Automation offers the opportunity to improve the configuration change process. Using a code repository forces documentation of the proposed changes, and a peer review catches silly mistakes early in the process. Both steps increase the quality of the changes and result in fewer network outages.
Of course, automation becomes more compelling as the number of devices goes up. Examples include configuring the same quality of service (QoS) settings across an entire enterprise with more than a few devices, or changing BGP policies on a set of internet border routers. It’s a relief to not have to repeat mind-numbing and error-prone cut-and-paste operations across a long list of devices.
Just a warning, though: You should start by automating one change on a subset of devices, validating that the change does what you want and that the pre-change and post-change validation tests are accurate. Once the automation is validated, the number of devices which can be changed via automation can be increased.
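One way to picture this gradual expansion is a rollout that proceeds in waves of increasing size and halts at the first failed validation. The sketch below is only an illustration: the function names, the doubling wave size, and the apply_change and validate callbacks are all assumptions, standing in for whatever your automation tool actually does.

```python
def rollout_waves(devices, first_wave=1, growth=2):
    """Split devices into waves whose size grows by `growth` each time."""
    if first_wave < 1 or growth < 1:
        raise ValueError("first_wave and growth must be at least 1")
    waves = []
    size = first_wave
    start = 0
    while start < len(devices):
        waves.append(devices[start:start + size])
        start += size
        size *= growth
    return waves


def staged_rollout(devices, apply_change, validate):
    """Apply a change wave by wave, stopping at the first failed validation."""
    completed = []
    for wave in rollout_waves(devices):
        for device in wave:
            apply_change(device)
            if not validate(device):
                return completed, device  # halt: later waves are not touched
            completed.append(device)
    return completed, None


# Hypothetical inventory and stand-in callbacks.
devices = [f"router{n}" for n in range(1, 8)]
applied = []


def apply_change(device):
    applied.append(device)  # stand-in for pushing configuration


def validate(device):
    return device != "router4"  # simulate a post-change failure on router4


completed, failed = staged_rollout(devices, apply_change, validate)
print(completed, failed)
```

With seven devices the waves contain one, two, and four devices. The simulated failure on router4 stops the rollout before the rest of the third wave is changed.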
Once you've automated certain processes, you can automate pre-change and post-change validation. This is something that often fails with manual processes because the same questions must be answered for each device: Did the link come up? Is the neighboring device the right one? Are the network protocols showing correct operation? Is the access control list functioning correctly?
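The per-device questions above can be expressed as expected values that are compared against what each device reports. The following is a minimal sketch under stated assumptions: the check names (link_state, neighbor, ospf_state, acl_blocks_telnet), the device names, and the observed data are all hypothetical, and gathering the observed state from real devices is left out.

```python
def validate_device(expected, observed):
    """Compare observed state with expectations; return a list of failures."""
    failures = []
    for check, want in expected.items():
        got = observed.get(check)
        if got != want:
            failures.append(f"{check}: expected {want!r}, got {got!r}")
    return failures


# Hypothetical expectations, identical for both access switches.
expected = {
    "link_state": "up",
    "neighbor": "core-sw1",
    "ospf_state": "FULL",
    "acl_blocks_telnet": True,
}

# Hypothetical observed state, as it might look after parsing device output.
observed_by_device = {
    "access-sw1": {
        "link_state": "up",
        "neighbor": "core-sw1",
        "ospf_state": "FULL",
        "acl_blocks_telnet": True,
    },
    "access-sw2": {
        "link_state": "up",
        "neighbor": "core-sw2",  # cabled to the wrong neighbor
        "ospf_state": "FULL",
        "acl_blocks_telnet": True,
    },
}

report = {
    device: validate_device(expected, observed)
    for device, observed in observed_by_device.items()
}
for device, failures in report.items():
    print(device, "OK" if not failures else failures)
```

The same expectations run unchanged against every device, which is exactly the repetition that tends to get skipped when the checks are done by hand.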
Finally, include a full network validation. This final step validates that the overall network is functioning as desired and that there were no unwelcome side effects. Even better, this comprehensive network validation suite can be used periodically to catch unexpected network problems. Try including these tests in your manual process workflow, and you’ll soon find that automation is faster.
A big advantage of automation comes from the reuse of the workflow elements. The process of configuring a data center pod or a branch site changes from workflow-driven (step-by-step process) to data-driven (a common automation workflow that’s driven by per-pod or per-branch variables).
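To make the data-driven idea concrete, here is a small sketch in which one shared template is rendered once per branch from that branch's variables. It uses Python's standard string.Template. The variable names, branch data, and the generic IOS-style configuration lines are assumptions for illustration and are not verified against any particular platform.

```python
from string import Template

# One shared template; only the per-branch data differs.
VLAN_TEMPLATE = Template(
    "hostname $hostname\n"
    "vlan $vlan_id\n"
    " name $vlan_name\n"
    "interface Vlan$vlan_id\n"
    " ip address $gateway $netmask\n"
)

# Hypothetical per-branch variables; in practice these might live in a
# version-controlled data file that goes through peer review.
branches = [
    {"hostname": "branch-a", "vlan_id": 110, "vlan_name": "USERS",
     "gateway": "10.1.10.1", "netmask": "255.255.255.0"},
    {"hostname": "branch-b", "vlan_id": 120, "vlan_name": "USERS",
     "gateway": "10.2.20.1", "netmask": "255.255.255.0"},
]

# substitute() raises KeyError if a branch is missing a variable, so
# incomplete data is caught before anything reaches a device.
configs = {b["hostname"]: VLAN_TEMPLATE.substitute(b) for b in branches}

for hostname, config in configs.items():
    print(config)
```

Adding a third branch means adding one more entry to the data, not writing a new procedure.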
You should eventually get to the point where a certain set of changes becomes streamlined, so that they are quickly approved and implemented. Adding a new link or bringing up a new VLAN are good examples. Automation can accomplish these tasks much faster and more safely than manual processes.
I can hear objections now: But I don’t know anything about automation! This comes from network engineers who learned an arcane command line interface syntax and complex network protocols. Ansible automation is certainly within the reach of anyone who has learned enough to successfully use the CLI. If that approach isn’t viable, then there are non-programming tools available, like Gluware.
It helps to understand the overall approach to automation so that you do better than simply replicating manual processes. Look for courses on sites like Pluralsight and O’Reilly that provide that level of understanding at a reasonable price. Another approach is to read automation books like The Phoenix Project, which teaches three basic tenets of DevOps:
- Adopt a flow of work that uses small batch sizes. In networking, this means limiting the scope of each change (known as the blast radius). Start small and expand once the automation has been validated to do what you want. Limiting the work in progress to small batches allows work to flow from one part of the process to the next.
- Build fast feedback through short loops, so that problems are corrected quickly. Peer reviews of proposed changes early in the cycle are one such feedback mechanism, and small batches support fast feedback.
- Adopt a culture that fosters education and experimentation, in which repetition and practice generate mastery.
You can also get started by simply automating read-only processes. Build a few pre-change and post-change network validation tests. This gets you started even in an environment where not everyone is on board with automation or has the necessary skills.
Successful adoption of automation depends on a basic culture shift. The entire network team needs to adopt automation workflows and needs to understand how to use the automation systems. Anyone who implements changes manually creates technical debt that needs to be addressed before the automation systems can resume control. It will take some effort to streamline the workflows so that automation is faster than the well-known manual processes, but it is certainly possible.