8 More Disaster Recovery Duds
The lesson: Test your backup systems, get management buy-in, and seek input from non-technical people to find the gaps in your plans.
Planning for business continuity is planning for interruptions to business operations. There will be disasters, foreseen and unforeseen. The unforeseen and neglected problems are what I want to discuss.
In my previous blog post, Disaster Recovery Duds, I presented 8 disaster recovery duds that generated a lot of interest. So here are 8 more stories of disaster recovery planning and execution that did not go as planned.
When Did We Last Try This?--An enterprise installed a backup power generator. It was sized properly for the data center power requirements. The generator was tested and worked as planned. Everyone was very comfortable with the backup power arrangement.
Then it happened. The power was cut off. The generator started and provided power for about one minute, then stopped. It would not restart. It took quite a while to discover that the fuel filter was clogged with rust. The generator had not been exercised periodically (once a month for an hour would be best). The fuel tank had rusted and caused the filter to block the fuel flow.
Always, always exercise your backup facilities to ensure that they are working properly. Don't wait for the disaster to learn how well the backup works.
The UPS Will Fit into the Closet--As network closets become a power source with PoE, the UPS battery backup also moves to the network closet. An enterprise budgeted properly for the closet UPS, but the network closet did not have enough floor space for the UPS batteries. So one of the IT staff decided to stack the batteries vertically along one wall. The facilities people found out and immediately stopped the UPS installation. The UPS, when installed vertically, exceeded the weight limit for the floor. Before the UPS could be installed, the floors below the closet had to be reinforced to hold the weight of the batteries. This worked; however, the extra cost of the reinforcement pushed the project far over the UPS budget.
The Vent is Where?--A diesel generator was installed successfully as a backup power source. As the generator was being tested, someone walked into the closed shipping and receiving garage. That person discovered that the generator was vented into the garage, not outside. The exhaust design had to be changed, which increased the cost of the project.
The Data Center was Safe--A new data center was constructed in an existing office building. The planners anticipated potential building closures due to scenarios such as bad weather and protesters, and planned for employees staying on site for days. They planned for backup power, but gave no thought to the building itself. The data center was the hub for 30 remote data centers and 14 networks. The data center was above the building loading dock. A single critical pillar on the corner of the loading dock was just below the data center. If a truck damaged or collapsed this pillar, the data center would be destroyed.
The Data Center Backup Worked Fine--An information services enterprise planned and executed a successful backup power arrangement. It worked as expected. However, it was tested while the primary building power was still on. The backup power provided excellent isolation from the utility company's power, which had many short power fluctuations.
Finally, the primary power was lost one evening.
What the backup power designers did not consider was the power to the security scanners at the building entrances. No one could enter the building because the scanners were inoperable. Those who wanted to enter had to use their cell phones to call someone inside to open the door for them.
Remember, there are always other power vulnerabilities. Test the backup power under real conditions to locate the missed vulnerabilities.
Fueling the Generator--A lesson learned during Hurricane Katrina was that roads could be closed for weeks. Diesel generators worked for several days. When the diesel fuel was exhausted, it was learned that the refueling trucks could not get to the generator locations. The assumption that they could be refueled before the generators stopped running was incorrect.
In response, a school system planned its backup power generators so they could run on natural gas supplied through pipelines. The pipelines were less likely to be affected by a storm, so the generators could run indefinitely.
The Disaster Recovery Staff--One of my early projects was the automation of a central bank. It was very important that operations continue during a failure, even if they had to be performed manually.
So we designed a manual backup system with equipment and staff to support it. We never needed the manual system because the design of the automated system worked under all conditions.
One day I decided that we needed to test the manual backup system. It did not work. In the meantime, the departments that were responsible for the manual backup staff had gained so much confidence in the automated system that the backup staff had been reassigned without notifying the IT department. The equipment was there but not the people. If you create any backup that depends on extra or reassigned staff, make sure they really exist and are prepared for the failure.
The Emergency Host--An organization decided it needed to update its emergency response procedures. It proposed that every building on its campus have an emergency host who would stand outside the building to direct emergency personnel when they arrived. Sounds good until you consider explosions, chemical leaks, earthquakes, and terrorism. The plan also assumed that the host would only be needed during office hours. The plan was inexpensive to execute but not really workable. Keep in mind, people may not behave as planned during an emergency unless they are well trained and the emergency procedures are frequently exercised.
There can never be a perfect plan that covers all possibilities. When the plan is written, give copies to as many management-level personnel as possible, especially those most affected by a disaster. Non-technical feedback can be useful for locating holes in the plan.