Disaster Recovery Duds
If you don't solicit input from non-technical people, your IT disaster plan could wind up as ineffectual as these companies'.
No matter how smart the technologist, there always seems to be a forgotten element or a faulty assumption that surfaces when responding to a disaster. The disaster planning stories I have collected demonstrate that the best-laid plans are not necessarily good enough.
The problems stem from assumptions made in the planning process, or from the exclusion of non-technical personnel from the planning process. I have encountered remarks from non-technical people involved in the disaster planning effort that, when considered, are insightful and right on the mark.
A few of the following situations may seem funny, even ridiculous. Remember that they are real occurrences, not the work of a joke writer. I hope they will stimulate you to brainstorm your planning process with a wide range of personnel, not just the technologists. If you have some stories to tell as well, please comment here.
We Backed Up What?
One of the smaller telephone companies established a network operations center for their infrastructure transmission facilities. One day they discovered that there was no backup power available at the site. No disaster had happened yet; this was an exercise in planning ahead. The backup power was installed and demonstrated successfully.
Eventually there was a power failure, but the equipment that was supposed to be powered did not work. The electrical outlets were color coded to show which ones were connected to backup power. Unfortunately, the color codes had been reversed. The only thing that worked after the power failed was the Christmas tree.
It Did Work for a Minute
An enterprise installed a backup power generator. It was sized correctly. Testing of the generator worked as planned. The enterprise felt very comfortable that in the event of a power outage, the generator system would support their environment for days.
When the power outage finally occurred, the generator started and worked for about one minute, then stopped. It could not be restarted. The enterprise discovered that, because the generator had not been exercised regularly, rust had accumulated in the fuel tank. Once the rust reached the fuel filter, the filter became completely clogged. It took hours to determine the problem and resolve it. The lesson: Always, always exercise the backup facility for at least an hour once a month to ensure successful backup operation.
Exercising the Disaster Recovery Plan
An organization developed and implemented a disaster recovery plan. They wanted to try out the operation to see whether their plans and implementation were successful and thorough. So on a Monday they caused a failure. Everything worked as planned and there was no major disruption to the users. Of course it worked, because everyone had been informed the previous Friday of the planned failure. The users all downloaded what they needed onto their laptops so they would not be bothered. The plan appeared successful only because everyone already had the resources they needed, independent of the infrastructure.
Saving the Data
A company that anticipated the effects of the impending Hurricane Katrina concluded that their data center site might be completely destroyed. In anticipation of the hurricane, the company's data files were loaded onto magnetic tapes and given to multiple employees.
Sure enough, the data center was damaged. One magnetic tape was not returned for nine months. The company had assumed that the employee would be locatable after the hurricane and that the employee would automatically return the tape. It turned out that this employee had been evacuated out of state. The company did not plan for the size of the hurricane disaster and had not created an employee contact and location plan.
Above the Flood
An organization knew there was a possibility that the first-floor IT network closets might be flooded under extreme conditions. So all of the equipment was mounted in the racks above flood level, leaving the first three feet of rack space empty. But no one had talked to the electricians. All of the electrical connections were installed at floor level, well below the flood level. In a flood, the equipment would be protected but the electrical connections would be damaged.
The Almost Complete Plan
A financial institution developed a comprehensive disaster recovery plan. Unfortunately, six months after the plan was finalized, the building burned down. The employees had alternate working locations at temporary business centers, at hotels and at home. They all had a place to work. What was missing from the plan was an employee directory and location information. Employees had to call each other at home to create their own directory. It took about two weeks to complete the directory and location files.
Where are the Emergency Phones?
A manufacturer installed a VoIP system with Power over Ethernet (PoE). The cost of an uninterruptible power supply (UPS) was beyond their budget. Instead, the emergency communications plan called for Centrex phones to be installed in all the conference rooms. When there was a power failure, the conference room phones could be used.
Finally a power failure occurred. The conference rooms had no windows and no one had installed backup lighting. With the lights out, the employees had to hunt around for flashlights to locate the Centrex phones in the dark.
It Worked Before
A data center with many computers on multiple floors was designed with a diesel generator system for backup power. The system was tested successfully for three years.
Finally the power was lost. The generator would not start. The generator on the first floor depended on fuel tanks in the basement. Without electrical power, the fuel pumps did not work. Fuel cans had to be carried to the first floor to start the generator, which then powered the fuel pumps, which in turn kept the generator running.
Thinking about the possibility of a disaster shows that recovering the IT operation takes much more than technical planning and implementation. There will always be some flaw or assumption that may prevent a successful recovery. So what can you do? Get the non-technical and business-oriented personnel in the organization not only to participate in disaster recovery planning, but also to criticize the plans freely.