Five Nines and Acts of God
The Derecho storm and outages in Virginia demonstrate the need for enterprises to understand the carriers' systems and potential shortcomings.
The five-nines discussion goes on, but is this metric really deliverable? The enterprise can only attain five-nines availability when the network and its operations are better than five-nines in delivery. I found that there are several events that most five-nines designers do not count. These include equipment upgrade time, software changes, software bugs, malicious behavior, and power failures. What is disturbing is that poor carrier/provider design and insufficient resiliency can make a network unreliable precisely when a major emergency occurs, and that seriously threatens the delivery of five-nines.
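To put numbers on the five-nines target, here is a minimal Python sketch. It assumes the simplest possible model: availability is the fraction of a year a service is up, and components that all have to work (for example the enterprise network and the carrier behind it) are independent and in series, so their availabilities multiply. The function names and the two-component example are illustrative assumptions, not figures from any carrier.

```python
# Assumption: a year of 365.25 days, expressed in minutes.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Yearly downtime allowed by a given availability (0.0 to 1.0)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up.

    Assumes failures are independent, which real outages often are not.
    """
    result = 1.0
    for availability in components:
        result *= availability
    return result


five_nines = 0.99999

# Five nines leaves a budget of roughly 5.26 minutes of downtime per year.
budget = downtime_minutes_per_year(five_nines)

# Hypothetical case: enterprise network and carrier are each at five nines.
# The end-to-end figure is then slightly below five nines.
end_to_end = serial_availability(five_nines, five_nines)

print(f"Five-nines budget: {budget:.2f} minutes per year")
print(f"Two five-nines components in series: {end_to_end:.7f}")
```

Under this model, two five-nines components in series already fall short of five nines end to end, which is why each piece has to be better than the target. Any of the uncounted events listed above eats into the same budget of a few minutes per year.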
I live in northern Virginia. I and over a million others in the Washington D.C. region lived through a Derecho storm that struck the area about 10:30 PM on June 29, 2012. The storm moved through our area quickly, in about 3 hours, but caused significant damage and interruption of communications services. I lost power for only a few minutes, but my FiOS Internet, TV and business lines failed, as did my home POTS line. Others in my neighborhood were without power for days.
A Derecho can produce destruction similar to that of tornadoes. By definition, "if the wind damage swath extends more than 240 miles and includes wind gusts of at least 58 mph or greater along most of its length, then the event may be classified as a Derecho", according to NOAA, the National Oceanic and Atmospheric Administration.
I read the "Comments of Fairfax County, VA" to the FCC (August 17, 2012, PS Docket No. 11-60) and learned of the many communications problems. According to the filing, those problems, especially the ones related to 911 calls, were not due to failures at the PSAP (public safety answering point) but to poor resiliency on the part of Verizon, the carrier for most of northern Virginia.
There are two key central offices in northern Virginia, one in Arlington, where I live, and one in Fairfax, a county of about 1 million people. Full 911 service through these two central offices was not restored for about 3 1/2 days. Partial service was restored about 7 hours after the storm passed.
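For scale, the 3 1/2 days without full 911 service can be converted into an availability figure for the year. The Python sketch below is a rough illustration under a stated simplification: it counts the entire 3 1/2 days as an outage, even though partial service returned after about 7 hours, and it assumes no other downtime during the year. The function names are illustrative.

```python
import math

# Assumption: a year of 365.25 days, expressed in hours.
HOURS_PER_YEAR = 365.25 * 24


def availability_from_outage(outage_hours: float,
                             period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period the service was up, given total outage hours."""
    return 1.0 - outage_hours / period_hours


def count_nines(availability: float) -> float:
    """Number of 'nines' in an availability figure (0.99999 gives about 5)."""
    return -math.log10(1.0 - availability)


# Simplification: treat all 3 1/2 days as a full outage.
outage_hours = 3.5 * 24

availability = availability_from_outage(outage_hours)

# How many years of a five-nines downtime budget this one event consumes.
five_nines_budget_hours = (1.0 - 0.99999) * HOURS_PER_YEAR
years_of_budget = outage_hours / five_nines_budget_hours

print(f"Availability for the year: {availability:.4%}")
print(f"Equivalent nines: {count_nines(availability):.2f}")
print(f"Years of five-nines budget used: {years_of_budget:.0f}")
```

Under these assumptions the year comes out at roughly 99.04 percent, about two nines, and the single event uses up several centuries of a five-nines downtime budget.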
911 calls entered Verizon's system but were not routed to the County during the first 29 hours of the Derecho event. How many other calls were not routed properly after the 29-hour period has not been reported by Verizon. This was the longest outage experienced by Fairfax County since it implemented 911 service in 1988.
I had FiOS Internet and TV back about a day later. The phone services--FiOS and wireline--did not operate at all. Eventually I was able to hear dial tone on the FiOS and wireline connections but every time I dialed, nothing happened. I learned later that no one could call me on the business or home phone lines. It appeared that there were significant problems at the central offices. My cell phone worked to some degree, but not completely.
One of the problems was the ambiguous communications between Verizon and the 911 call centers. According to the FCC document, "At 6:55 a.m., Verizon sent a cryptic email to designated Fairfax County staff saying that the Arlington central office was without power or backup battery/generator. The references to Arlington suggested that 9-1-1 service was affected only in Arlington County. Without a corresponding phone call explaining the situation and the email, Fairfax County's PSAP staff continued with their normal operations, unaware that incoming 9-1-1 call service from Verizon was about to rapidly deteriorate." In a future emergency event, this lack of communication could easily be the situation between a carrier and an enterprise as well.
In Maryland, the 911 tariff states that the PSAP is to notify Verizon of problems--not Verizon notifying the PSAP--a really questionable arrangement. This tells us that enterprises need to take the initiative to determine who is responsible for problem notification with their own carriers and to maintain a current contact list. You may be surprised about who informs whom and how accurate the notification is.
On Saturday at 3 PM, the Fairfax PSAP received a call reporting that Verizon had restored minimal and sporadic 9-1-1 service, but that the Automatic Location Identification (ALI) data would not be transmitted to the PSAP because the Verizon equipment that had failed was still under repair and there was no backup.
The ALI is the automatic display at the PSAP of the caller's telephone number, the address/location of the telephone and additional emergency services information.