No Jitter is part of the Informa Tech Division of Informa PLC


Five Nines and Acts of God

The five-nines discussion goes on, but is this metric really deliverable? The enterprise can attain five-nines availability only when the network and its operations deliver better than five nines. I found that most five-nines designers do not count several kinds of events: equipment upgrade time, software changes, software bugs, malicious behavior, and power failures. What is disturbing is that poor carrier/provider design, and a lack of resiliency when a major emergency occurs, can make a network unreliable enough to seriously threaten the delivery of five-nines.
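To put numbers on that claim, here is a rough sketch in Python. It assumes a 365-day year and that the components in a service chain fail independently, which real networks do not always do; the function names are mine, for illustration only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_budget_minutes(availability: float) -> float:
    """Downtime allowed per year, in minutes, at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def series_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up.

    Assumes independent failures, so the availabilities simply multiply.
    """
    result = 1.0
    for availability in components:
        result *= availability
    return result


if __name__ == "__main__":
    five_nines = 0.99999
    print(f"Five-nines budget: {downtime_budget_minutes(five_nines):.2f} minutes/year")

    # Three links (say enterprise LAN, carrier access, carrier core), each at five nines.
    chain = series_availability(five_nines, five_nines, five_nines)
    print(f"Three five-nines links in series: {chain:.6f}")
```

Five nines leaves a budget of about 5.26 minutes of downtime per year. Chaining several five-nines components always gives an end-to-end figure below five nines, which is why each piece has to do better than the target.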

I live in northern Virginia. I and over a million others in the Washington D.C. region lived through a Derecho storm that struck the area at about 10:30 PM on June 29, 2012. The storm moved through our area quickly, in about 3 hours, but caused significant damage and interruption of communications services. I lost power for only a few minutes, but my FiOS Internet, TV and business lines failed, as did my home POTS line. Others in my neighborhood were without power for days.

A Derecho can produce destruction similar to that of tornadoes. By definition, "if the wind damage swath extends more than 240 miles and includes wind gusts of at least 58 mph or greater along most of its length, then the event may be classified as a Derecho", according to NOAA, the National Oceanic and Atmospheric Administration.

I read the "Comments of Fairfax County, VA" filed with the FCC on August 17, 2012 (PS Docket No. 11-60) and learned of the many communications problems. According to the filing, those problems, especially the ones related to 911 calls, were not due to failures at the PSAP (public safety answering point) but to poor resiliency on the part of Verizon, the carrier for most of northern Virginia.

There are two key central offices in northern Virginia, one in Arlington where I live and one in Fairfax, a county of about 1 million people. These two central offices did not fully restore 911 service for about 3 1/2 days. Partial service was restored about 7 hours after the storm passed.
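For a sense of scale, the sketch below converts those outage durations into annual availability and into years' worth of five-nines downtime budget. It is back-of-the-envelope arithmetic only: it assumes a 365-day year and treats the whole period as a complete outage, which overstates the impact once partial service returned.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def annual_availability(outage_hours: float) -> float:
    """Availability over one year if the only downtime is this outage."""
    return 1.0 - outage_hours / HOURS_PER_YEAR


def years_of_budget_consumed(outage_hours: float, target: float = 0.99999) -> float:
    """How many years of downtime budget at `target` a single outage uses up."""
    budget_hours_per_year = (1.0 - target) * HOURS_PER_YEAR
    return outage_hours / budget_hours_per_year


if __name__ == "__main__":
    # 7 hours to partial service; 3 1/2 days (84 hours) to full service.
    for label, hours in (("partial restoration", 7.0), ("full restoration", 84.0)):
        print(
            f"{label}: {annual_availability(hours):.4%} for the year, "
            f"{years_of_budget_consumed(hours):.0f} years of five-nines budget"
        )
```

Even the 7 hours before partial service returned use up roughly 80 years of five-nines budget. Counting the full 3 1/2 days brings the total to more than 950 years.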

911 calls entered Verizon's system but were not routed to the County during the first 29 hours of the Derecho event. How many other calls were not routed properly after the 29-hour period has not been reported by Verizon. This was the longest outage experienced by Fairfax County since it implemented 911 service in 1988.

I had FiOS Internet and TV back about a day later. The phone services--FiOS and wireline--did not operate at all. Eventually I was able to hear dial tone on the FiOS and wireline connections but every time I dialed, nothing happened. I learned later that no one could call me on the business or home phone lines. It appeared that there were significant problems at the central offices. My cell phone worked to some degree, but not completely.

One problem was ambiguous communication between Verizon and the 911 call centers. According to the FCC document, "At 6:55 a.m., Verizon sent a cryptic email to designated Fairfax County staff saying that the Arlington central office was without power or backup battery/generator. The references to Arlington suggested that 9-1-1 service was affected only in Arlington County. Without a corresponding phone call explaining the situation and the email, Fairfax County's PSAP staff continued with their normal operations, unaware that incoming 9-1-1 call service from Verizon was about to rapidly deteriorate." In a future emergency, the same lack of communication could easily arise between a carrier and an enterprise.

In Maryland, the 911 tariff states that the PSAP is to notify Verizon of problems, rather than Verizon notifying the PSAP--a really questionable arrangement. This tells us that enterprises need to take the initiative to determine who is responsible for problem notification with their own carriers and to maintain a current contact list. You may be surprised by who informs whom and by how accurate the notification is.

On Saturday at 3 PM, the Fairfax PSAP received a call saying that Verizon had restored minimal and sporadic 9-1-1 service, but that Automatic Location Identification (ALI) data would not be transmitted to the PSAP because the failed Verizon equipment was still under repair and had no backup.

The ALI is the automatic display at the PSAP of the caller's telephone number, the address/location of the telephone and additional emergency services information.

Remediation

Commercial power loss is common during heavy storms and Verizon had installed systems to deal with this loss. The storm cut off commercial power to the Arlington central office on June 29, shortly after the storm began. The Arlington systems automatically switched to backup battery power.

The battery systems are designed to provide power until two electrical generators come on line. Both generators are needed at the Arlington site to carry the power load, but one of the generators failed to start. The second generator started, but without its "twin" it became overloaded and shut down. The batteries had to carry the full power load, and supplied sufficient power until they were depleted in approximately six hours.
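The generator arrangement described above is what reliability engineers call a 2-of-2 configuration: both units must start, so either one failing takes down the plant. The sketch below compares it with a hypothetical 2-of-3 arrangement that has one spare. The 95% start probability is an invented number for illustration, not a figure from the FCC filing, and the model assumes generators fail independently.

```python
from math import comb


def k_of_n_success(k: int, n: int, p_start: float) -> float:
    """Probability that at least k of n generators start.

    Each generator starts independently with probability p_start
    (a binomial model; common-cause failures are ignored).
    """
    return sum(
        comb(n, i) * p_start**i * (1.0 - p_start) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    p = 0.95  # hypothetical per-generator start probability
    print(f"2-of-2 (no spare):  {k_of_n_success(2, 2, p):.4f}")
    print(f"2-of-3 (one spare): {k_of_n_success(2, 3, p):.4f}")
```

Under these assumptions the no-spare design carries the load about 90% of the time the generators are called on, while a single spare raises that to about 99%. The exact figures matter less than the shape: when every unit is required, the plant is less reliable than any one of its generators.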

Then the Verizon call transport network in the Arlington Hub, which facilitates and directs communications between many Verizon end offices/central offices throughout Northern Virginia, stopped operating. Power equipment in Verizon's Fairfax central office also failed, isolating the Fairfax E911 tandem switch and preventing the routing of 911 calls to the Fairfax County PSAP. The Alexandria E911 tandem switch, a secondary route for 911 call transport, remained operational, but the capability to route 911 calls to the Fairfax County PSAP failed.

Verizon could not diagnose the magnitude of the tandem switch failure in its Fairfax central office. The power failure in the Arlington central office affected the telemetry equipment there that monitors the network. Without access to the telemetry data, Verizon was flying blind. No backup or alternate communications path was available for the telemetry network.
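One general way to avoid flying blind is to treat silence from a monitored site as an alarm in its own right, checked from a location that does not depend on that site's power. The sketch below shows the idea as a simple heartbeat watchdog. It is an illustration with hypothetical names and timings, not a description of Verizon's monitoring systems.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatWatchdog:
    """Tracks the last heartbeat per site and flags sites that have gone quiet."""

    timeout_s: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def record(self, site: str, now: float) -> None:
        """Note that telemetry from `site` arrived at time `now` (in seconds)."""
        self.last_seen[site] = now

    def silent_sites(self, now: float) -> list[str]:
        """Sites whose most recent heartbeat is older than the timeout."""
        return sorted(
            site
            for site, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


if __name__ == "__main__":
    watchdog = HeartbeatWatchdog(timeout_s=60.0)
    watchdog.record("arlington", now=0.0)
    watchdog.record("fairfax", now=100.0)
    # At t=120 s the hypothetical Arlington feed has been quiet for two minutes.
    print(watchdog.silent_sites(now=120.0))
```

The watchdog only helps if it runs somewhere that survives the failure it is watching for, and if its alerts travel over a path independent of the telemetry network itself.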

With these experiences, you would expect that Verizon would be adhering to best practices identified by the Network Reliability and Interoperability Council ("NRIC"). The NRIC partners with the FCC, the communications industry and public safety organizations to facilitate enhancement of emergency communications networks, homeland security, and best practices across the telecommunications industry. The NRIC best practices cover a wide range from legacy through mobile communications. To ensure that Verizon knows about the best practices, the FCC enumerates them as an appendix to the report.

Conclusion
Although much of this discussion relates to 911 service, the problems mentioned can also affect enterprise communications networks. The goal of 99.999% availability cannot be met during emergency conditions if the carriers are not prepared to respond to their problems quickly and decisively. The flaws in Verizon's preparedness for emergencies affect the enterprise.

The FCC report recommended several sensible improvements for Verizon to adopt. Below are the four recommendations I thought were most important.

1. Provide On-Site Verizon Representative at Emergency Operations Centers. If this had been done, the telemetry failure would have been discovered and mitigated faster.

2. Perform Drills to Simulate 9-1-1 Outage Contingencies. If this had been done, the weaknesses of Verizon's facilities would have been discovered BEFORE an emergency.

3. Provide an Active Notification System for Incidents, with detailed and complete information.

4. Provide Monthly Updates to Key Contact Lists at 911 centers. This would also apply to communications management at the enterprise.
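For the enterprise side of recommendation 4, a contact list can be checked automatically for entries that have not been confirmed recently. The sketch below is one minimal way to do it. The contact names, dates, and the 31-day threshold are all hypothetical.

```python
from datetime import date, timedelta


def stale_contacts(
    last_verified: dict[str, date], today: date, max_age_days: int = 31
) -> list[str]:
    """Return contacts whose last verification is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, verified in last_verified.items() if verified < cutoff)


if __name__ == "__main__":
    # Hypothetical carrier escalation contacts and the date each was last confirmed.
    contacts = {
        "carrier-noc": date(2012, 6, 1),
        "carrier-account-manager": date(2012, 3, 15),
    }
    print(stale_contacts(contacts, today=date(2012, 6, 29)))
```

Run on a schedule, a check like this turns "keep the list current" into a task someone is prompted to do each month.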

Verizon's response to the 911 problems is contained in "Verizon, 911 Service and the June 29, 2012, Derecho".