Where's My Cloud?: Page 2 of 3
I decided to dig a little deeper, so I looked in the Downdetector archive for additional RingCentral outages. I found a total of four outage events documented in 2018 -- on Jan. 22, March 2, March 15, and, as already mentioned, April 3. The graphic below shows the timing and duration of the reporting for each of those outage events (note that the vertical scale and the total count of problem reports differ for each event). While Downdetector has no record of how many users/seats these outage events impacted, the number of reports and their duration indicate that RingCentral customers would have found these to be significant interruptions.
After gathering the data, I talked with Curtis Peterson, RingCentral's SVP of Cloud Operations, for a detailed explanation of the four 2018 outage events reported on Downdetector. Here's a summary of his account of those outages:
- Jan. 22 -- Outage caused by a peering issue outside of RingCentral's control. This issue led to a relatively low number of reports, and those may have been tied to a specific Internet service provider. Clearly, the choice of access provider can be critical to cloud availability.
- March 2 -- Result of an East Coast storm that impacted several data centers in the Washington, D.C., area, affecting a variety of Internet applications and sites. RingCentral appears to have been affected by Amazon/Equinix issues on that day. The choice of data center vendor can have a clear impact on services.
- March 15 -- Outage related to Internet peering issues that impacted not only RingCentral but many other cloud providers and users. In checking, I didn't find any other major cloud provider outages at the time, so this event may have primarily involved RingCentral's IP peers rather than the open Internet. Like the choice of data center locations and providers, peering is a critical issue for cloud providers. Even though this event may not be directly attributable to RingCentral, it still impacted some RingCentral users over an eight-hour period. In total, there were fewer than 200 reports of that outage -- and, as Curtis pointed out, new technologies like software-defined WAN have the potential to mitigate this type of outage.
- April 3 -- Event due to a data center issue with West Coast servers as morning load ramped up; users moved to East Coast servers.
Most of the outage events didn't impact the entire customer population, but rather 30% or less of it, Curtis told me. While no specific data confirms this, his conclusion seems logical considering the March 2 storm, the difference in the event reporting visualization for the March 15 issue, and the limited scope of CCaaS (with the number of cloud contact center seats typically 5% of overall business communications seats). The April 3 event, which resulted from a core UCaaS issue and affected a large number of West Coast users, had the largest impact.
Based on the charts, we can estimate the maximum impact of the outages as one hour of average customer unavailability for Jan. 22, two hours each for March 2 and April 3, and three hours for March 15. Added up, the total is eight hours of average reported outage periods for RingCentral customers in the first three months of 2018. Extrapolated over four quarters, this equates to 32 hours for the year, or, in our count of nines, between two and three nines (99.63% availability), if every one of the outages impacted a given customer. However, if we assume that the events on average impacted 30% of the user base, then the actual impact is 99.89% availability, or about three nines. If the outages on average only impacted 10% of the RingCentral user base, the availability increases to 99.96%, or higher than three nines.
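For readers who want to check or adjust the arithmetic, here is a minimal Python sketch of the estimate. It assumes a non-leap year of 8,760 hours, treats the eight hours as one quarter's worth of outages, and scales downtime by the fraction of the user base affected; the function and variable names are illustrative, not taken from any RingCentral or Downdetector tool.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year

# Estimated average outage hours per event, as read off the charts.
outage_hours = {"Jan. 22": 1, "March 2": 2, "March 15": 3, "April 3": 2}


def availability_pct(annual_outage_hours, affected_fraction=1.0):
    """Percent availability for an average customer over one year.

    affected_fraction is the assumed share of the user base hit by
    each outage (1.0 means every customer saw every outage).
    """
    effective_downtime = annual_outage_hours * affected_fraction
    return 100 * (1 - effective_downtime / HOURS_PER_YEAR)


quarter_total = sum(outage_hours.values())  # hours in roughly one quarter
annual_total = quarter_total * 4            # straight-line extrapolation

for fraction in (1.0, 0.3, 0.1):
    pct = availability_pct(annual_total, fraction)
    print(f"{fraction:.0%} of users affected: {pct:.2f}% availability")
```

Run as written, the three cases come out to 99.63%, 99.89%, and 99.96%. Changing the hours per event or the affected fraction shows how sensitive the "count of nines" is to those assumptions.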
While these are extrapolated estimates, if a premises PBX vendor touted availability of three to four nines (only one to nine hours of downtime per year), that vendor would have been challenged to sell many systems against competitors claiming five nines. On the other hand, as Curtis noted, we can't track issues on legacy systems as transparently as we can with cloud-delivered services. Regardless, customers considering a cloud migration need to understand the availability of their chosen provider and the impact on their organization.
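As a rough cross-check on the "nines" shorthand, the sketch below converts a number of nines into allowed downtime per year. It again assumes an 8,760-hour year, and the helper name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def downtime_hours_per_year(nines):
    """Annual downtime allowed at `nines` nines (3 means 99.9%)."""
    unavailable_fraction = 10 ** -nines
    return HOURS_PER_YEAR * unavailable_fraction


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_hours_per_year(n):.2f} hours per year")
```

Three nines works out to 8.76 hours per year and four nines to a little under one hour, which is where the one-to-nine-hour range comes from.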