Time to Re-Think Your Cloud Strategy

Two recent major cloud outages left many customers without voice or Web services. Could these disruptions have been avoided if the cloud providers had implemented better practices? Or does the onus fall on customers to adopt better strategies when moving to cloud solutions?

When Amazon Web Services (AWS) went down in late February, the impact was felt across much of the East Coast, and many users could not reach numerous Web services. When AT&T's IP Flex service recently went down, scores of users lost both outbound and inbound calling. Both disruptions lasted hours, not minutes.

I spoke with a major provider that hosts management systems nationwide, and it is changing its offerings to provide backup service outside its primary AWS service. I also held a conference call with another provider that is likewise implementing a backup solution with a different provider.

Businesses that have been lured by the cloud and now rely less on internal systems and applications may need to rethink their cloud adoption strategies. The old adage "don't get seduced by the technology" still applies, as does the warning against becoming overly reliant on technology.

Enterprise infrastructure must be resilient; without resilience, network managers can expect service degradation and repeated disruptions. Environment, power, cabling, switching, routing, Wi-Fi, firewalls, and local storage and servers all require hardening. Businesses should also treat alternative routing and alternate routes as two distinct considerations: alternative routing means links on different paths, while alternate routes means different providers on the same paths.
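To make the distinction between the two terms concrete, here is a minimal Python sketch that classifies a pair of uplinks. It assumes each link is described by a hypothetical `Link` record carrying a provider name and a physical path identifier; the record, its fields, and the `classify_pair` helper are illustrative inventions, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical description of one WAN uplink (illustrative only)."""
    name: str
    provider: str  # carrier supplying the circuit
    path: str      # identifier for the physical route, e.g. conduit or building entrance


def classify_pair(a: Link, b: Link) -> str:
    """Classify two uplinks using the article's terminology.

    alternative routing: links on different physical paths
    alternate routes:    different providers on the same path
    """
    different_path = a.path != b.path
    different_provider = a.provider != b.provider
    if different_path and different_provider:
        return "alternative routing and alternate routes"
    if different_path:
        return "alternative routing"
    if different_provider:
        return "alternate routes"
    return "no diversity"


# Hypothetical example: two carriers entering through the same conduit.
primary = Link("primary", provider="Carrier A", path="north-entrance")
backup = Link("backup", provider="Carrier B", path="north-entrance")
print(classify_pair(primary, backup))
```

In this example the two circuits share a physical path but come from different carriers, so the pair counts only as alternate routes; a single backhoe cut on that path could still take both down.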

But these steps may already describe the environment that larger enterprises had in place before moving to the cloud, so the initial question remains. Network managers need to apply the same precautions and contingencies to cloud solutions. Relying on a single provider and its cloud infrastructure may produce the same result: outages lasting hours, not seconds or minutes. Outages of that length do more than disrupt business; they can affect public safety. Enterprises will naturally focus on losses, and with good reason: the average loss per hour of downtime in a U.S. data center exceeds $138,000 (and that figure dates from 2014).
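A back-of-the-envelope calculation shows why a second provider helps. If two providers fail independently, the service is down only when both are down at once, so combined availability is `1 - (1 - a1) * (1 - a2)`. The Python sketch below assumes independent failures, which real outages (shared regions, common upstream dependencies) can violate, and the 99.9% figures in it are placeholders rather than any provider's published numbers.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def combined_availability(*availabilities: float) -> float:
    """Availability of redundant providers, assuming their failures are independent.

    The service is unavailable only when every provider is down simultaneously.
    """
    all_down = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        all_down *= 1.0 - a
    return 1.0 - all_down


def downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability fraction."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Placeholder availabilities for one provider versus two independent providers.
single = 0.999
dual = combined_availability(0.999, 0.999)
print(f"single: {downtime_hours(single):.2f} h/yr, dual: {downtime_hours(dual):.4f} h/yr")
```

With these hypothetical inputs, one provider works out to about 8.76 hours of downtime a year, while two independent providers work out to roughly half a minute. The gap shrinks sharply if both providers share a failure point.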

According to the Uptime Institute, here's how the different tiers of reliability break down:

  • Tier 1: Non-redundant capacity components with an availability of ~99.671% and 28.8 hours of downtime per year
  • Tier 2: Tier 1 + redundant capacity components with an availability of ~99.741% and 22 hours of downtime per year
  • Tier 3: Tier 1 + Tier 2 + dual-powered equipment and multiple uplinks with an availability of ~99.982% and 1.6 hours of downtime per year
  • Tier 4: Tier 1 + Tier 2 + Tier 3 + all components fully fault-tolerant including uplinks, storage, chillers, HVAC systems and servers with everything dual-powered and an availability of ~99.995% with 0.4 hours of downtime per year
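The downtime figures follow from the availability percentages: multiply the unavailable fraction by the 8,760 hours in a year. This minimal Python sketch reproduces that arithmetic from the tier availabilities listed above. The dictionary and function names are chosen for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year

# Availability percentages for the Uptime Institute tiers, as listed above.
TIER_AVAILABILITY = {
    "Tier 1": 99.671,
    "Tier 2": 99.741,
    "Tier 3": 99.982,
    "Tier 4": 99.995,
}


def annual_downtime_hours(availability_pct: float) -> float:
    """Convert an availability percentage into expected downtime hours per year."""
    return (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR


for tier, pct in TIER_AVAILABILITY.items():
    print(f"{tier}: {annual_downtime_hours(pct):.1f} hours/year")
```

The computed values come out to about 28.8, 22.7, 1.6 and 0.4 hours. The commonly quoted 22 hours for Tier 2 is therefore slightly lower than the straight arithmetic gives.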

Before addressing redundancy, ask your local and cloud infrastructure providers basic questions about their configurations; the answers can show the business where to make changes that minimize disruption. Every option comes down to cost, and cost can't be weighed without knowing the direct impact that any loss of service has on the business. So let me ask you: how much availability can your business afford?
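One way to start answering that question is to multiply each tier's expected downtime by your own cost per hour of downtime. The Python sketch below uses the 2014 U.S. average of $138,000 per hour cited earlier purely as a placeholder input; substitute your own figure. It also assumes cost scales linearly with hours of downtime, which is a simplification, since some businesses lose far more in the first hour than in later ones.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def expected_annual_loss(availability_pct: float, cost_per_hour: float) -> float:
    """Expected yearly downtime cost, assuming cost scales linearly with hours down."""
    downtime_hours = (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR
    return downtime_hours * cost_per_hour


# Placeholder: the 2014 U.S. data center average cited in the article; use your own number.
COST_PER_HOUR = 138_000

tiers = {"Tier 1": 99.671, "Tier 2": 99.741, "Tier 3": 99.982, "Tier 4": 99.995}
for tier, pct in tiers.items():
    print(f"{tier}: ${expected_annual_loss(pct, COST_PER_HOUR):,.0f} per year")
```

Comparing the drop in expected loss between adjacent tiers with the extra cost of reaching the higher tier gives a rough sense of which availability level pays for itself.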

Follow Matt Brunk on Twitter and Google+!
@telecomworx