By now, we have all heard about last week’s six-hour outage of all of Facebook’s services. Though some people might have enjoyed watching Facebook fumble a response, the outage highlights a major business issue.
The outage most likely impacted many readers' ability to use the platform to communicate, causing some people short-term anxiety or disrupting plans to meet friends. But the biggest impact was on Facebook itself. The outage affected many of its revenue-generating platforms, costing the company $100 million in lost revenue and wiping $40 billion off its market value, as The Times reported. That is an expensive mistake in anyone’s book. Equally concerning, the outage appears to have taken down all of Facebook’s internal enterprise communications, making it doubly difficult for its engineers to diagnose and fix the issues.
There are some salient lessons here that should influence everyone’s approach and attitude to securing their enterprise communications, in particular:
- If this can happen to Facebook, given all the technical and engineering resources at its disposal, then it could happen to any organization.
- A single failure or issue (apparently a single wrong command) was able to take out a whole organization globally.
- The failure took out both external revenue-generating services and internal operational services.
I presume Facebook had business-continuity (BC) plans, and you can bet they are reviewing them now; I urge all organizations to do the same.
As an independent consultant, I’ve been helping my clients develop BC plans for many years. Over that period, it has become increasingly difficult to persuade some organizations to make proper provision for events that are very unlikely but have a high impact. A common view is that doing so requires significant investment for something that will probably never happen. Why spend money for no reason? However, BC planning should not be so simplistic. I’ve put together a few pointers to inform an effective and appropriate review of BC plans:
- Undertake the review from the business perspective, not a technology one — There’s a reason that the terminology is business continuity! BC plans should consider what is necessary to enable your business operations to continue. It’s not necessarily about ensuring all networks and services can be made available at all times.
- Consider non-technical solutions when appropriate — For example, if you can send your contact center agents to an alternative location and route your inbound calls there, that may be a more appropriate solution than providing 100% resilience at the original location.
- Understand the characteristics of your cloud solutions — Many organizations assume that because their cloud provider has much bigger customers than them, the service is engineered and architected well beyond their own lowly needs. But maybe those bigger customers have their own robust BC plans and are actually prepared to accept lower SLA and availability metrics than you require. There is a big difference between 99.999% and 99.95% availability: measured over a year, the first allows roughly five minutes of downtime and the second more than four hours. Even if the supplier meets their SLA, can your business survive a four-hour outage of critical services?
- Don’t have all your eggs in one basket — Purchasing all your services from a single vendor comes with its benefits, including better value and having a single throat to choke. Your vendor will tell you that their solutions are architected to ensure full resilience. But as we saw last week, it is still possible for a vendor to suffer a widespread outage. Consider spreading provision equally between multiple suppliers or at least securing a small capacity from a second source that is sufficient to maintain a minimum level of business support.
- Understand your internal infrastructure — Are you dependent upon a single internal infrastructure (for example, your local area networks)? We have come a long way from the days of separate networks for each application and workload. Consolidating onto a single-network infrastructure enables you to invest once with engineered resilience benefiting all your workloads. But beware of losing all of your communications capability if that fails. No longer are your telephones independent of email and other electronic communications. So, consider maintaining an alternative capability for the critical communications your IT staff will need to tackle bigger issues.
- Have a clear plan in case of an outage — Sometimes, that will be low-tech. For outages, one of my clients has a documented procedure to simply publish a splash screen on their website and share social media messages with information so customers can decide whether their call can wait. This client also includes an alternative number in the social media message (with a warning about delayed response times), so customers who still need to call can do so during the outage. Hopefully, this procedure will never need to be used, but it means the client can respond quickly and give customers options if an outage does happen.
- Document your plan — Finally, ensure you effectively document your plans. Not just the process to implement them but also the justification behind the decisions taken. You might not be around when it needs to be put into action or when someone questions why money is being spent on something that never gets used.
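To put numbers on the availability figures in the cloud pointer above, here is a minimal Python sketch of the arithmetic. It assumes availability is measured over a 365-day year; many SLAs are measured per month or per quarter, so check your own contract. The function name `allowed_downtime_minutes` is my own, chosen for illustration.

```python
# Assumption: availability is measured over a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, that an availability target permits."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # The two targets compared in the cloud-solutions pointer.
    for target in (99.999, 99.95):
        minutes = allowed_downtime_minutes(target)
        print(
            f"{target}% availability permits about {minutes:.1f} minutes "
            f"({minutes / 60:.2f} hours) of downtime per year"
        )
```

Under that yearly assumption, 99.999% works out to a little over five minutes of permitted downtime and 99.95% to a little over four hours, which is why a supplier can suffer a four-hour outage and still meet the lower target.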
We all hope that we will never need to rely on BC plans in a situation as extensive as that suffered by Facebook. However, the evidence that the unexpected can happen to anyone demonstrates that you can never do too much planning. There is no better time to act on business continuity than now!
The SCTC is the Premier Professional Organization for Independent Consultants. Our consultant members are leaders in the industry, able to provide best-of-breed professional services in a wide array of technologies. Every consultant member commits annually to a strict Code of Ethics, ensuring they work for the client’s benefit only and do not receive financial compensation from vendors and service providers.