Getting Cloud Credit
Service failures raise the questions, “What does my service level agreement cover?” “Do I receive an account credit?” “Is it worth the time to submit a claim?” “What is the credit worth?”
Subscribing to cloud services is very popular, for several reasons:
• The enterprise moves from CAPEX to OPEX financing.
• New services/features/functions become available.
• Someone else has to manage the IT service.
But there are, of course, risks involved with moving to the cloud. The most recent illustration of this came when Microsoft's cloud-based Office 365 service suffered an extended outage.
This recent failure prompted the questions, "What does my service level agreement cover?" "Do I receive an account credit?" "Is it worth the time to submit a claim?" "What is the credit worth?" Although the Microsoft failure is the impetus for writing this blog, the same concerns relate to any cloud-based service.
Microsoft Service Level Agreement (SLA)
Lync Online is covered by "The Service Level Agreement for Microsoft Online Services" (available here: OnlineSvcsConsolidatedSLA(WW)(English)(July2014))
The loss of Lync Online is defined by Microsoft in their service level agreement as "Any period of time when end users are unable to see presence status, conduct instant messaging conversations, or initiate online meetings."
When a service fails, the provider's definition of downtime becomes important. The provider's measurement of downtime may not include every event that the customer considers downtime, such as scheduled downtime, when the customer cannot use the service. The Microsoft definition of downtime is: "Downtime means a period during which the aspects of a Service specified in the following table are unavailable, excluding (i) Scheduled Downtime; and (ii) unavailability of a Service due to limitations described in Section 5(a) below. Downtime is measured in the units set forth in Section 3." These units are measured in minutes over a period of one month. (Refer to the linked document above for more information.)
Measuring Availability by Microsoft
The agreement has a metric, "Monthly Uptime Percentage," which is used to calculate the percentage of uptime. The common measure for availability is:
Availability (%) = Uptime / (Uptime + Downtime) × 100
In the case of the Microsoft SLA, the uptime and downtime are measured in minutes per month.
Downtime is calculated in minutes over a one-month period, multiplied by the number of users impacted by the failure. If the Lync Online service delivers less than 99.9% uptime for the month, then the enterprise is eligible for a credit.
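The calculation described above can be sketched in a few lines of Python. This is only an illustration, not Microsoft's exact formula: the function name and its parameters are hypothetical, and it assumes a 30-day month and that downtime is counted as minutes of outage multiplied by the number of users affected.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month, 43,200 minutes


def monthly_uptime_percentage(total_users, outages):
    """Return uptime as a percentage of all user-minutes in the month.

    `outages` is a list of (minutes_down, users_affected) pairs.
    Both names are hypothetical, chosen for this sketch.
    """
    user_minutes = total_users * MINUTES_PER_MONTH
    downtime = sum(minutes * users for minutes, users in outages)
    return 100.0 * (user_minutes - downtime) / user_minutes


# Hypothetical example: 1,000 users, one 60-minute outage affecting 500 of them.
uptime = monthly_uptime_percentage(1000, [(60, 500)])
print(f"Monthly uptime: {uptime:.3f}%")
print("Eligible for credit" if uptime < 99.9 else "Not eligible for credit")
```

In this made-up case the outage touched only half the users for an hour, so the monthly figure stays above 99.9% and no credit would be due.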
Calculating the Credit
Microsoft has three levels of credit (25%, 50%, and 100%), set out in a table in the agreement.
Delivering 99.9% availability looks good until you read the exceptions that are not included in the calculation. The calculation excludes network access, the enterprise's own infrastructure, and conditions outside the control of Microsoft (e.g., natural disaster, war, terrorism, government action). When you add in all of the exceptions, the enterprise does not have a service with 99.9% availability. This is not a criticism; it is an observation of the real availability to the user.
The recent service failure was reported to be an interruption of about 9 hours. A month of 30 days × 24 hours × 60 minutes has 43,200 minutes of operational time, and a nine-hour failure is 540 minutes long. That is 1.25% of the month, so the customer received about 99% availability (98.75% before rounding). At 99%, the customer can apply for a credit of only 25%, which is not enough to cover the costs the customer may incur collecting the claim information.
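The arithmetic for the nine-hour outage can be checked with a short Python sketch. It assumes a 30-day month and that every user was affected for the full outage; the function name is hypothetical.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month


def availability_percent(outage_minutes, month_minutes=MINUTES_PER_MONTH):
    """Availability for the month, assuming all users lost service."""
    return 100.0 * (month_minutes - outage_minutes) / month_minutes


outage_minutes = 9 * 60  # the reported nine-hour interruption
print(outage_minutes)                        # 540
print(availability_percent(outage_minutes))  # 98.75
```

The unrounded result is 98.75%, which rounds to the 99% figure quoted above.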
To get a 100% credit, the service must be out for more than 36 hours. That is one and a half days, a disaster for most businesses. A 100% credit would not come close to compensating the business for the costs it incurred because of the service failure.
What looks really bad is the 100% credit. The enterprise has to experience greater than a 5% loss of service over a month, an intolerable condition for most operations, before it is eligible for a full refund. This also means that a loss of exactly 5% earns only a 50% credit, which is pretty poor service delivery. Who would subscribe to such a service? Many have. Think of the cost to the enterprise if this were to occur.
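To see how much outage each threshold mentioned above allows, here is a minimal Python sketch. It assumes a 30-day month and a total outage affecting every user; the function name is hypothetical, and only the 99.9% and 95% thresholds named in the text are used.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: 30-day month, as in the text


def outage_minutes_to_breach(uptime_threshold_percent):
    """Minutes of full outage at which uptime equals the given threshold."""
    return MINUTES_PER_MONTH * (100.0 - uptime_threshold_percent) / 100.0


for threshold in (99.9, 95.0):
    minutes = outage_minutes_to_breach(threshold)
    print(f"{threshold}% uptime allows {minutes:.1f} minutes "
          f"({minutes / 60:.1f} hours) of outage")
```

Under these assumptions, any credit requires more than about 43 minutes of outage in the month, and the full credit requires more than 2,160 minutes, which is the 36 hours cited above.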
Claiming a Credit
In hosted services, the service provider may, at its discretion, automatically offer a credit. This has happened occasionally, but it is much more likely that enterprise IT staff will have to collect data and apply for a credit.
For a claim with Office 365, IT staff have to provide:
• A detailed description of the failure
• The duration of the service outage
• What locations were affected
• How many users were affected
• A description of the enterprise's attempts to resolve the failure
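One way to make sure these items are captured while the outage is still fresh is to record them in a simple structure. The Python sketch below is hypothetical: the class and field names are invented for illustration and do not correspond to any Microsoft form or API.

```python
from dataclasses import dataclass, field


@dataclass
class OutageClaim:
    """Hypothetical record of the five items listed above."""
    description: str = ""
    outage_minutes: int = 0
    locations: list = field(default_factory=list)
    users_affected: int = 0
    resolution_attempts: list = field(default_factory=list)

    def missing_items(self):
        """Return the names of claim items that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Made-up example: only the description and duration are known so far.
claim = OutageClaim(description="Users could not start online meetings",
                    outage_minutes=540)
print(claim.missing_items())
```

A check like `missing_items()` gives the IT staff a quick view of what still has to be gathered before the claim can be submitted.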
The claim for credit has a limited time window. The claim must be submitted by the end of the month following the failure, and you may have to wait 45 days before you will be notified of the credit. The credit will be applied to future monthly service fees.
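The time window can be worked out with Python's standard library. This is a sketch under stated assumptions: the function names are hypothetical, the dates are made up, and the 45-day wait is assumed to run from the day the claim is submitted, which the text above does not specify.

```python
import calendar
from datetime import date, timedelta


def claim_deadline(outage_date):
    """Last day of the month following the month of the outage."""
    year = outage_date.year + (outage_date.month // 12)
    month = outage_date.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


def latest_notification(submitted_on, wait_days=45):
    """Assumed latest notification date, counted from submission."""
    return submitted_on + timedelta(days=wait_days)


# Hypothetical outage in mid-December.
deadline = claim_deadline(date(2014, 12, 15))
print(deadline)                       # 2015-01-31
print(latest_notification(deadline))  # 2015-03-17
```

In this example, a claim filed on the last permitted day might not be answered until mid-March, three months after the outage.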
It is hard to determine if filing a claim is worth the effort. Producing the claim is a distraction for the IT staff. The real question is whether the IT staff labor for filing a claim is more than offset by the credit received. To know the answer, the IT staff may have to complete the claim anyway to see if it is worth the effort. This means the enterprise will still expend the IT staff labor whether or not a claim is filed.
Some IT executives may not want to expend the effort, especially when the IT staff is assigned to high-value projects. The CIO may set a policy of beginning a claim process when an outage exceeds 1 or 2 hours. Otherwise it may not be worth the effort.
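A rough break-even check can help set such a policy. The Python sketch below is illustrative only: the function name is hypothetical and every figure in the example (monthly fee, staff hours, hourly rate) is made up.

```python
def net_claim_value(monthly_fee, credit_percent, staff_hours, hourly_rate):
    """Credit received minus the IT staff labor spent preparing the claim."""
    credit = monthly_fee * credit_percent / 100.0
    labor = staff_hours * hourly_rate
    return credit - labor


# Made-up figures: $2,000 monthly fee, 25% credit, 6 staff hours at $75/hour.
value = net_claim_value(2000, 25, 6, 75)
print(f"Net value of filing: ${value:.2f}")
print("Worth filing" if value > 0 else "Not worth filing")
```

With these invented numbers the credit is $500 and the labor is $450, so the claim barely pays for itself. A smaller subscription or a slower claim process would tip the result the other way.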
If no claim is filed, then what is the value of a credit? I suggest that Microsoft customers claim as much as possible. More claims for greater credits may stimulate Microsoft to deliver better service, in order to reduce both the cost of claim processing and the amount of credit awarded. Downtime is also a hit to Microsoft's reputation.