Continued from Deep Dive: Cloud Service Availability Part 1
What is the impact of cloud on availability?
With cloud adoption rising, many organizations see it as a long-term home for their infrastructure and data, primarily to reduce the cost of in-house IT, but also to improve service. So what effect does this have on availability?
In some ways cloud infrastructure can improve availability. In the event of a server failure, the service is restarted on another available server. Multiple copies of the data ensure that a storage failure does not result in downtime or loss of data. Some service providers offer snapshots or backups of data, allowing end users to roll back to previous versions, which protects against data corruption.
In other ways, availability and the ability to meet the business’s defined SLAs for certain applications can degrade. This is mainly because there are more infrastructure components, such as internet connectivity or the cloud provider’s servers and maintenance schedules, that are outside the organization’s direct control and could fail.
This can leave multiple parties troubleshooting an outage: end-user application, IT and network teams, the cloud infrastructure provider, the Internet service provider (ISP), network hardware vendors and others. The process becomes complex and requires careful coordination.
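To see why extra components in the path tend to lower availability while redundancy raises it, the arithmetic can be sketched as below. This is a minimal illustration that assumes component failures are independent, which real outages often are not. The component names and availability figures are hypothetical and are not taken from any provider's SLA.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up.

    Assumes independent failures (an assumption, not a guarantee).
    """
    return prod(availabilities)


def parallel_availability(availabilities):
    """Availability of redundant components where any single one is enough.

    Assumes independent failures (an assumption, not a guarantee).
    """
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical figures for illustration only.
chain = {
    "cloud provider": 0.9995,
    "ISP link": 0.999,
    "site network": 0.9999,
}

end_to_end = serial_availability(chain.values())
print(f"End-to-end availability of the chain: {end_to_end:.4%}")

# Two hypothetical ISP links from different providers, either one sufficient.
redundant_links = parallel_availability([0.999, 0.999])
print(f"Availability of two redundant links:  {redundant_links:.4%}")
```

Under these assumptions the chain is always less available than its weakest component, which is why each additional party outside the organization's control matters.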
In addition, cloud providers and ISPs define exclusion clauses in their SLAs to protect themselves from penalty payments or, in extreme cases, to avoid legal prosecution. Many of these clauses relate to:
- Internet access problems that are outside the cloud provider’s control
- Force majeure – acts of God and similar events, e.g. earthquakes, hurricanes, flooding, volcanic eruptions, war, etc.
- Failure of third-party equipment
However, these clauses don’t help an organization when its applications or data are unavailable.
For organizations with multiple remote sites these issues are magnified, as some remote sites may only have access to poor Internet connectivity (low bandwidth/high latency), leading to poor application performance and ultimately a poor customer experience. Providing the necessary internet connectivity may mean additional costs to upgrade the network infrastructure at each site. For some sites this is not even an option, due to the remoteness of the location.
What should you do before moving to the cloud?
Before moving workloads and services to the cloud, an organization should determine which applications can be located in the cloud. They should:
- Review all SLAs, or implement them where none exist, for each application/service the organization provides, to identify potential candidates. Note that many cloud service providers commit to only 99.95% availability per month, which equates to about 21.6 minutes of downtime in a 30-day month. This may not be sufficient for organizations that require five-nines (99.999%) or higher availability for some applications.
- Produce an application-dependency tree to identify the inter-relationships between applications, as moving one application to the cloud could have major performance implications for others remaining onsite. This enables the organization to further refine which applications can and cannot be located in the cloud.
- Carefully examine the cloud providers’ SLAs to ensure that they can meet the needs of the business. For example, what a cloud infrastructure provider defines as an outage may differ radically from what the end user thinks is an outage. Typically the SLAs are used to trigger service credits or penalty payments when the service falls below the agreed level. To avoid giving credits or paying penalties, the providers add exclusion clauses to the SLAs.
- Ensure that the network connectivity between the sites and the cloud provider is sufficient to meet the business availability requirements. For some sites in rural, remote or isolated locations, or in less developed countries, this may not be possible, making the WAN connection the weak link in the infrastructure. Ideally there should be multiple diverse network links (from different providers) to avoid single points of failure. Each link should be capable of delivering the required throughput (bandwidth), responsiveness (latency) and reliability (packet loss) to ensure the service remains operational, with minimal or no impact on application performance.
- Define an exit strategy, detailing what actions are required to move a service out of the cloud and bring it back in-house or transfer it to another provider.
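When reviewing SLAs it helps to convert an availability percentage into a concrete downtime allowance. The sketch below does that conversion. The function name and the 30-day default are illustrative choices, and the exact figure depends on the month length assumed, so a provider's own SLA definition should take precedence.

```python
def downtime_minutes(availability_pct, period_days=30):
    """Return the downtime, in minutes, permitted by an availability target.

    availability_pct: target such as 99.95 (percent).
    period_days: length of the measurement period; 30 is an assumed
    default, since SLAs differ in how they define a month.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Compare a few common availability targets over a 30-day month.
for target in (99.9, 99.95, 99.99, 99.999):
    allowance = downtime_minutes(target)
    print(f"{target}% -> {allowance:.2f} minutes per 30-day month")
```

Comparing the allowance for 99.95% with that for 99.999% makes the gap between a typical cloud SLA and a five-nines requirement easy to see.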
What does this mean for distributed enterprises?
Distributed enterprises may have anywhere from one to thousands of remote locations. Those that require higher availability than cloud providers offer, or that do not want to rely fully on WAN/Internet connectivity to reach applications or data in the cloud, should consider maintaining onsite IT infrastructure. Using StorMagic SvSAN, they can deliver a cost-effective, small-footprint solution that provides the high availability they need.