What is High Availability?
High Availability (HA) describes a system that can operate continuously for long periods and remain operational even if part of the system is out of service.
Reducing downtime, eliminating single points of failure, and replicating and distributing data across multiple locations all contribute to a high availability infrastructure.
Availability is the metric that defines the “uptime” of an IT solution and is generally included in a Service Level Agreement (SLA). An ideal solution that is impervious to failure or downtime would score 100% availability. Availability is typically expressed as the number of nines (9s) that a system or application achieves. The table below shows levels of availability and the associated downtime.
| Availability (no. of 9s) | Availability (%) | Downtime Per Year | Downtime Per Month (30 days) |
| 1 | 90% | 36.5 Days | 72 Hours |
| 2 | 99% | 3.65 Days | 7.2 Hours |
| 3 | 99.9% | 8.76 Hours | 43.2 Minutes |
| 4 | 99.99% | 52.56 Minutes | 4.32 Minutes |
| 5 | 99.999% | 5.26 Minutes | 25.9 Seconds |
| 6 | 99.9999% | 31.5 Seconds | 2.59 Seconds |
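Downtime figures of this kind follow from a simple calculation: the unavailable fraction multiplied by the length of the period. The Python sketch below illustrates it, assuming a 365-day year and a 30-day month. The function names are illustrative and do not come from any particular product or library.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def nines_to_percent(nines: int) -> float:
    """Convert a number of nines into an availability percentage (3 -> 99.9)."""
    return 100 * (1 - 10 ** -nines)


def downtime_seconds(availability_percent: float, period_days: float) -> float:
    """Return the downtime, in seconds, that the availability level permits
    over a period of the given length."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumption: a 365-day year and a 30-day month.
    for nines in range(1, 7):
        percent = nines_to_percent(nines)
        per_year = downtime_seconds(percent, 365)
        per_month = downtime_seconds(percent, 30)
        print(f"{nines} nines ({percent:.4f}%): "
              f"{per_year:,.1f} s/year, {per_month:,.1f} s/month")
```

Dividing the result by 60 or 3,600 gives minutes or hours, which is usually the more readable unit for the lower availability levels.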
How Does High Availability Work?
To ensure that IT systems are highly available, it’s important to design and build the necessary levels of resiliency and redundancy into their architectures, from end to end.
Resiliency refers to a system’s ability to withstand or spring back from operational disruption, and it is achieved by building redundancy into a solution.
Redundancy describes the inclusion of extra components (both hardware and software) within an infrastructure, and the replication of data across locations. These practices ensure that a system continues to function, even in the event of a component failure. They also ensure that data can be accessed at any time and from any location.
For instance, say that a manufacturing firm suffers a disk drive failure in one node of its highly available cluster. Because it has additional nodes in the cluster that contain exact copies of the data held on the failed node, the business is not impacted. Applications can be migrated or restarted on the remaining operational nodes without disruption, and the firm’s production lines continue to operate as if nothing had happened. The firm built a resilient system for high availability by providing redundant server nodes that maintain business continuity in the wake of a disruption.
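The effect of redundant nodes can be estimated with a short calculation. The sketch below assumes that node failures are independent and that the service stays up as long as at least one node is running. Both are simplifying assumptions for illustration, and real clusters can also fail through shared causes such as power or networking. The function name is hypothetical.

```python
def combined_availability(node_availability: float, node_count: int) -> float:
    """Estimate the availability of a cluster that stays up while at least
    one of its nodes is up, assuming node failures are independent."""
    if not 0 <= node_availability <= 1:
        raise ValueError("node_availability must be between 0 and 1")
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # The cluster is down only when every node is down at the same time.
    probability_all_down = (1 - node_availability) ** node_count
    return 1 - probability_all_down


if __name__ == "__main__":
    for count in (1, 2, 3):
        result = combined_availability(0.99, count)
        print(f"{count} node(s) at 99% each: {result:.6%}")
```

Under these assumptions, pairing two nodes that are each available 99% of the time gives roughly 99.99% for the pair, which is why adding a second node has such a large effect.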
What Are the Benefits of High Availability?
Nowadays, organizations are highly reliant on technology for day-to-day operations. When a server has to be brought offline for maintenance, updates, or repairs, business functions are often hindered. Adding high availability to your infrastructure is like adding an insurance policy that protects your organization from disruptive downtime. When one node fails or is taken offline, the others remain operational, enabling your employees to keep working. This helps you avoid not only loss of productivity, but also loss of revenue. Advanced hypervisor features, such as VMware’s Fault Tolerance, can also ensure that applications suffer zero downtime even if the node they are running on fails. Our Fault Tolerance white paper has more detail on this subject.
Business Continuity / Disaster Recovery
Implementing high availability within your IT environment can help your business remain resilient in the face of physical disruptions and natural disasters. By eliminating single points of failure and adding redundancy to your infrastructure, your system is able to stay up and running, even if one component, such as a server node, is taken offline. Stretch clusters are another option that helps organizations maintain high availability during disruptive circumstances. They enable organizations to install nodes across two or more physical locations, so that if one is disrupted, the others remain operational. Your nodes can be spread across your office, campus, or even the entire city. Discover more about the benefits of stretch clusters in this white paper.
The nature of highly available infrastructure can allow for applications to be distributed across the nodes in a cluster. This improves compute performance, as your organization can utilize the extra resources available from multiple nodes while still ensuring the storage is highly available. A further configuration option is to build a highly available storage-only cluster which then presents the mirrored storage to compute-only nodes that handle the application workloads. Read more about storage-only clusters here.
High Availability and Edge Computing
Organizations operating at the network edge often have a high volume of stores or office locations, are based in remote places, and/or operate within environments that have problematic network connectivity. They do not typically have IT personnel onsite to fix problems as they arise, and may have to wait hours or even days for repairs to be made, resulting in loss of productivity and revenue. Within these environments, high availability is important to keep IT systems up and running, and to help businesses remain operational.
When you’re faced with these limitations, you want to make sure your IT infrastructure is as resilient and reliable as possible. For example, if a server is damaged or requires maintenance, you need an insurance policy (an extra node) in place, so that the rest of your system stays up and running rather than going down with it.
Here are two examples of edge environments that benefited from implementing a high availability configuration.
Wind Turbine Farm
One of the largest energy firms in the world has hundreds of wind farm facilities that require constant management. When there’s no wind and each turbine’s blades stop turning, the weight of the blades can cause expensive damage to the turbine shafts. The software required to ensure the blades keep turning, even when there’s no wind, must therefore remain online at all costs. Given their remote locations, it can take up to six days to have an engineer carry out repairs. To avoid long periods of downtime and prevent damage to their turbines, they required a solution that enabled high availability. Learn more in our customer case study.
US Nationwide Retail Chain
A well-known retail store, located in the US, was losing revenue as a result of system downtime. They were averaging 100 outages per year and 6 hours of downtime per outage, which was severely affecting their business. Every time a store’s system went down, they were losing hours of productivity, customer loyalty, and revenue. They needed a high availability solution to eliminate downtime, as well as the need for onsite support, and help them maintain business continuity across more than 2000 stores. Read our customer case study for additional information.
High availability solutions are ideal for edge computing environments, as they help businesses combat downtime and keep their systems up and running. This is especially beneficial for businesses that don’t have the in-house IT staff needed to attend to their systems.
Interested in learning more about what defines an ‘edge’ environment? Explore our Beginner's Guide on Edge Computing.
High Availability with SvSAN and SvKMS
StorMagic SvSAN and High Availability
StorMagic SvSAN is a virtual SAN solution that creates highly available storage across two nodes or more. Through active-active synchronous mirroring between two servers, SvSAN ensures there is always an exact copy of data on each server. In the event that one server is taken offline for maintenance or suffers a failure, the remaining server continues to operate. SvSAN enables high availability by eliminating single points of failure and ensuring there is no downtime or disruption to service in the organization.
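The general idea of synchronous mirroring can be shown with a toy model. The Python sketch below is not SvSAN's implementation. The `Replica` and `MirroredVolume` classes and their methods are hypothetical names invented for this illustration. It assumes that a write is acknowledged only after every online copy has stored it, and that a returning server is brought up to date before it serves data again.

```python
class Replica:
    """One server's copy of the data (hypothetical model)."""

    def __init__(self, name: str):
        self.name = name
        self.online = True
        self.blocks: dict[int, bytes] = {}


class MirroredVolume:
    """Toy two-way synchronous mirror, for illustration only."""

    def __init__(self, first: Replica, second: Replica):
        self.replicas = [first, second]
        # Blocks written while one replica was offline.
        self.pending_resync: set[int] = set()

    def write(self, block_id: int, data: bytes) -> None:
        online = [r for r in self.replicas if r.online]
        if not online:
            raise RuntimeError("no replica is available")
        # Synchronous: the call returns only after every online copy is updated.
        for replica in online:
            replica.blocks[block_id] = data
        if len(online) < len(self.replicas):
            self.pending_resync.add(block_id)

    def read(self, block_id: int) -> bytes:
        for replica in self.replicas:
            if replica.online:
                return replica.blocks[block_id]
        raise RuntimeError("no replica is available")

    def rejoin(self, replica: Replica) -> None:
        """Copy the missed blocks to a returning replica, then bring it online."""
        source = next(
            (r for r in self.replicas if r is not replica and r.online), None
        )
        if source is None:
            raise RuntimeError("no up-to-date replica to resynchronize from")
        for block_id in self.pending_resync:
            replica.blocks[block_id] = source.blocks[block_id]
        self.pending_resync.clear()
        replica.online = True
```

In this model, taking one replica offline does not interrupt reads or writes, and the two copies are identical again after `rejoin` completes.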
SvSAN’s ability to provide highly available shared storage on a minimum of just two nodes is unique, and is made possible by the use of its lightweight witness. The witness can be sited locally or remotely from the cluster, can provide quorum for hundreds of clusters at a time, and will run on hardware as small as a Raspberry Pi.
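To show why a two-node cluster benefits from a third party, here is a minimal sketch of witness-based tie-breaking in Python. It illustrates the general quorum technique and is not a description of SvSAN's actual protocol. The `Witness` class and the `may_keep_serving` function are hypothetical names. The sketch assumes that when the two nodes lose contact with each other, only the node the witness sides with keeps serving storage, so the two copies cannot diverge.

```python
from typing import Optional


class Witness:
    """Tie-breaker that sides with one node at a time (hypothetical model)."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None

    def request_ownership(self, node_name: str) -> bool:
        # The first node to ask becomes the owner; later requests from the
        # other node are refused until the owner releases its claim.
        if self.owner is None:
            self.owner = node_name
        return self.owner == node_name

    def release(self, node_name: str) -> None:
        if self.owner == node_name:
            self.owner = None


def may_keep_serving(
    node_name: str, peer_reachable: bool, witness: Optional[Witness]
) -> bool:
    """Decide whether a node may continue to serve storage.

    Pass witness=None to model a node that cannot reach the witness.
    """
    if peer_reachable:
        # Both nodes can see each other, so the mirror is intact.
        return True
    if witness is None:
        # Isolated from both the peer and the witness: stop serving,
        # because the peer may still be active.
        return False
    return witness.request_ownership(node_name)
```

In this model, losing the witness alone does not stop a healthy pair of nodes, which matches the idea of the witness as a lightweight arbiter rather than part of the data path.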
StorMagic SvKMS and High Availability
StorMagic SvKMS is an encryption key management solution with flexible options for high availability (HA). Customer applications require uninterrupted access to their encryption keys, and SvKMS maintains this access through a powerful, highly available architecture.
SvKMS supports both a unique active-passive two-node HA setup and an active-active 2-node+1 clustering configuration. These options can create tiered levels of redundancy that scale for added assurance against any loss of encryption key access.
SvKMS HA uses shards to partition and replicate data, nodes to distribute those shards across multiple locations, and clustering to contain data and provide failover in the event that a node is disconnected from the network. It significantly reduces the possibility of a disruption in service that can result in a customer being unable to access their encryption keys.