What is High Availability? A Beginner's Guide

Published On: 16th October 2024 | 10.3 min read | Tags: high availability, redundancy, resiliency, storage, stretch cluster

What is High Availability?

High availability (HA) refers to a system’s ability to ensure uninterrupted operation and accessibility, typically measured as a percentage of uptime. Reduced downtime, the elimination of single points of failure, and the replication and distribution of data across multiple locations all contribute to creating a highly available architecture.

How is High Availability Measured?

High availability is measured as a percentage of uptime over a given period, and the target is typically written into a service level agreement (SLA). A solution that never failed or went offline would achieve 100% availability. Availability is often expressed by the number of nines (9s) in the percentage. The table below shows common availability levels and the downtime each one allows.

Availability (no. of 9s)	Availability (%)	Downtime Per Year	Downtime Per Month
1	90%	36.5 Days	72 Hours
2	99%	3.65 Days	7.2 Hours
3	99.9%	8.76 Hours	43.8 Minutes
4	99.99%	52.56 Minutes	4.38 Minutes
5	99.999%	5.26 Minutes	25.9 Seconds
6	99.9999%	31.5 Seconds	2.59 Seconds
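
The downtime figures in a table like this follow directly from the availability percentage: the downtime budget is the unavailable fraction multiplied by the length of the period. The sketch below illustrates the arithmetic. It assumes a 365-day year and treats a month as one twelfth of a year; the function name `allowed_downtime` is ours for illustration. Published tables differ slightly depending on the month length they assume, so the monthly results may not match every figure above to the last digit.

```python
from datetime import timedelta

HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def allowed_downtime(availability_pct: float,
                     period_hours: float = HOURS_PER_YEAR) -> timedelta:
    """Return the downtime budget for an availability percentage over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1 - availability_pct / 100
    return timedelta(hours=period_hours * unavailable_fraction)


if __name__ == "__main__":
    for pct in (90, 99, 99.9, 99.99, 99.999, 99.9999):
        per_year = allowed_downtime(pct)
        # assumption: an average month is one twelfth of a year
        per_month = allowed_downtime(pct, HOURS_PER_YEAR / 12)
        print(f"{pct}%: {per_year} per year, {per_month} per month")
```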

IT teams use these metrics to plan system availability, while service providers use them to guarantee service levels in service level agreements (SLAs), which set out service expectations, including availability. Interpretations of HA metrics can vary, however: users may consider a system unusable because of poor performance even when it is technically up and partially functional.

How Does High Availability Work?

To ensure that IT systems are highly available, it’s important to design and build the necessary levels of resiliency and redundancy into their architectures, from end to end. Let’s explore what resiliency and redundancy are.

High Availability and Resiliency

Resiliency refers to a system’s ability to withstand or spring back from operational disruption, and it is achieved by building redundancy into a solution.

High Availability and Redundancy

Redundancy describes the inclusion of extra components (i.e. hardware and software) within an infrastructure and the replication of data across locations. These practices ensure that a system continues to function, even in the event of a component failure. They also ensure that data can be accessed at any time and from any location.

For instance, say that a manufacturing firm suffers a disk drive failure in one node of its highly available cluster. Because it has additional nodes in the cluster that contain exact copies of the data held on the failed node, the business is not impacted. Applications can be migrated or restarted on the remaining operational nodes without disruption, and the firm’s production lines continue to operate as if nothing had happened. The firm built a resilient system for high availability by providing redundant server nodes that maintain business continuity in the wake of disruption.
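
The benefit of redundant nodes in an example like this can be estimated with a standard reliability formula. If each node is available a fraction a of the time and the workload needs only one working node, the cluster is unavailable only when all n nodes are down at once, which gives an availability of 1 - (1 - a)^n. The sketch below is a rough illustration, not a sizing tool: it assumes nodes fail independently (shared power, network, or software faults can break that assumption), and the 99% per-node figure is hypothetical.

```python
def redundant_availability(node_availability: float, nodes: int) -> float:
    """Availability of a cluster that stays up while at least one node is up.

    Assumes independent node failures; node_availability is a fraction (0 to 1).
    """
    if not 0 <= node_availability <= 1:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    all_nodes_down = (1 - node_availability) ** nodes
    return 1 - all_nodes_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        # hypothetical per-node availability of 99%
        print(n, "node(s):", redundant_availability(0.99, n))
```

Under these assumptions, going from one node to two moves a 99% node to roughly four nines for the cluster, which is why adding a single redundant node has such a large effect.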

What are High Availability Clusters?

In a high availability system, servers are set up in clusters and organized in a tiered architecture to respond to requests from load balancers. When one server in a cluster fails, another server takes over the workload, minimizing any impact on performance or service delivery. This redundancy allows for failover to a secondary component that assumes the workload when the primary component fails.
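
The failover behavior described above can be sketched as a round-robin load balancer that skips servers marked unhealthy. This is a simplified illustration: the `Server` and `LoadBalancer` classes and the `healthy` flag are hypothetical names, and a real load balancer would set health status from periodic health checks rather than by hand.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Server:
    name: str
    healthy: bool = True


class LoadBalancer:
    """Round-robin balancer that skips servers marked unhealthy."""

    def __init__(self, servers: list[Server]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = servers
        self._rotation = cycle(servers)

    def route(self, request: str) -> str:
        # Try each server at most once per request.
        for _ in range(len(self._servers)):
            server = next(self._rotation)
            if server.healthy:
                return f"{request} -> {server.name}"
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    primary, secondary = Server("node-a"), Server("node-b")
    balancer = LoadBalancer([primary, secondary])
    print(balancer.route("req-1"))
    primary.healthy = False  # simulate a failure of the first server
    print(balancer.route("req-2"))
    print(balancer.route("req-3"))
```

Once the first server is marked unhealthy, every later request is routed to the remaining server, so clients see no interruption as long as one healthy server is left.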

As systems become more complex, ensuring HA becomes more challenging due to the increased number of potential failure points.
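
One way to see why complexity hurts availability: when a request depends on every component in a chain (for example network, load balancer, application server, database, and storage), the overall availability is the product of the individual availabilities. The sketch below assumes independent failures and uses hypothetical figures; `chain_availability` is an illustrative name.

```python
from math import prod
from typing import Iterable


def chain_availability(component_availabilities: Iterable[float]) -> float:
    """Availability of a system that needs every component working.

    Each value is a fraction (0 to 1); failures are assumed independent.
    """
    values = list(component_availabilities)
    if any(not 0 <= value <= 1 for value in values):
        raise ValueError("availabilities must be between 0 and 1")
    return prod(values)


if __name__ == "__main__":
    # hypothetical: five components, each available 99.9% of the time
    print(chain_availability([0.999] * 5))
```

With five such components the chain as a whole drops to roughly 99.5%, lower than any single component, which is why each added dependency needs its own redundancy.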

What Are the Benefits of High Availability?

Reduced Downtime

Nowadays, organizations are highly reliant on technology for day-to-day operations. If a server has to be brought offline for maintenance, updates, or repairs, business functions are often hindered. Adding high availability to your infrastructure is like adding an insurance policy, protecting your organization from disruptive downtime.

Example: Node Failure
When one node fails or is taken offline, the others remain operational, enabling your employees to keep working. This helps you avoid not only loss of productivity but also loss of revenue. Advanced hypervisor features can also ensure that applications running on a failed node suffer zero downtime.

Business Continuity and Disaster Recovery

High availability helps your business remain resilient in the face of physical disruptions and natural disasters. Eliminating single points of failure and adding redundancy to your infrastructure keeps your system up and running, even if one component, such as a server node, is taken offline. Stretch clusters extend this protection by letting you install nodes across two or more physical locations, so the others remain operational if one is disrupted. Discover more about the benefits of stretch clusters in this white paper.

Example: Single Points of Failure
A single point of failure can put your IT infrastructure at risk when there are physical disruptions, such as a natural disaster. Stretch clusters remove this risk, as your nodes can be separated across your office, campus, or even the entire city. Additionally, HA means that even if one server node in the impacted area is taken offline, your IT infrastructure remains functional.

Performance

High availability architecture allows for the distribution of applications across the nodes in a cluster. This improves compute performance, as your organization can utilize the extra resources available from multiple nodes while still ensuring storage HA. This is sometimes called load balancing. A further configuration option is to build an HA storage-only cluster which then presents the mirrored storage to compute-only nodes that handle the application workloads. Read more about storage-only clusters here.

Example: Extra Resources
If you’re strained for resources, HA architecture lets applications be distributed across nodes in a cluster, so things don’t slow down. It makes the most of the resources available to your system and improves performance.

High Availability Solutions and Edge Computing

Organizations operating an edge environment, whether SMB or enterprise, often have unique requirements. These include:

Multiple locations
Remote locations
Operating environments with poor network connectivity

Edge environments don’t always have IT personnel onsite to fix problems as they arise. And if issues do happen, it can take hours or even days for repairs to be made, resulting in loss of productivity and revenue.

Within these environments, high availability is important to keep IT systems up and running, and help businesses remain operational.

When you’re faced with these limitations, you want your IT infrastructure to be as resilient and reliable as possible. For example, if a server is damaged or requires maintenance, you need an insurance policy (an extra node) in place so that the rest of the system stays up and running.

Here are two examples of edge environments that benefited from implementing an HA architecture.

High Availability at Airports

How do airports ensure their IT systems are highly available? This infographic explores two examples.

Wind Turbine Farm

One of the largest energy firms in the world has hundreds of wind farm facilities that require constant management. When there’s no wind and each turbine’s blades stop turning, the weight of the blades can cause expensive damage to the turbine shafts. The software required to ensure the blades keep turning, even when there’s no wind, must therefore remain online at all costs. Given their remote locations, it can take up to six days to have an engineer carry out repairs. To avoid long periods of downtime and prevent damage to their turbines, the company required a solution that enabled high availability. Learn more in our customer case study.

US Nationwide Retail Chain

A well-known retail chain, located in the US, was losing revenue as a result of system downtime. They were averaging 100 outages per year and 6 hours of downtime per outage, which was severely affecting their business. Every time a store’s system went down, they were losing hours of productivity, customer loyalty, and revenue. They needed a high availability solution to eliminate downtime, as well as the need for onsite support, and help them maintain business continuity across more than 2,000 stores. Read our customer case study for additional information.

HA solutions are ideal for edge computing environments, as they help combat downtime and keep systems up and running. This is especially beneficial for businesses that don’t have the in-house IT staff needed to attend to their systems.

Interested in learning more about what defines an ‘edge’ environment? Explore our beginner’s guide for edge computing.

The Cost-effectiveness of HA in Edge Environments

Implementing high availability can significantly boost cost-effectiveness in edge environments. By distributing resources across multiple nodes, HA minimizes the risk of expensive downtime. This approach not only ensures continuous operations but also leads to long-term savings compared to the costs associated with poor resource management and system outages.

For businesses looking for opportunities to reduce their IT spending, implementing an HA architecture can be a way to drive down costs in the long term. Over time, the benefits of reduced downtime and optimized resources pay for themselves. And you benefit from increased reliability, too.

Why is High Availability Important?

Removes single points of failure for components that could disrupt operations if they fail.
Ensures critical data is regularly backed up and can be quickly restored when needed.
Distributes traffic evenly across servers and hardware using load balancing.
Continuously monitors the health and performance of database servers in the background, given the right HA solution.
Distributes resources across multiple geographic locations, in some cases, to protect against regional power outages or natural disasters.
Implements robust failover solutions for storage systems.

Additional High Availability resources you may find helpful:

HPE Solutions with StorMagic: Data protection and HA at the edge and beyond

Acronis Cyber Protect Cloud Integration with StorMagic SvSAN

VMware Fault Tolerance vs. High Availability

High Availability FAQ

How is High Availability used in Virtual Storage?

A virtual SAN solution can create high availability storage across two or more nodes. Through active-active synchronous mirroring between two servers, this setup ensures an exact copy of data is always present on each server. If one server goes offline for maintenance or failure, the remaining server continues to function, eliminating single points of failure and preventing downtime or service disruptions.

A virtual SAN can provide highly available shared storage with just two servers using a lightweight witness. The witness, which can be located locally or remotely, provides quorum for hundreds of clusters and can run on minimal hardware, such as a Raspberry Pi.
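
The role of the witness can be illustrated with a generic majority-quorum check. In a two-node cluster, the two servers plus the witness make three voting members; a node keeps serving storage only if it can reach a majority, which prevents both nodes from accepting writes independently when the link between them fails. This is a sketch of the general technique and not a description of any specific product's algorithm; the function names are hypothetical.

```python
def has_quorum(votes_reachable: int, total_votes: int) -> bool:
    """True when a strict majority of voting members is reachable."""
    return votes_reachable > total_votes // 2


def may_serve_storage(node_up: bool, peer_reachable: bool,
                      witness_reachable: bool) -> bool:
    """Decide whether one node of a two-node cluster may keep serving I/O.

    Voting members: this node, its peer, and the witness (3 votes in total).
    """
    if not node_up:
        return False
    votes = 1 + int(peer_reachable) + int(witness_reachable)
    return has_quorum(votes, total_votes=3)


if __name__ == "__main__":
    # Peer has failed, witness still reachable: the survivor keeps serving.
    print(may_serve_storage(True, peer_reachable=False, witness_reachable=True))
    # Node is isolated from both peer and witness: it must stop serving.
    print(may_serve_storage(True, peer_reachable=False, witness_reachable=False))
```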

How is High Availability Used in Hyperconverged Infrastructure?

A hyperconverged infrastructure solution is designed to scale easily. Users can achieve high availability with as few as two nodes by adding a second node and a lightweight remote witness. Shared storage works through active-active synchronous mirroring between the two servers. If one server fails or is taken offline, the remaining server continues to operate, avoiding service disruption and eliminating downtime.

How is High Availability Used in Encryption Key Management?

An encryption key management solution offers flexible options for high availability, ensuring that customer applications maintain uninterrupted access to their encryption keys through a robust, highly available architecture.

This solution supports both an active-passive two-node HA setup and an active-active 2-node+1 clustering configuration, providing tiered redundancy that scales to prevent any loss of encryption key access.

By using shards to partition and replicate data, distributing those shards across multiple locations, and clustering to contain data, this setup ensures failover if a node disconnects from the network. It significantly reduces the risk of service disruptions that could prevent customers from accessing their encryption keys.

What is High Availability SLA?

High availability in a service level agreement (SLA) is a percentage of uptime agreed upon by a service vendor. This is what they’re expected to provide for their customers. Although HA metrics can sometimes be subjective, availability metrics should be defined within SLAs. Some IT teams might choose to measure other availability metrics, such as:

Mean time between failures (MTBF)
Mean downtime (MDT)
Recovery time objectives (RTO)
Recovery point objectives (RPO)
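
The first two metrics in this list relate directly to the availability percentage. Using the common steady-state formula, availability = MTBF / (MTBF + MDT), a system's expected availability can be estimated from how often it fails and how long it stays down. The sketch below assumes MTBF means the average operating time between failures, excluding the downtime itself (some sources define it differently); the function name and the sample figures are hypothetical.

```python
def availability_from_mtbf(mtbf_hours: float, mdt_hours: float) -> float:
    """Estimate steady-state availability as MTBF / (MTBF + MDT).

    mtbf_hours: mean operating time between failures (assumed to exclude downtime)
    mdt_hours:  mean downtime per failure
    """
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    if mdt_hours < 0:
        raise ValueError("MDT cannot be negative")
    return mtbf_hours / (mtbf_hours + mdt_hours)


if __name__ == "__main__":
    # hypothetical: one failure every 999 operating hours, 1 hour to recover
    print(f"{availability_from_mtbf(999, 1):.3%}")
```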
