The Benefits of 2-Node vs 3-Node HCI Clusters at the Edge

Published On: 22nd July 2022 // 5.4 min read

Two’s company, but three’s a crowd at the edge

Edge sites and small datacenters are typically budget- and resource-constrained, but they still need highly available applications to run their operations effectively. In these instances, system size does matter. If each site uses a 2-node instead of a 3-node architecture (i.e. two servers instead of three), the cost and management savings can be substantial.

More and more IT deployments are happening at small sites (such as SMBs) and at the edge, because compute and storage often need to be as close as possible to where data is created. This is a real problem for users, because budgets at these sites are typically very small and onsite IT resources are often non-existent. Solution architects sometimes try to eliminate the need for onsite systems at these small sites, but the math doesn’t always add up. Running all the necessary applications in the cloud and storing all the local data there often doesn’t work because of latency (no matter how fast the connection, the speed of light cannot be ignored), uptime (what happens if there is an internet outage?) and cost (application, storage and transmission costs all add up across multi-site deployments). Once the decision is made that onsite compute and storage hardware is required, the next step is to design for maximum uptime at the lowest cost (both CapEx and OpEx).

Don’t default to a 3-node cluster config

The most common approach in the industry is to implement a hyperconverged infrastructure (HCI) solution using three servers with some type of virtual SAN software (to eliminate the need for even more complicated and costly infrastructure, such as a SAN). Most providers offer this 3-node configuration because their systems were designed for the datacenter and/or cloud where scaling beyond three nodes is needed. Typically, the three nodes are able to balance performance, deal with node failures and expand easily because of erasure coding. However, the erasure coding approach drives significantly more network traffic in order to handle the extra writes and parity check operations needed.

These kinds of systems have requirements that drive the cost per site quite high:

  • 3 physical servers
  • High-powered processors (e.g. Intel Xeon-Silver 4210 (2.2GHz/10-core/85W))
  • Large amounts of memory (typically a minimum of 32GB RAM per server)
  • Fast networking (10Gbit/s minimum)
  • All servers, storage, adapters and firmware must be identical
  • More floor space
  • More things to manage and more things that can break to cause an outage

StorMagic’s 2-node architecture

StorMagic is, and always has been, focused on solving these edge computing and small datacenter problems. We’ve never been hamstrung by datacenter/cloud requirements for massively scalable systems. 2-node is our bread and butter. We are able to offer only two nodes per site, and ensure 100% uptime because we don’t have to worry about massively expandable clusters. Plus, we can support a 3-node cluster configuration, where additional resiliency is required, but the architecture is the same as 2-node.

[Figure: High availability with a 2-node architecture]

SvSAN’s architecture is built on simple synchronous mirroring between the two servers. Because every write is held on both nodes, the datastores stay available even if hardware in one of the servers fails. Synchronous mirroring keeps the complexity down (compared to the erasure coding approach) since we are singularly focused on making sure every datastore is always available on both servers. That’s it. (Of course, there is a lot of complicated technology behind the scenes, like cache coherency, write-back and read-through caching, but this is all invisible to customers.) The real magic comes from our remote witness.
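To make the idea of synchronous mirroring concrete, here is a minimal Python sketch. It is an illustration only, not SvSAN's implementation: the `Node` and `MirroredDatastore` classes are hypothetical names, storage is an in-memory dictionary, and resynchronisation of a node that returns after an outage is deliberately left out.

```python
class Node:
    """Hypothetical storage node that keeps blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True

    def write(self, block_id, data):
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        self.blocks[block_id] = data


class MirroredDatastore:
    """Hypothetical datastore mirrored across two nodes."""

    def __init__(self, node_a, node_b):
        self.nodes = [node_a, node_b]

    def write(self, block_id, data):
        # Synchronous mirroring: the write is acknowledged only after
        # every online mirror has stored the block.
        targets = [n for n in self.nodes if n.online]
        if not targets:
            raise RuntimeError("no mirror available")
        for node in targets:
            node.write(block_id, data)
        return len(targets)  # number of copies written

    def read(self, block_id):
        # Any online node holding the block can serve the read.
        for node in self.nodes:
            if node.online and block_id in node.blocks:
                return node.blocks[block_id]
        raise KeyError(block_id)


a, b = Node("server-a"), Node("server-b")
datastore = MirroredDatastore(a, b)
datastore.write(1, b"payload")   # stored on both servers
a.online = False                 # simulate a hardware failure
print(datastore.read(1))         # still served by server-b
```

The point of the sketch is that there is no parity calculation or data striping: each write is simply repeated on the second server, which is why the surviving node can carry on alone.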

Witness the power and benefits of 2-node

In order for clustered systems to maintain 100% uptime and ensure data integrity (and avoid the “split brain” problem) there needs to be a third node to act as “arbitrator” to make sure the two nodes are functioning, and each one is aware of the other’s health. SvSAN does not require this third node to be at each site – we have architected it to run somewhere else (in your own datacenter or a cloud). We call this the remote witness, and each one is a VM that can manage up to a thousand 2-node clusters.

[Figure: High availability with a 2-node cluster and remote witness]

This witness isn’t in the data path; it only checks the “heartbeat” from the servers. It can run on any hardware (even a Raspberry Pi), only requires 9Kb/sec of network bandwidth (less than 100 bytes of data is transmitted per second) and can tolerate latency of up to 3,000ms.
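The arbitration role of a witness can be sketched in a few lines of Python. This is a generic tie-breaker pattern under stated assumptions, not SvSAN's actual protocol: the `Witness` class, the `may_serve` function and the cluster and node identifiers are all hypothetical, and heartbeats, timeouts and lock expiry are omitted.

```python
class Witness:
    """Hypothetical tie-breaker: one node per cluster may hold the lock."""

    def __init__(self):
        self._owner = {}  # cluster_id -> node_id currently allowed to serve

    def arbitrate(self, cluster_id, node_id):
        # The first node to ask becomes the owner; later requests from
        # the other node are refused until the lock is released.
        owner = self._owner.setdefault(cluster_id, node_id)
        return owner == node_id

    def release(self, cluster_id, node_id):
        if self._owner.get(cluster_id) == node_id:
            del self._owner[cluster_id]


def may_serve(node_id, cluster_id, peer_reachable, witness):
    """Decide whether a node may keep serving its mirrored storage."""
    if peer_reachable:
        return True   # mirror is intact, no arbitration needed
    if witness is None:
        return False  # isolated: cannot prove the peer is down, so stop
    return witness.arbitrate(cluster_id, node_id)


# The link between the two servers fails, but both still reach the witness.
witness = Witness()
print(may_serve("server-a", "site-1", False, witness))  # True
print(may_serve("server-b", "site-1", False, witness))  # False
```

Only one side is allowed to continue, which is what prevents “split brain”. Because the witness only answers small yes/no questions and never sees application data, it can sit far away from the site on modest hardware.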

Some competitive solutions also support a remote witness, but there are typically limitations involved. Because their remote witness is in the data path due to their erasure coding implementations, its network and hardware requirements are still very similar to those of an onsite witness. These solutions are also typically limited to fewer clusters per remote witness, while StorMagic supports up to 1,000.

The benefits of our SvSAN architecture:

  • Fewer servers to buy
  • Lower processor speed requirements (2GHz)
  • Less memory needed (1GB)
  • Less networking bandwidth at each site (9Kb/sec)
  • Can have clusters using different server models and brands
  • One witness can support up to 1,000 clusters
  • Servers can be up to 1,000km apart in a stretched cluster

A real world example of 2-node vs 3-node

Putting this all together: a 2-node edge system saves customers at least 33% on hardware costs compared to a 3-node system, and often significantly more. One recent example that highlights this cost saving is a StorMagic edge customer with 6,000 sites. They were considering a solution that used VMware’s vSAN. The StorMagic SvSAN solution only required the purchase of 6,000 new servers, while the competitive solution required 18,000. Because SvSAN can handle clusters of dissimilar hardware, we were able to reuse the one existing server at each location, while the VMware vSAN solution required three new servers for each site because every server in a cluster must be exactly the same.
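The server counts in this example can be reproduced with simple arithmetic. The short Python sketch below uses only the figures quoted above; the helper name `new_servers_needed` is hypothetical, and it counts servers rather than dollars, so it says nothing about per-server pricing.

```python
def new_servers_needed(sites, nodes_per_site, reusable_per_site=0):
    """Servers to purchase when some existing servers can be reused."""
    return sites * max(nodes_per_site - reusable_per_site, 0)


sites = 6_000

# 2-node cluster that reuses the one existing server at each site.
two_node = new_servers_needed(sites, 2, reusable_per_site=1)

# 3-node cluster that needs three identical new servers at each site.
three_node = new_servers_needed(sites, 3)

print(two_node)                           # 6000
print(three_node)                         # 18000
print(f"{1 - two_node / three_node:.0%}")  # 67% fewer servers to buy

# Without any reuse, 2-node still needs one third fewer servers than 3-node.
no_reuse = new_servers_needed(sites, 2)
print(f"{1 - no_reuse / three_node:.0%}")  # 33%
```

The last line is where the “at least 33%” figure comes from: two servers instead of three at every site. Reusing existing hardware is what pushes the reduction in this example to two thirds.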

Bottom line – StorMagic was able to save this customer millions of dollars in hardware and administrative costs, and the solution has been up and running for two years with no unscheduled downtime.

To learn more about the benefits of 2-node over 3-node architectures, read through our Introduction to SvSAN guide, or take a deep dive into the SvSAN remote witness functionality with the feature white paper. Alternatively, why not book some time with our team for a discussion about SvSAN and a live demonstration?
