This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these template messages) This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: "High-availability cluster" – news · newspapers · books · scholar · JSTOR (December 2011) (Learn how and when to remove this template message) This article or section possibly contains synthesis of material which does not verifiably mention or relate to the main topic. Relevant discussion may be found on the talk page. (December 2011) (Learn how and when to remove this template message) This article may lend undue weight to certain ideas, incidents, or controversies. Please help improve it by rewriting it in a balanced fashion that contextualizes different points of view. (December 2011) (Learn how and when to remove this template message) (Learn how and when to remove this template message)

High-availability clusters (also known as HA clusters, fail-over clusters) are groups of computers that support server applications that can be reliably utilized with a minimum amount of down-time. They operate by using high availability software to harness redundant computers in groups or clusters that provide continued service when system components fail. Without clustering, if a server running a particular application crashes, the application will be unavailable until the crashed server is fixed. HA clustering remedies this situation by detecting hardware/software faults, and immediately restarting the application on another system without requiring administrative intervention, a process known as failover. As part of this process, clustering software may configure the node before starting the application on it. For example, appropriate file systems may need to be imported and mounted, network hardware may have to be configured, and some supporting applications may need to be running as well.[1]

HA clusters are often used for critical databases, file sharing on a network, business applications, and customer services such as electronic commerce websites. HA cluster implementations attempt to build redundancy into a cluster to eliminate single points of failure, including multiple network connections and data storage which is redundantly connected via storage area networks.

HA clusters usually use a heartbeat private network connection which is used to monitor the health and status of each node in the cluster. One subtle but serious condition all clustering software must be able to handle is split-brain, which occurs when all of the private links go down simultaneously, but the cluster nodes are still running. If that happens, each node in the cluster may mistakenly decide that every other node has gone down and attempt to start services that other nodes are still running. Having duplicate instances of services may cause data corruption on the shared storage.

HA clusters often also use quorum witness storage (local or cloud) to avoid this scenario. A witness device cannot be shared between two halves of a split cluster, so in the event that all cluster members cannot communicate with each other (e.g., failed heartbeat), if a member cannot access the witness, it cannot become active.

Application design requirements

Not every application can run in a high-availability cluster environment, and the necessary design decisions need to be made early in the software design phase. In order to run in a high-availability cluster environment, an application must satisfy at least the following technical requirements, the last two of which are critical to its reliable function in a cluster, and are the most difficult to satisfy fully:

Node configurations

2 node High Availability Cluster network diagram

The most common size for an HA cluster is a two-node cluster, since that is the minimum required to provide redundancy, but many clusters consist of many more, sometimes dozens of nodes.

The attached diagram is a good overview of a classic HA cluster, with the caveat that it does not make any mention of quorum/witness functionality (see above).

Such configurations can sometimes be categorized into one of the following models:

The terms logical host or cluster logical host is used to describe the network address that is used to access services provided by the cluster. This logical host identity is not tied to a single cluster node. It is actually a network address/hostname that is linked with the service(s) provided by the cluster. If a cluster node with a running database goes down, the database will be restarted on another cluster node.

Node reliability

HA clusters usually use all available techniques to make the individual systems and shared infrastructure as reliable as possible. These include:

These features help minimize the chances that the clustering failover between systems will be required. In such a failover, the service provided is unavailable for at least a little while, so measures to avoid failover are preferred.

Failover strategies

Systems that handle failures in distributed computing have different strategies to cure a failure. For instance, the Apache Cassandra API Hector defines three ways to configure a failover:

See also


  1. ^ van Vugt, Sander (2014), Pro Linux High Availability Clustering, p.3, Apress, ISBN 978-1484200803
  2. ^ Bornschlegl, Susanne (2012). Railway Computer 3.0: An Innovative Board Design Could Revolutionize The Market (pdf). MEN Mikro Elektronik. Retrieved 2015-09-21.

Further reading