What is a cluster?
A cluster is a set of computers working together on a single task. Which task is performed, and how that task is performed, differs from cluster to cluster. This post covers two kinds of clusters, which are explored further in the upcoming posts in this series.
1. High-availability clusters:
The goal of a high-availability cluster, also known as an HA cluster or failover cluster, is to keep running services as available as they can be. This is primarily achieved by having the nodes of the high-availability cluster monitor each other for failures, and migrating services to a node that is still considered “healthy” when a service or node fails. High-availability clusters can be grouped into two subsets:
- Active-active high-availability clusters, where a service runs on multiple nodes, thus leading to shorter failover times.
- Active-passive high-availability clusters, where a service only runs on one node at a time.
High-availability clusters are often used to support mission-critical services in the enterprise. Examples of software that implements high-availability clustering are Pacemaker and the Red Hat High Availability Add-On.
2. Storage clusters:
In a storage cluster, all members provide a single cluster file system that can be accessed by different server systems. The provided file system may be used to read and write data simultaneously. This is useful for providing high availability of application data, such as web server content, without requiring multiple redundant copies of the same data. An example of a cluster file system is GFS2, which is provided by the Red Hat Resilient Storage Add-On.
What are the goals of a high-availability cluster?
The major goal of a high-availability cluster is to keep services as available as possible by eliminating bottlenecks and single points of failure. This is a different strategy from trying to keep the uptime of a single machine as high as possible: consumers do not care about the uptime of the server running a service, only about whether the service itself is available. A high-availability cluster uses the following concepts and techniques to provide service integrity and availability:
Resources and resource groups
In clustering terminology, the basic unit of work is called a resource. A single IP address, filesystem or database would all be considered resources. Typically, relationships between these resources are defined to create user-facing services. One of the most common ways to define these relationships is to combine a set of resources into a group. This specifies that all resources in the group need to run together on the same node and establishes a fixed (linear) start and stop order. For example, in order for a cluster to provide a web server service, the web server daemon, the data the server is supposed to share, and the IP address the daemon will listen on all need to be available on the same cluster node.
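To make the group semantics concrete, here is a minimal Python sketch (a toy model, not Pacemaker code) of a resource group that is placed on one node as a unit, starts its members in list order, and stops them in reverse order. The group and resource names (`webgroup`, `webfs`, `webip`, `webserver`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResourceGroup:
    """Toy model of a resource group (illustration only, not Pacemaker's API)."""

    name: str
    resources: List[str] = field(default_factory=list)
    node: Optional[str] = None  # the whole group runs on a single node

    def start_order(self) -> List[str]:
        # Members start in the order they were added to the group.
        return list(self.resources)

    def stop_order(self) -> List[str]:
        # Members stop in reverse order, so the daemon stops before
        # the IP address and file system it depends on go away.
        return list(reversed(self.resources))


# Hypothetical web service: shared file system, floating IP, then the daemon.
web = ResourceGroup("webgroup", ["webfs", "webip", "webserver"], node="node1")
print(web.start_order())  # ['webfs', 'webip', 'webserver']
print(web.stop_order())   # ['webserver', 'webip', 'webfs']
```

The fixed, linear order is what distinguishes a group from a loose set of resources: every member shares the group's single `node`, and no member can start before the one listed ahead of it.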
High-availability clusters try to keep services available by migrating them to another node when the cluster notices that the node that was originally running the service is not responding; this is called failover.
Fencing is a mechanism that ensures a malfunctioning cluster node cannot cause corruption so that its resources can be safely recovered elsewhere in the cluster. This is necessary because we cannot assume that an unreachable node is actually off. Fencing is often accomplished by powering the node off since a dead node is clearly not able to do anything. In other cases, a combination of operations will be used to cut the node off from the network (to stop new work from arriving) or from storage (to stop the node from writing to shared storage).
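Failover and fencing combine into a single recovery step: the cluster fences an unreachable node first and only then moves its services. The Python sketch below is a simplified model under stated assumptions. `fence` is a hypothetical callback standing in for a real fence agent, and takeover nodes are chosen round-robin. Real Pacemaker placement (scores, constraints, resource stickiness) is ignored.

```python
from typing import Callable, Dict, List


def recover_node(
    failed: str,
    placement: Dict[str, str],
    healthy_nodes: List[str],
    fence: Callable[[str], bool],
) -> Dict[str, str]:
    """Fence `failed`, then move its services to healthy nodes.

    placement maps service -> node currently running it.
    fence is a hypothetical callback returning True once the node is confirmed off.
    """
    # An unreachable node is not necessarily off, so nothing is recovered
    # until fencing has succeeded.
    if not fence(failed):
        raise RuntimeError(f"fencing {failed} failed; its services stay stopped")
    if not healthy_nodes:
        raise RuntimeError("no healthy node left to take over")

    new_placement = dict(placement)
    moved = 0
    for service in sorted(placement):
        if placement[service] == failed:
            # Simplistic round-robin choice of the takeover node (assumption).
            new_placement[service] = healthy_nodes[moved % len(healthy_nodes)]
            moved += 1
    return new_placement


placement = {"webgroup": "node1", "dbgroup": "node2", "nfsgroup": "node1"}
print(recover_node("node1", placement, ["node2", "node3"], fence=lambda node: True))
```

If `fence` returns False, the function raises instead of relocating anything. This mirrors the rule that a node's resources are only recovered elsewhere once the node can no longer cause corruption.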
Most high-availability clusters also need some form of shared storage, that is, storage that can be accessed from multiple nodes. Shared storage provides the same application data to multiple nodes in the cluster, and an application running on the cluster may access that data either sequentially or simultaneously. The cluster must ensure data integrity on the shared storage, and fencing is what guarantees it: a fenced node can no longer write to the shared storage.
Quorum describes a voting system that is required to maintain cluster integrity. Every cluster member has an assigned number of votes (one vote by default). Depending on the cluster configuration, the cluster gains quorum when either at least half of the votes or more than half of the votes are present. Cluster members that fail to communicate with the other cluster members, and therefore cannot send their votes, are fenced by the majority of cluster members that are operating as expected. A cluster normally requires quorum to operate. By default, if a cluster loses or cannot establish quorum, no resources or resource groups are started, and running resource groups are stopped to ensure data integrity.
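The vote arithmetic can be written out directly. This Python sketch gives each member one vote unless another count is supplied. It models the two quorum rules with a hypothetical `half_is_enough` flag. It illustrates the arithmetic only and is not corosync's votequorum implementation.

```python
from typing import Dict, Iterable, Optional


def partition_votes(members: Iterable[str], votes: Optional[Dict[str, int]] = None) -> int:
    """Sum the votes of the members that can still communicate (1 vote each by default)."""
    votes = votes or {}
    return sum(votes.get(member, 1) for member in members)


def has_quorum(present: int, total: int, half_is_enough: bool = False) -> bool:
    """Return True if `present` out of `total` expected votes is quorate."""
    # Integer comparisons avoid rounding issues with total / 2.
    if half_is_enough:
        return 2 * present >= total  # at least half of the votes
    return 2 * present > total       # more than half of the votes


# A five-node cluster splits 3/2 after a network failure.
total = partition_votes(["n1", "n2", "n3", "n4", "n5"])
print(has_quorum(partition_votes(["n1", "n2", "n3"]), total))  # True: majority keeps running
print(has_quorum(partition_votes(["n4", "n5"]), total))        # False: minority stops resources
```

With `half_is_enough=True`, both halves of an even split (for example 2 of 4 votes) count as quorate. A configuration like that has to rely on something else, such as fencing, to decide which side survives.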
The following image shows a typical hardware configuration for a five-node HA cluster.
The different components in this infrastructure are as follows:
| Component | Description |
|---|---|
| Cluster Nodes | These are the machines that run the cluster software and the services. |
| Public Network | This network is used for communication between the clients and the services running on the cluster. Services normally have a floating IP address, which means that the IP address can be assigned to whichever node is currently running the corresponding service. |
| Private Network | This network is used exclusively for cluster communication and for communication with critical cluster hardware such as networked power switches. |
| Networked Power Switch | A networked power switch can be used to remotely control power to the cluster nodes. This is one of the possible ways to implement power fencing, as described later in this series. Remote management cards such as iLO or DRAC can also be used for this purpose. |
| Fibre Channel Switch | All nodes in the example are connected to the same shared storage. Although Fibre Channel is widely used for this, an alternative is a separate Ethernet network with iSCSI or FCoE. |
In order to provide cluster services with the Red Hat High Availability Add-on, multiple software components are required on the cluster nodes. An overview of these components and their functionality follows.
Corosync is the framework used by Pacemaker for handling communication between the cluster nodes. Corosync is also Pacemaker’s source of membership and quorum data.
The pcs RPM package contains two cluster configuration tools:
- The pcs command provides a command-line interface to create, configure, and control every aspect of a Pacemaker/Corosync cluster.
- The pcsd service provides the cluster configuration synchronization and a web front end to create and configure a Pacemaker/Corosync cluster.
Pacemaker is the component responsible for all cluster-related activities, such as monitoring cluster membership, managing the services and resources, and fencing cluster members. The pacemaker RPM package contains the following important facilities:
- Cluster Information Base (CIB): The Cluster Information Base contains configuration and status information about the cluster and the cluster resources, in XML format. Pacemaker elects one node in the cluster to act as the designated coordinator (DC). The DC stores the cluster and resource status and the cluster configuration, which are synchronized to all other active cluster nodes.
- Policy Engine (PEngine): The Policy Engine uses the contents of the Cluster Information Base (CIB) to compute the ideal state of the cluster and how it should be reached.
- Cluster Resource Management Daemon (CRMd): The Cluster Resource Management Daemon coordinates and sends the resource start, stop, and status query actions to the Local Resource Management Daemon (LRMd) that runs on every cluster node. The LRMd passes the actions received from the CRMd to the resource agents.
- Shoot the Other Node in the Head (STONITH): STONITH is the facility responsible for processing fence requests and forwarding the requested action to the fence device(s) configured in the CIB.
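The Python sketch below illustrates how the Policy Engine and the CRMd divide the work. A toy policy engine compares the current resource placement with the desired one and derives stop and start actions. A toy CRMd then groups those actions per node, the way they would be handed to each node's LRMd. It is a simplified model, not Pacemaker's algorithm: the real Policy Engine also weighs scores, constraints, and ordering rules when it builds its transition. The resource and node names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Action = Tuple[str, str, str]  # (operation, resource, node)


def compute_transition(
    current: Dict[str, Optional[str]],
    desired: Dict[str, Optional[str]],
) -> List[Action]:
    """Toy policy engine: actions that turn the `current` placement into `desired`.

    Both maps go from resource name to node; None means "stopped".
    """
    stops: List[Action] = []
    starts: List[Action] = []
    for resource in sorted(set(current) | set(desired)):
        now, want = current.get(resource), desired.get(resource)
        if now == want:
            continue  # already in the ideal state
        if now is not None:
            stops.append(("stop", resource, now))
        if want is not None:
            starts.append(("start", resource, want))
    # Simplification: stop everything that moves before starting anything.
    return stops + starts


def dispatch(actions: List[Action]) -> Dict[str, List[Tuple[str, str]]]:
    """Toy CRMd: group actions by the node whose LRMd must execute them."""
    per_node: Dict[str, List[Tuple[str, str]]] = {}
    for operation, resource, node in actions:
        per_node.setdefault(node, []).append((operation, resource))
    return per_node


current = {"webip": "node1", "webserver": "node1", "dbip": "node2"}
desired = {"webip": "node3", "webserver": "node3", "dbip": "node2"}
print(dispatch(compute_transition(current, desired)))
```

`dbip` is already where it should be, so it produces no action. Only the two web resources are stopped on `node1` and started on `node3`.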
Before deploying a high-availability cluster with the Red Hat High Availability Add-on, it is important to understand the requirements and supportability of the cluster configuration. Certain cluster configurations require an architecture review to get support from Red Hat. An architecture review process typically requires sending relevant data about the cluster, such as the cluster configuration, network architecture, and fencing configuration, to Red Hat Support. The support representative might request additional data if required. Red Hat Support then decides whether the cluster configuration can be supported by Red Hat.
There are important requirements and recommendations that should be considered before deploying a high-availability cluster based on the Red Hat High Availability Add-on.
Number of nodes
Red Hat supports clusters with up to 16 nodes. Any cluster with eight or more nodes will need to be subjected to an architecture review to determine if the cluster setup is supported by Red Hat.
Clusters consisting of only two nodes are a special case. Although in most cases an architecture review is not mandatory, it is highly recommended to submit the architecture to Red Hat for a review before deploying a two-node cluster in production.
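The sizing rules above can be expressed as a small pre-deployment check. This Python sketch encodes only the node-count limits stated in this section:

- at most 16 nodes;
- a mandatory architecture review at eight or more nodes;
- a recommended review for two-node clusters.

The function name and its return labels are hypothetical. Other factors, such as a stretch layout, can still trigger a review.

```python
def support_review_status(node_count: int) -> str:
    """Classify a planned cluster size against the node-count limits above."""
    if node_count < 2:
        # Assumption: this sketch only models clusters of two or more nodes.
        raise ValueError("expected at least two nodes")
    if node_count > 16:
        return "not supported"
    if node_count >= 8:
        return "architecture review required"
    if node_count == 2:
        return "architecture review recommended"
    return "no review required for this size"


for size in (2, 5, 8, 16, 17):
    print(size, support_review_status(size))
```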
Single site, multisite, and stretch clusters
Red Hat fully supports single-site clusters. This is a cluster setup where all cluster members are in the same physical location, connected by a local area network.
Multisite clusters consist of two clusters, one active and one for disaster recovery. Failover for multisite clusters must be managed manually. Multisite clusters are supported with the Red Hat Enterprise Linux 7 High Availability Add-on.
Stretch clusters, also known as geo clusters, are clusters stretched out over multiple physical locations. Stretch clusters must go through an architecture review process with Red Hat Support. Red Hat Support may elect not to support a particular configuration, or may impose restrictions on the level and type of support provided for the cluster. Typically, Red Hat requires a network latency of less than 2 ms for a stretch cluster to be deemed supportable. Stretch clusters with four or more nodes also require a quorum disk if a High Availability Add-on version older than the one in Red Hat Enterprise Linux 7 is used.
Fencing hardware
As described earlier, fencing ensures that a malfunctioning cluster node cannot cause corruption, so that its resources can be safely recovered elsewhere in the cluster. This can be done by power-cycling a node or by cutting off its access at the storage level. Fencing is required for all nodes in the cluster, through power fencing, storage fencing, or a combination of both. Before deploying the high-availability infrastructure, ensure that supported hardware is used. If the cluster uses integrated fencing devices such as iLO or DRAC, the systems acting as cluster nodes must power off immediately when a shutdown signal is received, instead of initiating a clean shutdown.
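Because every node must be fenceable, a useful pre-deployment check is to confirm that at least one fence device covers each node. This Python sketch models the fencing configuration as a hypothetical mapping from fence device name to the set of nodes that device can fence. Power and storage devices are treated alike. The sketch does not talk to real hardware or to the CIB.

```python
from typing import Dict, List, Set


def unfenced_nodes(nodes: List[str], fence_devices: Dict[str, Set[str]]) -> List[str]:
    """Return the nodes that no configured fence device can fence."""
    covered: Set[str] = set()
    for targets in fence_devices.values():
        covered |= targets  # power and storage fence devices both count
    return [node for node in nodes if node not in covered]


nodes = ["node1", "node2", "node3"]
# Hypothetical devices: one per-node power device and one storage fence device.
fence_devices = {
    "fence_node1_power": {"node1"},
    "fence_shared_storage": {"node1", "node2"},
}
print(unfenced_nodes(nodes, fence_devices))  # ['node3']
```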
Virtualization and clustering
The Red Hat High Availability Add-on supports virtual machines both as cluster resources and as cluster nodes.
When virtual machines operate as cluster resources, the virtualization hosts participate in the cluster, and each virtual machine is a resource that can move between the cluster nodes.
When operating as cluster nodes, the virtual machines running on a host are members of the cluster and run resources. Special fencing agents are available so these cluster nodes can fence each other, whether running on an RHEL 7 libvirt-based system, Red Hat Enterprise Virtualization, or other VM hypervisor hosts. In this case, the physical host is a single point of failure for all the virtual machine based cluster nodes running on that host. If the physical host crashes, it crashes all the cluster node VMs running on it.
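The single-point-of-failure warning can be quantified. If one physical host runs enough cluster-node VMs, its crash alone leaves the surviving nodes without quorum. This Python sketch assumes one vote per VM node and the "more than half" quorum rule. The VM and host names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def hosts_that_break_quorum(vm_host: Dict[str, str]) -> List[str]:
    """Return physical hosts whose crash alone would cost the VM cluster its quorum.

    vm_host maps each cluster-node VM to the physical host it runs on.
    Assumes one vote per VM and the "more than half" quorum rule.
    """
    total = len(vm_host)
    vms_per_host = Counter(vm_host.values())
    return sorted(
        host
        for host, lost in vms_per_host.items()
        if 2 * (total - lost) <= total  # survivors would not hold more than half
    )


packed = {"vm1": "hostA", "vm2": "hostA", "vm3": "hostB"}
spread = {"vm1": "hostA", "vm2": "hostB", "vm3": "hostC"}
print(hosts_that_break_quorum(packed))  # ['hostA']
print(hosts_that_break_quorum(spread))  # []
```

Spreading the node VMs across hosts does not remove the per-host single point of failure for the VMs on each host. It does keep a single host crash from taking the whole cluster down.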
Networking
The corosync component requires either unicast, or multicast together with IGMP, for its default network communication on the private network. In RHEL 7, corosync/pacemaker clusters use unicast by default. On the public network, gratuitous ARP is used for the floating IP addresses, so the network switch must support it.
Any network ports used by the service(s) running on the cluster must be available on the public network. The following ports must be opened on the private network:
| Port | Used by |
|---|---|
| 21064/TCP | dlm (used with GFS2 resources) |
In addition, port 5404/UDP may be used by corosync as a source for some cluster communication messages.
SELinux
The use of SELinux in enforcing mode is fully supported when using the targeted policy on the cluster nodes.
Other posts in this series:
- Managing Cluster Membership in pacemaker cluster (Adding and removing a cluster node)
- Prohibiting a cluster node from hosting resources in pacemaker cluster (putting a node into standby mode)
- What is quorum in a pacemaker cluster (Understanding Quorum Operations)
- Managing Quorum Calculations in a Pacemaker Cluster
- What is fencing and What are different methods of fencing in a pacemaker cluster
- Setting Up Fencing Devices in a Pacemaker Cluster
- Configuring Cluster Fencing Agents in a Pacemaker Cluster
- How to Create and Configure Resources in a Pacemaker Cluster
- How to Create and Configure Resource Groups in a Pacemaker Cluster
- Managing resources in a pacemaker cluster (stop/start and relocate resource groups in a running cluster)
- How to Configure Cluster Logging for Corosync and Pacemaker
- How to Configure Pacemaker Cluster Notifications
- Troubleshooting Resource Failures in Pacemaker Cluster
- Troubleshooting Pacemaker Cluster Networking