Most Common Two-Node Pacemaker Cluster Issues and Their Workarounds

By admin

While two-node clusters may look good on paper, in practice they come with a number of extra failure scenarios that do not exist in clusters of three or more nodes.

If a three-node or larger cluster is not an option, it is recommended to have Red Hat perform an architecture review of the intended two-node cluster.

Possible issues

Two-node clusters have their own set of issues. The following list covers some of the most common ones, along with workarounds to avoid them.

1. No room for node failure

In a default cluster setup, at least 50% + 1 of the nodes must be up for the cluster to be quorate. In a two-node cluster this comes down to two votes, meaning both nodes. By enabling the special two_node mode of votequorum, the number of votes required for quorum is brought down to one, as long as there are only two nodes in the cluster. This allows one node to fail, with the other node remaining a quorate cluster all by itself.
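
For reference, this is how the two_node setting appears in the quorum section of /etc/corosync/corosync.conf (a minimal sketch; on RHEL 7, pcs cluster setup adds this automatically when exactly two nodes are defined):

    quorum {
        provider: corosync_votequorum
        two_node: 1
    }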

2. Split-brain

A split-brain situation occurs when two halves of a cluster both think they are quorate, and start competing for shared resources.

If fencing is configured and tested, a split-brain scenario cannot occur, since both nodes will attempt to fence each other before recovering any resources.
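
As an illustration, a power fencing device could be created and tested with pcs along these lines (a sketch only; the device name, host name, address, and credentials are placeholders, and the correct fence agent depends on the hardware):

    # pcs stonith create fence_node1 fence_ipmilan \
          pcmk_host_list="node1.example.com" \
          ipaddr="10.0.0.11" login="admin" passwd="secret"
    # pcs stonith fence node1.example.com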

3. Fence death/fence racing

Since either node in a two-node cluster can maintain quorum by itself, administrators can run into a phenomenon known as a fence race. A fence race happens when communication between the two nodes is interrupted but they can still fence each other, either because fabric-level fencing is used, or because the fencing devices are on a different network from the cluster communication.

If the fence device has serialized access, meaning that only one machine can talk to it at a time, this causes no problems. One node will win the fence race and the other node will be fenced (although if the communication problem persists, the node that was just fenced will fence the winning node when it comes back up, leading to a reboot-fence cycle).

In situations where there are multiple fencing devices (e.g., per-node iLO cards), an undesirable outcome can occur: both nodes fencing each other at the same time. One way of combating this is to set a delay on one of the fencing devices, as shown below.
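
For example, assuming per-node fence devices named fence_node1 and fence_node2 (hypothetical names, where fence_node1 is the device used to fence node1), a delay could be added to one of them:

    # pcs stonith update fence_node1 delay=15

With this in place, any attempt to fence node1 waits 15 seconds while node2 can be fenced immediately, so node1 reliably wins a simultaneous fence race.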

4. The cluster does not start until both nodes have started

When the special two_node mode of corosync_votequorum is enabled, the wait_for_all mode is also implicitly enabled, unless expressly disabled in the configuration file. The wait_for_all option makes the cluster wait for all nodes to be up at the same time before starting any cluster resources.

If the cluster should be able to start with only one node, the wait_for_all option can be expressly disabled, either at cluster creation time or later in corosync.conf.
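
For example, with the cluster stopped on all nodes, the option could be disabled with pcs (a sketch; on RHEL 7 this rewrites the quorum section of corosync.conf, which can be verified afterwards with pcs quorum config):

    # pcs quorum update wait_for_all=0
    # pcs quorum config

The resulting quorum section in /etc/corosync/corosync.conf would then look like:

    quorum {
        provider: corosync_votequorum
        two_node: 1
        wait_for_all: 0
    }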
