If one or more nodes in a cluster fail, it is possible that not all of the remaining cluster nodes will be able to “see” one another. In fact, two sets of nodes might become isolated from one another by a network partition, also known as a “split-brain” scenario. This situation is undesirable because each set of nodes tries to behave as though it were the entire cluster.
When cluster nodes go down, there are two possibilities. If more than 50% of the remaining nodes can communicate with each other, then we have what is sometimes called a “majority rules” situation, and this set of nodes is considered to be the cluster. The arbitrator comes into play when there is an even number of nodes and neither set holds a majority: in such cases, the set of nodes to which the arbitrator belongs is considered to be the cluster, and nodes not belonging to this set are shut down.
The above description is somewhat simplified. A more complete explanation, taking node groups into account, follows:
When all nodes in at least one node group are alive, network partitioning is not an issue, because no other portion of the cluster can form a functional cluster. The real problem arises when no single node group has all its nodes alive, in which case network partitioning (the “split-brain” scenario) becomes possible. An arbitrator is then required.
All cluster nodes recognise the same node as the arbitrator, which is normally the management server; however, it is possible to configure any of the MySQL Servers in the cluster to act as the arbitrator instead. The arbitrator accepts the first set of cluster nodes to contact it, and tells the remaining set to shut down. Arbitrator selection is controlled by the ArbitrationRank configuration parameter for MySQL Server and management server nodes. (See external resources link in the margin for details.)
The role of arbitrator does not in itself impose heavy demands on the host so designated, so the arbitrator host does not need to be particularly fast or to have extra memory for this purpose.
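The node group rule can be illustrated with a small sketch. This is not NDB code, only a model of the reasoning above: the function name, the `Verdict` enum, and the node IDs are hypothetical, and the sketch assumes that a set of nodes missing an entire node group cannot act as the cluster, because it would lack part of the data.

```python
from enum import Enum


class Verdict(Enum):
    SURVIVES = "survives without arbitration"
    NEEDS_ARBITRATION = "must ask the arbitrator"
    SHUTS_DOWN = "cannot form a cluster"


def classify_partition(node_groups, reachable):
    """Classify one set of mutually reachable data nodes.

    node_groups: list of sets of data node IDs, one set per node group.
    reachable:   set of data node IDs that can still see one another.
    """
    reachable = set(reachable)
    # Assumption: a partition with no node from some node group lacks
    # part of the data, so it cannot act as the cluster at all.
    if any(not (group & reachable) for group in node_groups):
        return Verdict.SHUTS_DOWN
    # A partition holding every node of some node group leaves any other
    # partition without that group, so no rival cluster can form.
    if any(group <= reachable for group in node_groups):
        return Verdict.SURVIVES
    # Otherwise a rival partition could also be viable: split-brain risk.
    return Verdict.NEEDS_ARBITRATION


# Hypothetical cluster: two node groups with two replicas each.
groups = [{1, 2}, {3, 4}]
print(classify_partition(groups, {1, 2, 3}))  # Verdict.SURVIVES
print(classify_partition(groups, {1, 3}))     # Verdict.NEEDS_ARBITRATION
print(classify_partition(groups, {1, 2}))     # Verdict.SHUTS_DOWN
```

In the second call, nodes 2 and 4 could form an equally viable set on the other side of the partition, which is exactly the case where an arbitrator has to pick one side.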
What Role Does the Arbitrator Play in a Failure Situation?
An arbitrator is required whenever a network partitioning scenario is possible. It decides which of the surviving sets of data nodes is allowed to continue running. Any data nodes that are not approved by the arbitrator will shut down in order to prevent a “split-brain” from occurring.
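As a rough model of this decision, the sketch below approves the first set of data nodes that asks and refuses every other set. The `Arbitrator` class and its `request` method are hypothetical names used only for illustration; the real arbitration protocol is internal to the cluster software.

```python
class Arbitrator:
    """Toy arbitrator: approves the first set of nodes that asks."""

    def __init__(self):
        self.winner = None  # frozenset of node IDs that was approved

    def request(self, partition):
        """Return True if this set of nodes may keep running."""
        partition = frozenset(partition)
        if self.winner is None:
            # First set to make contact wins the arbitration.
            self.winner = partition
        # Every other set is refused and is expected to shut down.
        return partition == self.winner


arbitrator = Arbitrator()
print(arbitrator.request({1, 3}))  # True: first to ask, keeps running
print(arbitrator.request({2, 4}))  # False: must shut down
```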
The arbitrator can be either an SQL node or the management server node, with the management node being the more common choice. Only a single arbitrator can be active at any one time. If the arbitrator itself fails, the cluster automatically elects a new one. However, this election cannot take place during another node failure; it occurs after that node failure has been resolved.
You can influence the choice of arbitrator through settings in the config.ini file. Both the [mysqld] and [mgm] sections accept the ArbitrationRank parameter, which sets a node's priority when an arbitrator is chosen. The parameter takes one of three values:
- 0: The node will never be used as an arbitrator.
- 1: The node has high priority; that is, it will be preferred as an arbitrator over low-priority nodes.
- 2: The node has low priority; it will be used as an arbitrator only if no node with a higher priority is available for that purpose.
An example configuration file is below:
    [MGM]
    ArbitrationRank = 1

    [NDBD DEFAULT]
    NoOfReplicas = 2
    Datadir = /var/lib/mysql-cluster

    [NDBD]
    Hostname = ndbd_host1

    [NDBD]
    Hostname = ndbd_host2

    [MYSQLD]
    Hostname = mysqld_host1
    ArbitrationRank = 2

    [MYSQLD]
    Hostname = mysqld_host2
    ArbitrationRank = 1
In the above case, the mysqld server on mysqld_host2 and the management node both have the highest priority and are hence the normal arbitrators. If both of them fail, the mysqld server on mysqld_host1 will be chosen instead. In most cases you will want the arbitrator to run on a separate system from the data nodes, if possible. This reduces the risk of data nodes and the arbitrator failing at the same time.
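The ranking rules can be sketched as a simple ordering of candidates. This is only an illustration of the priorities described above: the function name is hypothetical, the label "mgm" stands in for the management node (the example gives it no hostname), and the order among nodes of equal rank is an assumption of this sketch, since the text does not say how such ties are broken.

```python
def arbitrator_candidates(nodes):
    """Return the nodes eligible for arbitration, most preferred first.

    nodes: list of (name, arbitration_rank) pairs in config-file order.
    """
    # Rank 0 means the node is never used as an arbitrator.
    eligible = [(name, rank) for name, rank in nodes if rank in (1, 2)]
    # Rank 1 (high priority) sorts before rank 2 (low priority).
    # sorted() is stable, so equal ranks keep config-file order here;
    # that tie-breaking is an assumption, not documented behaviour.
    ordered = sorted(eligible, key=lambda item: item[1])
    return [name for name, _rank in ordered]


# Values taken from the example configuration above.
example = [("mgm", 1), ("mysqld_host1", 2), ("mysqld_host2", 1)]
print(arbitrator_candidates(example))
# prints ['mgm', 'mysqld_host2', 'mysqld_host1']
```

The low-priority node on mysqld_host1 appears last, matching the description that it is chosen only when both high-priority candidates are unavailable.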