In this post, we will discuss the minimum requirements for a MySQL Cluster setup with no single point of failure. To remove all single points of failure, the Cluster configuration must meet the following requirements:
- There must be two copies of the data
- To allow API and data nodes to come online, there must be at least two management nodes
- There must be at least two API nodes
- If one of the data nodes goes offline, an arbitrator must be present
Each of these requirements is discussed in the following sections.
1. Two Copies of the Data
This requires at least two data nodes (ndbd or ndbmtd processes), and they must run on separate servers. You also need to set the number of replicas to 2 in the Cluster configuration file, for example:
[ndbd default]
NoOfReplicas = 2
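The data nodes themselves are defined with one [ndbd] section each. As a minimal sketch (the hostnames are placeholders for your own servers):

[ndbd]
HostName = datahost1

[ndbd]
HostName = datahost2

With NoOfReplicas = 2, these two data nodes form a single node group holding both copies of the data.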
2. Management Nodes
In order for a data node or an API node to come online or be restarted, it must fetch its configuration from a management node. If the only management node is down, no node can start or restart, so you should have at least two management nodes.
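As a sketch, with placeholder hostnames, two management nodes are simply two [ndb_mgmd] sections in config.ini:

[ndb_mgmd]
HostName = mgmhost1

[ndb_mgmd]
HostName = mgmhost2

The other nodes should then list both management nodes in their connect string, for example --ndb-connectstring=mgmhost1,mgmhost2, so they can still fetch the configuration if one management node is down.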
3. API Nodes
API nodes, for example mysqld, allow clients to read and update the NDB tables. To ensure continued service if one API node becomes unavailable, you must have at least two API nodes. As API nodes can run on the same servers as the other nodes, this requirement does not add to the number of servers required.
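As a sketch, each API node gets its own [mysqld] section in config.ini. The hostnames below are placeholders, and HostName can be omitted to allow a connection from any host:

[mysqld]
HostName = apphost1

[mysqld]
HostName = apphost2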
4. Arbitrator
MySQL Cluster uses an arbitrator to avoid a split-brain scenario, for example in the case of network partitioning. By default, a management node acts as the arbitrator, but mysqld nodes can also be arbitrators. While two arbitration candidates are required to avoid having the arbitrator itself be a single point of failure, the arbitrator is not needed until at least one of the data nodes has failed or a network partition occurs. Taking that into account, it can be argued that having just one arbitrator is not a single point of failure.
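Arbitration candidates are controlled with the ArbitrationRank parameter: 1 (the default for management nodes) marks a high-priority candidate, 2 a low-priority candidate, and 0 (the default for mysqld nodes) means the node is never used as the arbitrator. A sketch, with placeholder hostnames, that also allows a mysqld node to act as a fallback arbitrator:

[ndb_mgmd]
HostName = mgmhost1
# High-priority arbitration candidate (the default for management nodes)
ArbitrationRank = 1

[mysqld]
HostName = apphost1
# Low-priority arbitration candidate (mysqld nodes default to 0)
ArbitrationRank = 2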
Conclusion
To fulfill the above requirements, you will need at least three servers with the following services running (a sample configuration sketch follows the list):
- Machine 1: ndbd (node group 0), ndb_mgmd (ArbitrationRank = 0)
- Machine 2: ndbd (node group 0)
- Machine 3: ndb_mgmd (ArbitrationRank = 1)
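Putting it together, a minimal config.ini sketch for this layout could look as follows; machine1, machine2, and machine3 are placeholder hostnames:

[ndbd default]
NoOfReplicas = 2

# Machine 1: data node plus the management node that must never arbitrate
[ndb_mgmd]
HostName = machine1
ArbitrationRank = 0

[ndbd]
HostName = machine1

# Machine 2: the second data node in the same node group
[ndbd]
HostName = machine2

# Machine 3: the management node that acts as the arbitrator
[ndb_mgmd]
HostName = machine3
ArbitrationRank = 1

# Slots for the application/SQL nodes (see the Note section below)
[mysqld]
[mysqld]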
The proposed setup provides redundancy as follows:
- The single node group split over two ndbd processes on separate machines protects against a single point of failure for the data nodes.
- The two ndb_mgmd instances on different machines protect against a single point of failure for the management nodes.
The only remaining “single point of failure” is the arbitration. However, as discussed, it only becomes an issue if both one of the data nodes and the management node on Machine 3 become unavailable, so it can be argued that this is not a single point of failure either.
Note
1. The above does not take the application/SQL nodes into consideration. These can run on any of the three machines.
2. While three servers can give protection against a single point of failure, it is recommended to use at least four servers, which allows each of the management nodes to run on a separate server.
3. It is important to set ArbitrationRank = 0 for the ndb_mgmd node that is on the same host as a data node, because this node must never become the arbitrator (at least while the other ndb_mgmd node is online). In the above example, if Machine 1 crashes (for example due to hardware failure or a power outage) while the ndb_mgmd on Machine 1 is the arbitrator, the whole cluster will go offline, as the data node on Machine 2 will not be able to win the arbitration.