Identifying network issues
Faulty or misconfigured network connections can wreak havoc on a cluster. The following are some common issues and their fixes.
Some network switches actively block multicast traffic. This breaks cluster communication when the cluster transport is set to udp, because that transport uses multicast.
Possible fixes include switching the cluster transport to udpu (UDP unicast) or enabling multicast on the network switches.
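To tell whether a node is using multicast at all, the sketch below reports the transport set in its corosync configuration. It assumes corosync 2.x, where the transport is a `transport:` line in the `totem` section of /etc/corosync/corosync.conf and a missing line means udp; corosync 3 defaults to knet instead. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assume corosync 2.x, where no "transport:" line means udp (multicast).

# Print the transport configured in a corosync.conf file (default path shown).
get_corosync_transport() {
    local conf="${1:-/etc/corosync/corosync.conf}"
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 2
    fi
    # Print the value of the first "transport:" line, or "udp" if there is none.
    awk '/^[ \t]*transport:/ { print $2; found = 1; exit }
         END { if (!found) print "udp" }' "$conf"
}

# Warn (and return 1) when the configured transport relies on multicast.
check_multicast_transport() {
    local transport
    transport="$(get_corosync_transport "$@")" || return 2
    if [ "$transport" = "udp" ]; then
        echo "transport is udp (multicast): switches must pass multicast, or use udpu"
        return 1
    fi
    echo "transport is $transport"
}
```

Running `check_multicast_transport` with no argument reads the default path. Note that with udpu, corosync 2.x also needs a `nodelist` section listing each node's ring0_addr, since there is no multicast group for nodes to discover each other through.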
An incorrectly configured firewall can make a node unreachable to the other cluster nodes. Make sure that all high-availability services can be reached, as well as the public-network ports of any clustered services offered to clients.
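A quick way to test reachability from one node to its peers is to try opening TCP connections to them. The sketch below uses bash's built-in /dev/tcp redirection, so it can only test TCP ports, not corosync's UDP traffic. It checks port 2224 (pcsd) by default; the HA_TCP_PORTS variable and the function names are hypothetical, and ports such as 3121 (pacemaker_remote) or 21064 (DLM) are worth adding only if those services run. A failure means either a firewall blocks the port or nothing is listening on it.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking TCP reachability of cluster peers.

# Succeed if host:port accepts a TCP connection within the timeout (default 2 s).
tcp_port_open() {
    local host="$1" port="$2" secs="${3:-2}"
    # The child bash opens the connection on fd 3 and closes it on exit;
    # timeout bounds hosts that silently drop packets.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Check every port in HA_TCP_PORTS (default: 2224) on each host given as an argument.
check_peer_ports() {
    local -a ports
    local host port rc=0
    read -r -a ports <<< "${HA_TCP_PORTS:-2224}"
    for host in "$@"; do
        for port in "${ports[@]}"; do
            if tcp_port_open "$host" "$port"; then
                echo "$host:$port reachable"
            else
                echo "$host:$port NOT reachable"
                rc=1
            fi
        done
    done
    return "$rc"
}
```

For example, `HA_TCP_PORTS="2224 21064" check_peer_ports node2 node3` run on node1 (host names hypothetical) returns non-zero if any port on either peer cannot be reached.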
To view the firewall configuration on a cluster node, use the following command:
# firewall-cmd --list-all
  interfaces: eth0 eth1 eth2 eth3
  sources:
  services: dhcpv6-client http ssh
  ports:
  masquerade: no
  forward-ports:
  icmp-blocks:
  rich rules:
The high-availability service is notably absent from the firewall-cmd output above. To add it back to the firewall configuration, run:
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload
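To make this fix safe to repeat, for example from a provisioning script, the sketch below adds the service only when the permanent configuration lacks it. It parses the `services:` line of `firewall-cmd --permanent --list-all`, which has the same layout as the `--list-all` output shown earlier; `firewall-cmd --permanent --query-service=high-availability` would be a simpler check. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around firewall-cmd.

# Read `firewall-cmd --list-all` output on stdin; succeed if the
# "services:" line contains the given service name as a whole word.
listall_has_service() {
    awk -v svc="$1" '
        /^[ \t]*services:/ {
            for (i = 2; i <= NF; i++) if ($i == svc) found = 1
        }
        END { exit !found }'
}

# Permanently allow high-availability and reload, only if it is missing.
ensure_ha_service() {
    if firewall-cmd --permanent --list-all | listall_has_service high-availability; then
        echo "high-availability already allowed"
        return 0
    fi
    firewall-cmd --permanent --add-service=high-availability &&
        firewall-cmd --reload
}
```

Run `ensure_ha_service` as root on each cluster node; on a node that already allows the service it changes nothing.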
Verify that the cluster is fully operational again:
# pcs status
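As a rough scripted version of this check, the sketch below scans `pcs status` output for nodes listed as offline. It assumes Pacemaker's `OFFLINE: [ node ... ]` line format, which older releases print at the start of a line and newer ones print under a `Node List:` heading with a `* ` prefix; other node states (such as UNCLEAN) are not detected. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that parse `pcs status` output read from stdin.

# Print each node named on an "OFFLINE: [ ... ]" line, one per line.
offline_nodes() {
    awk '/^[ \t*]*OFFLINE:/ {
        for (i = 1; i <= NF; i++)
            if ($i != "*" && $i != "OFFLINE:" && $i != "[" && $i != "]") print $i
    }'
}

# Report offline nodes and return 1 if there are any.
check_all_online() {
    local offline
    offline="$(offline_nodes)"
    if [ -n "$offline" ]; then
        printf '%s\n' "$offline" | sed 's/^/offline node: /'
        return 1
    fi
    echo "all nodes online"
}
```

Typical use is `pcs status | check_all_online`.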
If cluster nodes are plugged into different switches and the link between those switches drops, the cluster splits into partitions that can no longer communicate with each other (a split-brain situation), and the cluster loses the nodes in any partition that does not have quorum. Redundant networking and interconnects reduce the risk of this type of failure.
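Which side of such a split keeps running is decided by quorum. The sketch below works through the arithmetic under the simplest votequorum assumption of one vote per node: a partition needs a strict majority of the expected votes, floor(n/2) + 1. It ignores two_node mode, quorum devices, and custom vote counts; on a live cluster, `corosync-quorumtool -s` reports the actual values. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical quorum arithmetic, assuming one vote per node and no quorum device.

# Votes a partition needs: a strict majority of the expected votes.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Given the node counts on each side of a split, print which side keeps
# quorum: "A", "B", or "none".
split_outcome() {
    local a="$1" b="$2" needed
    needed="$(quorum_needed $(( a + b )))"
    if (( a >= needed )); then
        echo "A"
    elif (( b >= needed )); then
        echo "B"
    else
        echo "none"
    fi
}
```

For example, `split_outcome 3 2` prints `A`: a five-node cluster needs three votes, so the three-node side keeps quorum and the two-node side does not. `split_outcome 2 2` prints `none`, because neither half of a four-node cluster reaches three votes.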
When a network link becomes oversaturated, packets can be dropped, resulting in unpredictable, intermittent cluster failures. These scenarios can be avoided by using separate networks for private cluster communication, public client access, and storage traffic.
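To look for dropped packets on a node, the Linux kernel exposes per-interface counters under /sys/class/net/<iface>/statistics/ (the same numbers `ip -s link` shows). The sketch below prints the rx_dropped and tx_dropped counters for each interface; the function name is hypothetical, and the base directory is a parameter only so it can be pointed at a test tree.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print dropped-packet counters per network interface.
# Assumes the Linux sysfs layout <base>/<iface>/statistics/{rx,tx}_dropped.
report_dropped() {
    local base="${1:-/sys/class/net}" dir iface rx tx
    for dir in "$base"/*/statistics; do
        [ -d "$dir" ] || continue   # skip if the glob matched nothing
        iface="$(basename "$(dirname "$dir")")"
        rx="$(cat "$dir/rx_dropped")"
        tx="$(cat "$dir/tx_dropped")"
        printf '%s rx_dropped=%s tx_dropped=%s\n' "$iface" "$rx" "$tx"
    done
}
```

The counters are cumulative since boot, so it is more telling to run `report_dropped` twice a few seconds apart; counters that keep rising on the cluster interconnect are worth investigating as a sign of saturation.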