How Broadcast Signaling Works
Placing this issue in the right perspective requires first understanding something about how broadcast signaling works. When a cluster member starts, it issues a broadcast message to the network, using its cluster name as an identifier. If a cluster with the same cluster name is present on the network, that cluster is expected to reply to the broadcast message with its node name, and the joining member then sends a join request to that cluster. The default port used by the cluster for broadcast messages is UDP port 5405.
You will see these messages when opening the NIC in promiscuous mode with either Wireshark or tcpdump (for example, with the filter udp port 5405):
22:25:36.455369 IP server-01.5149 > server-02.netsupport: UDP, length 106
22:25:36.455531 IP server-02.5149 > server-01.netsupport: UDP, length 106
22:25:36.665363 IP server-01.5149 > 255.255.255.255.netsupport: UDP, length 118
22:25:36.852367 IP server-01.5149 > server-02.netsupport: UDP, length 106
22:25:36.852526 IP server-02.5149 > server-01.netsupport: UDP, length 106
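The handshake itself is easy to picture with a small sketch. This is not cman's actual wire format; the cluster name, message layout, and two-second timeout below are illustrative assumptions, but the 255.255.255.255 broadcast address and port 5405 match what the capture shows:

import socket

CLUSTER_NAME = "mycluster"   # hypothetical; the real name comes from cluster.conf
BROADCAST_PORT = 5405        # the cluster's default broadcast port

# The joining member broadcasts its cluster name to the network.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
sock.settimeout(2.0)
sock.sendto(CLUSTER_NAME.encode(), ("255.255.255.255", BROADCAST_PORT))

# A cluster with the same name is expected to answer with its node name;
# on a reply, the joining member would proceed with a join request.
try:
    reply, addr = sock.recvfrom(1024)
    print("node %s answered from %s" % (reply.decode(), addr[0]))
except socket.timeout:
    print("no cluster with that name answered")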
How Name Resolution Works and cluster.conf
Cman (the Cluster Manager) tries hard to match the local host name(s) to those mentioned in cluster.conf. Here's how it does it:
1. It looks up $HOSTNAME in cluster.conf.
2. If this fails, it strips the domain name from $HOSTNAME and looks up the short name in cluster.conf.
3. If this fails, it looks in cluster.conf for a fully-qualified name whose short version matches the short version of $HOSTNAME.
4. If all this fails, it searches the interfaces list for an (IPv4-only) address that matches a name in cluster.conf.
Cman will then bind to the address that it has matched. The sketch below walks through the same matching order.
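This is a minimal sketch of the four matching steps, assuming cluster_nodes holds the node names taken from cluster.conf and local_names holds the names that the local IPv4 interface addresses resolve to (both are hypothetical inputs, not cman's internal API):

def match_cluster_node(hostname, cluster_nodes, local_names):
    # 1. Look up $HOSTNAME in cluster.conf as-is.
    if hostname in cluster_nodes:
        return hostname
    # 2. Strip the domain name and look up the short name.
    short = hostname.split(".")[0]
    if short in cluster_nodes:
        return short
    # 3. Look for a fully-qualified name in cluster.conf whose short
    #    version matches the short version of $HOSTNAME.
    for node in cluster_nodes:
        if node.split(".")[0] == short:
            return node
    # 4. Fall back to a local (IPv4-only) interface address that
    #    resolves to a name listed in cluster.conf.
    for name in local_names:
        if name in cluster_nodes:
            return name
    return None

# Step 2 wins here: "server-01" is listed by its short name.
print(match_cluster_node("server-01.example.com",
                         ["server-01", "server-02"], []))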
Note: we will have to make sure the hosts line in /etc/nsswitch.conf is set to:

hosts: files nis dns
nsswitch.conf is a facility in Linux operating systems that provides a variety of sources for common configuration databases and name resolution mechanisms.
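Since cman resolves node names through the standard C library resolver, which consults the hosts line of /etc/nsswitch.conf in order (local files first, then NIS, then DNS), a quick sanity check on each node is to resolve the node name and confirm it returns the address cman should bind to. A minimal check in Python, using server-01 from the capture above:

import socket

# gethostbyname() goes through the system resolver, so the answer
# reflects the source order configured in /etc/nsswitch.conf.
print(socket.gethostbyname("server-01"))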
Split Brain Issues
One of the most dangerous situations that can happen in a cluster is that both nodes become active at the same time. This is especially true for clusters that share storage resources: both cluster nodes could be writing to the data on shared storage, which will quickly cause data corruption.

When both nodes become active it is called “split brain”, and it can happen when a cluster node stops receiving heartbeats from its partner node. Since the two nodes are no longer communicating, neither knows whether the problem lies with the other node or with itself.
For example, say that the passive node stops receiving heartbeats from the active node due to a failure of the heartbeat network. If the passive node then starts the cluster services, you have a split-brain situation.
Many clusters use a Quorum Disk to prevent this from happening. The Quorum Disk is a small shared disk that both nodes can access at the same time. Whichever node is currently active writes to the disk periodically (usually every couple of seconds), and the passive node checks the disk to make sure the active node is keeping it up to date.
When a node stops receiving heartbeats from its partner node, it looks at the Quorum Disk to see whether it is still being updated. If the other node is still updating the Quorum Disk, the passive node knows that the active node is alive and does not start the cluster services.
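The mechanism can be sketched in a few lines. This is only an illustration of the idea, not how Red Hat's qdiskd is implemented; the device path and timing constants are assumptions, and a counter is used instead of a timestamp so the check does not depend on the two nodes' clocks being in sync:

import os
import struct
import time

QDISK = "/dev/sdq"   # hypothetical shared quorum device
INTERVAL = 2         # seconds between heartbeat writes

def write_heartbeats(dev=QDISK):
    # Active node: keep bumping a counter at the start of the device.
    counter = 0
    with open(dev, "r+b", buffering=0) as disk:
        while True:
            counter += 1
            disk.seek(0)
            disk.write(struct.pack("Q", counter))
            os.fsync(disk.fileno())
            time.sleep(INTERVAL)

def active_node_is_alive(dev=QDISK, missed_limit=5):
    # Passive node: read the counter twice; if it changed, the active
    # node is still writing, so do not start the cluster services.
    with open(dev, "rb", buffering=0) as disk:
        first, = struct.unpack("Q", disk.read(8))
        time.sleep(INTERVAL * missed_limit)
        disk.seek(0)
        second, = struct.unpack("Q", disk.read(8))
    return second != first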
Red Hat clusters support Quorum Disks, but Red Hat support has recommended not to use one, since they are difficult to configure and can become problematic. Instead, they recommend relying on fencing to prevent split brain.