Tuesday, January 3, 2012

Switch a RedHat cluster from Broadcast mode to Multicast Mode


(This procedure is for RHEL 5.2 to 5.6; it has not been tested on 6.x.)

Red Hat Cluster with bonded interfaces eth0/eth2

1. Edit /etc/cluster/cluster.conf and change the version number
[root@computer ~]# vi /etc/cluster/cluster.conf

Look at line 2 for the version number:

<?xml version="1.0"?>
<cluster config_version="11" name="ttrs">

Change it to version 12 so it looks like this:

<cluster config_version="12" name="ttrs">
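
The version bump can also be scripted. The following is a minimal sketch, assuming config_version appears once in the file as config_version="N"; the function names next_config_version and bump_config_version are made up for this example and are not part of any cluster tool. The second function prints the modified file to standard output instead of editing in place, so you can review it before replacing the real file.

```shell
# Print the next config_version for a cluster.conf-style file.
# Assumes the attribute is written as config_version="N" (digits only).
next_config_version() {
    conf_file=$1
    current=$(sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf_file" | head -n 1)
    [ -n "$current" ] || return 1
    echo $((current + 1))
}

# Print the file with config_version incremented by one.
# The input file itself is left untouched.
bump_config_version() {
    conf_file=$1
    next=$(next_config_version "$conf_file") || return 1
    sed "s/config_version=\"[0-9][0-9]*\"/config_version=\"$next\"/" "$conf_file"
}
```

For example, bump_config_version /etc/cluster/cluster.conf > /tmp/cluster.conf.new writes the bumped copy to a scratch file (the /tmp path is only an illustration).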



 
2. Change the cluster join delay

When the cluster is quorate and the fence domain is first created (by a fence daemon being started), any nodes not yet in the cluster will be fenced. By default there is a delay of 6 seconds in this case, to allow any nodes unnecessarily flagged for fencing to join the cluster and avoid being fenced.
This delay can be increased by setting post_join_delay in cluster.conf:
  <fence_daemon post_join_delay="30"/>

Change line 3 to look like this:

<fence_daemon post_fail_delay="0" post_join_delay="30"/>
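
To confirm which delay a given file will use, a small helper can read the attribute back. This is only a sketch: the function name post_join_delay_of is invented here, and it falls back to the 6-second default mentioned above when the attribute is missing.

```shell
# Print the post_join_delay set on <fence_daemon> in the given file,
# or 6 (the default described above) if the attribute is not present.
post_join_delay_of() {
    conf_file=$1
    value=$(sed -n 's/.*<fence_daemon[^>]*post_join_delay="\([0-9][0-9]*\)".*/\1/p' "$conf_file" | head -n 1)
    echo "${value:-6}"
}
```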


 
3. Continue editing /etc/cluster/cluster.conf and change the communication mode

Line 26 should look like this:

<cman broadcast="yes" expected_votes="1" two_node="1"/>

Remove broadcast="yes" from it and add a multicast line inside the cman section:

<multicast addr="239.199.10.1" interface="bond1"/> (239.199.10.1 is just an example; get a multicast IP that is not already in use on your network)

Add the same multicast line to every clusternode in the cluster as well as to the cman section, so the file looks like this:

<?xml version="1.0"?>
<cluster alias="clustertest" config_version="55" name="rhcluster">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="30"/>
     
        <clusternodes>
                <clusternode name="rhcluster01-priv" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="pdu_01a" option="off" port="13"/>
                                        <device name="pdu_02b" option="off" port="13"/>
                                        <device name="pdu_01a" option="on" port="13"/>
                                        <device name="pdu_02b" option="on" port="13"/>
                                </method>
                        </fence>
                        <multicast addr="239.199.10.1" interface="bond1"/>
                </clusternode>
                <clusternode name="rhcluster02-priv" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="pdu_13a" option="off" port="14"/>
                                        <device name="pdu_13b" option="off" port="14"/>
                                        <device name="pdu_13a" option="on" port="14"/>
                                        <device name="pdu_13b" option="on" port="14"/>
                                </method>
                        </fence>
                        <multicast addr="239.199.10.1" interface="bond1"/>
                </clusternode>
        </clusternodes>

        <cman expected_votes="1" two_node="1">
                <multicast addr="239.199.10.1" interface="bond1"/>
        </cman>
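
Because the same multicast line is repeated in several places, a stray space or a mismatched address is easy to miss. Here is a rough consistency check, assuming each multicast element sits on its own line as in the example above. The function names multicast_addrs and check_multicast are hypothetical, and the check is line-based text matching, not real XML parsing.

```shell
# Print each distinct multicast addr found in the file, one per line.
multicast_addrs() {
    sed -n 's/.*<multicast[^>]*addr="\([^"]*\)".*/\1/p' "$1" | sort -u
}

# Succeed only if all <multicast> lines share one address and no
# interface attribute contains whitespace (for example " bond1").
check_multicast() {
    conf_file=$1
    count=$(multicast_addrs "$conf_file" | wc -l | tr -d ' ')
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one multicast address, found $count" >&2
        return 1
    fi
    if grep -q '<multicast[^>]*interface="[^"]*[[:space:]][^"]*"' "$conf_file"; then
        echo "whitespace inside an interface attribute" >&2
        return 1
    fi
    return 0
}
```

Running check_multicast /etc/cluster/cluster.conf before restarting the cluster gives a quick sanity check; it does not prove the address is unused on the network.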



4. Stop the cluster on both nodes
[root@server1.net ~]# service rgmanager stop
[root@server1.net ~]# service fenced stop
[root@server1.net ~]# service cman stop

[root@server2.net ~]# service rgmanager stop
[root@server2.net ~]# service fenced stop
[root@server2.net ~]# service cman stop


 

5. Start the cluster on both nodes, quickly (within 30 seconds, the post_join_delay set in step 2)

Start cman first and rgmanager last, the reverse of the stop order; rgmanager needs cman to be running.

[root@server1.net ~]# service cman start
[root@server1.net ~]# service fenced start
[root@server1.net ~]# service rgmanager start

[root@server2.net ~]# service cman start
[root@server2.net ~]# service fenced start
[root@server2.net ~]# service rgmanager start
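
Since both nodes have to come back within the join delay, it can help to have the commands ready in the right order. The sketch below only prints the commands; it assumes that cman must be up before rgmanager, so the start order is the reverse of the stop order in step 4. The function name cluster_service_plan is made up for this example.

```shell
# Print the service commands for "start" or "stop", one per line.
# Assumption: start order is cman, fenced, rgmanager; stop is the reverse.
cluster_service_plan() {
    action=$1
    case "$action" in
        start) services="cman fenced rgmanager" ;;
        stop)  services="rgmanager fenced cman" ;;
        *)     echo "usage: cluster_service_plan start|stop" >&2; return 1 ;;
    esac
    for svc in $services; do
        echo "service $svc $action"
    done
}
```

Review the output of cluster_service_plan start first; once it looks right for your nodes, it can be piped to sh as root to run the commands.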


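6. Look at the log file /var/log/messages

You should see something like the output below, with names and addresses matching the servers and app you are working on.
Important markers in these messages include the "entering OPERATIONAL state" lines, the "got nodejoin message" lines for both nodes, and the final "Service ... started" notice: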


[root@server-01-01 cluster]# tail -100 /var/log/messages 
Feb 10 16:08:00 server-01-01 ccsd[21438]:  Built: Jul 28 2010 19:18:39
Feb 10 16:08:00 server-01-01 ccsd[21438]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Feb 10 16:08:00 server-01-01 ccsd[21438]: cluster.conf (cluster name = rhclustertest, version = 56) found.
Feb 10 16:08:01 server-01-01 openais[21448]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Feb 10 16:08:01 server-01-01 openais[21448]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Feb 10 16:08:01 server-01-01 openais[21448]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Feb 10 16:08:01 server-01-01 openais[21448]: [MAIN ] AIS Executive Service: started and ready to provide service.
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1402
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] send threads (0 threads)
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] RRP token expired timeout (495 ms)
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] RRP token problem counter (2000 ms)
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] RRP threshold (10 problem count)
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] RRP mode set to none.
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] heartbeat_failures_allowed (0)
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] max_network_delay (50 ms)
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] The network interface [10.0.80.1] is now up.
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] Created or loaded sequence id 196.10.0.80.1 for this ring.
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] entering GATHER state from 15.
Feb 10 16:08:01 server-01-01 openais[21448]: [CMAN ] CMAN 2.0.115 (built Jul 28 2010 19:18:41) started
Feb 10 16:08:01 server-01-01 openais[21448]: [MAIN ] Service initialized 'openais CMAN membership service 2.01'
Feb 10 16:08:01 server-01-01 openais[21448]: [SERV ] Service initialized 'openais extended virtual synchrony service'
Feb 10 16:08:01 server-01-01 openais[21448]: [SERV ] Service initialized 'openais cluster membership service B.01.01'
Feb 10 16:08:01 server-01-01 openais[21448]: [SERV ] Service initialized 'openais availability management framework B.01.01'
Feb 10 16:08:01 server-01-01 openais[21448]: [SERV ] Service initialized 'openais checkpoint service B.01.01'
Feb 10 16:08:01 server-01-01 openais[21448]: [SERV ] Service initialized 'openais event service B.01.01'
Feb 10 16:08:01 server-01-01 openais[21448]: [SERV ] Service initialized 'openais distributed locking service B.01.01'
Feb 10 16:08:01 server-01-01 openais[21448]: [SERV ] Service initialized 'openais message service B.01.01'
Feb 10 16:08:01 server-01-01 openais[21448]: [SERV ] Service initialized 'openais configuration service'
Feb 10 16:08:01 server-01-01 openais[21448]: [SERV ] Service initialized 'openais cluster closed process group service v1.01'
Feb 10 16:08:01 server-01-01 openais[21448]: [SERV ] Service initialized 'openais cluster config database access v1.01'
Feb 10 16:08:01 server-01-01 openais[21448]: [SYNC ] Not using a virtual synchrony filter.
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] Creating commit token because I am the rep.
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] Saving state aru 0 high seq received 0
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] Storing new sequence id for ring c8
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] entering COMMIT state.
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] entering RECOVERY state.
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] position [0] member 10.0.80.1:
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] previous ring seq 196 rep 10.0.80.1
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] aru 0 high delivered 0 received flag 1
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] Did not need to originate any messages in recovery.
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] Sending initial ORF token
Feb 10 16:08:01 server-01-01 openais[21448]: [CLM  ] CLM CONFIGURATION CHANGE
Feb 10 16:08:01 server-01-01 openais[21448]: [CLM  ] New Configuration:
Feb 10 16:08:01 server-01-01 openais[21448]: [CLM  ] Members Left:
Feb 10 16:08:01 server-01-01 openais[21448]: [CLM  ] Members Joined:
Feb 10 16:08:01 server-01-01 openais[21448]: [CLM  ] CLM CONFIGURATION CHANGE
Feb 10 16:08:01 server-01-01 openais[21448]: [CLM  ] New Configuration:
Feb 10 16:08:01 server-01-01 openais[21448]: [CLM  ]    r(0) ip(10.0.80.1) 
Feb 10 16:08:01 server-01-01 openais[21448]: [CLM  ] Members Left:
Feb 10 16:08:01 server-01-01 openais[21448]: [CLM  ] Members Joined:
Feb 10 16:08:01 server-01-01 openais[21448]: [CLM  ]    r(0) ip(10.0.80.1) 
Feb 10 16:08:01 server-01-01 openais[21448]: [SYNC ] This node is within the primary component and will provide service.
Feb 10 16:08:01 server-01-01 openais[21448]: [TOTEM] entering OPERATIONAL state.
Feb 10 16:08:01 server-01-01 openais[21448]: [CMAN ] quorum regained, resuming activity
Feb 10 16:08:01 server-01-01 openais[21448]: [CLM  ] got nodejoin message 10.0.80.1
Feb 10 16:08:02 server-01-01 ccsd[21438]: Initial status:: Quorate
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] entering GATHER state from 11.
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] Creating commit token because I am the rep.
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] Saving state aru c high seq received c
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] Storing new sequence id for ring cc
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] entering COMMIT state.
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] entering RECOVERY state.
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] position [0] member 10.0.80.1:
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] previous ring seq 200 rep 10.0.80.1
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] aru c high delivered c received flag 1
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] position [1] member 10.0.80.2:
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] previous ring seq 196 rep 10.0.80.2
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] aru a high delivered a received flag 1
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] Did not need to originate any messages in recovery.
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] Sending initial ORF token
Feb 10 16:08:02 server-01-01 openais[21448]: [CLM  ] CLM CONFIGURATION CHANGE
Feb 10 16:08:02 server-01-01 openais[21448]: [CLM  ] New Configuration:
Feb 10 16:08:02 server-01-01 openais[21448]: [CLM  ]    r(0) ip(10.0.80.1) 
Feb 10 16:08:02 server-01-01 openais[21448]: [CLM  ] Members Left:
Feb 10 16:08:02 server-01-01 openais[21448]: [CLM  ] Members Joined:
Feb 10 16:08:02 server-01-01 openais[21448]: [CLM  ] CLM CONFIGURATION CHANGE
Feb 10 16:08:02 server-01-01 openais[21448]: [CLM  ] New Configuration:
Feb 10 16:08:02 server-01-01 openais[21448]: [CLM  ]    r(0) ip(10.0.80.1) 
Feb 10 16:08:02 server-01-01 openais[21448]: [CLM  ]    r(0) ip(10.0.80.2) 
Feb 10 16:08:02 server-01-01 openais[21448]: [CLM  ] Members Left:
Feb 10 16:08:02 server-01-01 openais[21448]: [CLM  ] Members Joined:
Feb 10 16:08:02 server-01-01 openais[21448]: [CLM  ]    r(0) ip(10.0.80.2) 
Feb 10 16:08:02 server-01-01 openais[21448]: [SYNC ] This node is within the primary component and will provide service.
Feb 10 16:08:02 server-01-01 openais[21448]: [TOTEM] entering OPERATIONAL state.
Feb 10 16:08:02 server-01-01 openais[21448]: [CLM  ] got nodejoin message 10.0.80.1
Feb 10 16:08:02 server-01-01 openais[21448]: [CLM  ] got nodejoin message 10.0.80.2
Feb 10 16:08:10 server-01-01 kernel: dlm: Using TCP for communications
Feb 10 16:08:11 server-01-01 kernel: dlm: connecting to 2
Feb 10 16:08:11 server-01-01 kernel: dlm: got connection from 2
Feb 10 16:08:11 server-01-01 clurgmgrd[21512]: <notice> Resource Group Manager Starting
Feb 10 16:08:17 server-01-01 clurgmgrd[21512]: <notice> Starting stopped service service:http_service
Feb 10 16:08:18 server-01-01 clurgmgrd[21512]: <notice> Service service:http_service started
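
Scanning a long log by eye is error-prone, so the same markers can be checked with a short script. This is a sketch under the assumption that your log uses the same message text as the sample above; the function name check_cluster_log is invented here, and the node IPs are passed in by you.

```shell
# Succeed if the log shows the ring reaching OPERATIONAL state and a
# "got nodejoin message" line for every IP given after the log file.
check_cluster_log() {
    log_file=$1
    shift
    grep -q 'entering OPERATIONAL state' "$log_file" || {
        echo "no OPERATIONAL state line found" >&2
        return 1
    }
    for ip in "$@"; do
        # Compare the last field exactly so 10.0.80.1 does not match 10.0.80.10.
        awk -v ip="$ip" '/got nodejoin message/ && $NF == ip { found = 1 }
                         END { exit !found }' "$log_file" || {
            echo "no nodejoin message for $ip" >&2
            return 1
        }
    done
}
```

With the sample above, check_cluster_log /var/log/messages 10.0.80.1 10.0.80.2 would be the call to use.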

 
7. Verify the cluster and app are running:


[root@server-01 ~]# clustat
Cluster Status for trrs @ Fri Feb 10 21:12:01 2012
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 server01-priv                                            1 Online, Local, rgmanager
 server02-priv                                            2 Online, rgmanager

 Service Name                                                     Owner (Last)                                                     State        
 ------- ----                                                     ----- ------                                                     -----        
 service:TRRS_VIP                                                 server02-priv                                         started      


[root@server-02 ~]# clustat
Cluster Status for trrs @ Fri Feb 10 21:32:01 2012
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 Server01-priv                                            1 Online, rgmanager
 Server02-priv                                            2 Online, Local, rgmanager

 Service Name                                                     Owner (Last)                                                     State        
 ------- ----                                                     ----- ------                                                     -----        
 service:TRRS_VIP                                                 server-02-priv                                         started      
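
The clustat check can be automated in the same spirit. The sketch below reads clustat output on standard input and assumes the layout shown above, with "Member Status: Quorate" on its own line and one line per member containing "Online". The function name check_clustat is hypothetical.

```shell
# Read clustat output on stdin; succeed if the cluster is quorate and
# exactly $1 members are reported as Online.
check_clustat() {
    expected=$1
    awk -v expected="$expected" '
        /^Member Status: Quorate/ { quorate = 1 }
        / Online/                 { online++ }
        END { exit !(quorate && online == expected) }
    '
}
```

On a two-node cluster, clustat | check_clustat 2 && echo OK prints OK only when both members are online and the cluster is quorate.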




8. Done!
