E.6. Common Behaviors: 3-5 Member Cluster

Network Partition

Common Causes: Network switch problem

Test Case: Connect a majority of members to switch A. Connect the remaining members to switch B. Connect switch A to switch B using an up-link or crossover cable. Start cluster services on all members. Unplug switch A from switch B.

Expected Behavior: The partition with a majority of members continues operating, and a new view of the cluster quorum is formed. Members in the minority partition are fenced, and services which were running in the minority partition are started in the majority partition, if possible. In the test case, this means that members connected to switch A will fence members connected to switch B.
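
The majority rule described above can be sketched as simple arithmetic. This is a minimal illustration only: has_majority is a hypothetical helper, not part of the cluster software, and it assumes that "majority" means strictly more than half of the configured members. It does not model tie-breaking or fencing.

```shell
# Hypothetical helper (not a cluster command): exits 0 when `reachable`
# members are strictly more than half of `total` configured members.
has_majority() {
    total=$1
    reachable=$2
    [ $((reachable * 2)) -gt "$total" ]
}

# Walk through every possible partition size of a 5-member cluster.
for reachable in 1 2 3 4 5; do
    if has_majority 5 "$reachable"; then
        echo "5 members, $reachable reachable: majority"
    else
        echo "5 members, $reachable reachable: minority"
    fi
done
```

Under this assumption, a 5-member cluster split 3/2 leaves the 3-member side as the majority partition, which corresponds to the members on switch A in the test case.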

Verification: Run clustat on one of the members connected to switch A. There should be a Cluster Quorum Incarnation number listed near the top of the output.
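
One possible way to pick out the relevant line is to filter the clustat output. The sketch below assumes only that the line contains the word "Incarnation", as the verification step suggests. The exact layout of clustat output is not reproduced here, and show_incarnation is a hypothetical helper name.

```shell
# Hypothetical helper: reads clustat-style text on standard input and
# prints only the lines that mention the quorum incarnation.
show_incarnation() {
    grep -i 'incarnation'
}

# Usage on a member connected to switch A (requires clustat on the PATH):
#   clustat | show_incarnation
```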

System hang on cluster member

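Test Case: Suspend the clumembd daemon by sending it the STOP signal. This does not terminate the daemon; it freezes it, which simulates a hung member.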

killall -STOP clumembd

Expected Behavior: The cluster member is fenced by another member, and its services fail over. If a watchdog timer is configured, it may be triggered.
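
To see why the STOP signal resembles a hang rather than a crash, the following sketch applies the same signal to a harmless stand-in process. It uses sleep in place of clumembd so that it can be tried on any machine without affecting a cluster; that substitution is an assumption made for illustration.

```shell
# Stand-in process: `sleep` plays the role of the daemon here.
sleep 60 &
pid=$!

# Suspend it. The process still exists but makes no progress.
kill -STOP "$pid"
sleep 1
state_stopped=$(ps -o stat= -p "$pid")
echo "after SIGSTOP: $state_stopped"   # state typically begins with T (stopped)

# Resume it. The process continues where it left off.
kill -CONT "$pid"
sleep 1
state_running=$(ps -o stat= -p "$pid")
echo "after SIGCONT: $state_running"

# Clean up the stand-in process.
kill "$pid"
```

Because a stopped process remains in the process table, other members see only that it has stopped responding, which is the situation this test case is meant to reproduce.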

Loss of access to shared media

Common Causes: The shared media loses power, or a cable connecting a member to the shared media is disconnected.

Test Case: Unplug the SCSI or Fibre Channel cable from a member.

Expected Behavior: The member takes the configured action for loss of access to shared storage (reboot, halt, stop, or ignore). The default is reboot.