So you’ve got your Veritas Cluster up and running. There’s one advanced feature that really puts the icing on the cake in my mind: I/O Fencing.
Here’s a scenario: a partial power outage, IOS upgrade failure, spanning tree loop, or anything else causes multiple network switch failures in your data center. As a result, your cluster nodes can no longer communicate over ANY of their heartbeat links or the public network. Without I/O fencing enabled, each node would believe all the other nodes were “down”, attempt a failover, and try to run every service group defined in the cluster. This condition is usually called “split brain”. Multiple nodes reading from and writing to the same storage can corrupt your data, which is the worst-case scenario. I/O fencing exists to prevent exactly this.
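There is a SCSI-3 feature called “SCSI-3 Persistent Reservation”, which allows cluster nodes to write “keys” to shared disks. A node whose key is registered can use the disk, and a node whose key has been ejected is locked out, so the keys effectively reserve the disk for the nodes that are still considered part of the cluster.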
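To make the key idea concrete, here is a minimal sketch in Python. It is a toy model only: the `SharedDisk` class and its `register`, `preempt`, and `write` methods are hypothetical names made up for illustration, and they do not correspond to real SCSI commands or to any VCS API.

```python
class SharedDisk:
    """Toy model of a LUN that honors persistent-reservation keys."""

    def __init__(self):
        self.keys = set()   # keys currently registered on the disk
        self.blocks = {}    # block number -> data

    def register(self, key):
        # A node registers its key to gain access to the disk.
        self.keys.add(key)

    def preempt(self, own_key, victim_key):
        # Only a node that is itself registered may eject another key.
        if own_key not in self.keys:
            raise PermissionError(f"{own_key} is not registered")
        self.keys.discard(victim_key)

    def write(self, key, block, data):
        # Writes from a node whose key was ejected are rejected.
        if key not in self.keys:
            raise PermissionError(f"{key} has been fenced off")
        self.blocks[block] = data


disk = SharedDisk()
disk.register("node-a")
disk.register("node-b")
disk.preempt("node-a", "node-b")    # node-a ejects node-b's key
disk.write("node-a", 0, "ok")       # still allowed

try:
    disk.write("node-b", 1, "stale")
    fenced = False
except PermissionError:
    fenced = True                   # node-b can no longer write
```

The point of the sketch is that the check happens at the disk, not on the node: once its key is gone, node-b is refused no matter what it believes about the state of the cluster.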
Ask your storage administrator to enable SCSI-3 Persistent Reservation on each LUN you are assigned. On some arrays, this is the default behavior, but others require you to turn on the feature per LUN.
The cluster must also be assigned three small “coordinator disks”, which serve as a locking mechanism for the shared storage. That is just three shared disks per cluster, not per node, and every node must have access to all of them.
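In the same scenario above, when the heartbeats go down and nodes are thought to be “offline”, each surviving node will race for control of the coordinator disks, ejecting any other nodes’ keys and writing its own key to the disk, locking the other nodes out. The node that gains control of the majority of the coordinator disks wins the race, which is why an odd number of disks is used.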
If there is more than one surviving node in the cluster, the “loser” of the race will actually panic and reboot. That’s not a typo: the node will kernel panic and reboot. This is the only sure way to keep the node from proceeding and potentially corrupting data on the shared storage.
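The race can be sketched in a few lines of Python. This is a simplified simulation under stated assumptions, not how the fencing driver is implemented: the `run_race` function is hypothetical, each coordinator disk simply goes to whichever node reaches it first, and the node holding a majority of the disks is declared the winner.

```python
def run_race(attempts, disk_count=3):
    """Decide a simulated fencing race.

    attempts: list of (node, disk_index) tuples, in the order the
    ejection attempts reach the coordinator disks.
    Returns (winner, losers); winner is None if nobody has a majority.
    """
    owner = {}
    for node, disk in attempts:
        # The first node to reach a disk takes it; later attempts fail.
        owner.setdefault(disk, node)

    wins = {}
    for node in owner.values():
        wins[node] = wins.get(node, 0) + 1

    majority = disk_count // 2 + 1
    winner = next((n for n, c in wins.items() if c >= majority), None)

    nodes = {node for node, _ in attempts}
    losers = sorted(nodes - {winner})   # these nodes would panic
    return winner, losers


# node-a reaches disks 0 and 2 first, node-b reaches disk 1 first.
attempts = [
    ("node-a", 0), ("node-b", 1), ("node-b", 0),
    ("node-a", 1), ("node-a", 2), ("node-b", 2),
]
winner, losers = run_race(attempts)
```

With three disks, two nodes can never both hold a majority, so at most one side of the split keeps running.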
Consult your VCS documentation for the setup steps to enable I/O Fencing. I will also be posting a “part 2” with my abbreviated version of the steps to get it up and running.