Veritas Cluster I/O Fencing, Part 1

So you’ve got your Veritas Cluster up and running. There’s one advanced feature that really puts the icing on the cake in my mind: I/O Fencing.

Here’s a scenario: a partial power outage, a failed IOS upgrade, a spanning tree loop, or anything else that takes out multiple network switches in your data center. As a result, your cluster nodes can no longer communicate over ANY of their heartbeat links or the public network. Without I/O fencing enabled, each node would believe all the other nodes were “down”, attempt a failover, and try to run every service group defined in the cluster. Multiple nodes reading and writing the same storage can corrupt data, and this is your worst-case scenario (often called “split brain”). This is where I/O fencing helps.

There is a SCSI-3 feature called “SCSI-3 Persistent Reservation”, which allows cluster nodes to register “keys” on shared disks. A node whose key has been removed can no longer write to the disk, so the disk is effectively locked for use by the nodes that still hold a key.
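To make the key idea a bit more concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not the real SCSI command set or anything VCS ships: the `SharedDisk` class, its method names, and the node names are all made up for illustration.

```python
class SharedDisk:
    """Toy model of a LUN that honors persistent-reservation keys (hypothetical)."""

    def __init__(self):
        self.keys = set()  # registration keys currently on the disk

    def register(self, key):
        """A node places its key on the disk."""
        self.keys.add(key)

    def preempt(self, own_key, victim_key):
        """Eject another node's key; only a registered node may do this."""
        if own_key not in self.keys:
            return False
        self.keys.discard(victim_key)
        return True

    def write(self, key, data):
        """Writes succeed only for nodes whose key is still registered."""
        if key not in self.keys:
            raise PermissionError(f"key {key!r} is not registered; write rejected")
        return len(data)


disk = SharedDisk()
disk.register("node-a")
disk.register("node-b")
disk.preempt("node-a", "node-b")  # node-a ejects node-b's key

print("node-a wrote", disk.write("node-a", b"hello"), "bytes")
try:
    disk.write("node-b", b"hello")
except PermissionError as err:
    print("node-b:", err)
```

The point is simply that once a node’s key is gone, the storage itself refuses that node’s writes, no matter what the node believes about the state of the cluster.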

Ask your storage administrator to enable SCSI-3 Persistent Reservation on each LUN assigned to you. On some arrays this is the default behavior, but others require the feature to be turned on per LUN.

The cluster also needs three small “coordinator disks”, which serve as the locking mechanism for the shared storage. That’s just three shared disks per cluster, and every node must have access to all of them.

In the same scenario as above, when the heartbeats go down and nodes are thought to be “offline”, each surviving node will race for control of the coordinator disks, ejecting any other nodes’ keys and writing its own key to the disks, locking the other nodes out.

If there is more than one surviving node in the cluster, the “loser” of the race will actually panic and reboot. That’s not a typo: the node will kernel panic and reboot. This is the only sure way to guarantee the node will not proceed and potentially corrupt data on the shared storage.
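The race can be sketched as a small simulation. This is my own simplified model, not how VCS actually decides the winner: the `fencing_race` function is hypothetical, it only handles a two-way split, and it settles each coordinator disk with a coin flip. What it does show is why an odd number of coordinator disks is handy: one side always ends up holding the majority.

```python
import random


def fencing_race(side_a, side_b, num_disks=3, seed=None):
    """Simulate a two-way split racing for the coordinator disks.

    side_a, side_b: lists of node names that can still see each other.
    Returns (survivors, panicked). Each disk going to either side with
    equal odds is an assumption made purely for illustration.
    """
    if num_disks % 2 == 0:
        raise ValueError("use an odd number of coordinator disks to avoid ties")
    rng = random.Random(seed)
    # Count the disks where side A got there first and ejected side B's keys.
    disks_won_by_a = sum(rng.random() < 0.5 for _ in range(num_disks))
    if disks_won_by_a > num_disks // 2:
        return side_a, side_b
    return side_b, side_a


survivors, panicked = fencing_race(["node1", "node2"], ["node3"], seed=1)
print("keeps running:", survivors)
print("panics and reboots:", panicked)
```

Whichever side loses the majority of the disks is the side that panics, so only one half of the split ever touches the shared storage.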

Consult your VCS documentation for the setup steps to enable I/O Fencing. I will also be posting a “part 2” with my abbreviated version of the steps to get it up and running.
