SUMMARY: Splitting ASE clusters

From: Jon Morgan <jmorgan_at_dircon.co.uk>
Date: Tue, 23 Jun 1998 12:13:03 +0100

Well, this isn't quite a summary; it's more a set of observations about
what happened when I did this. I hope it will be useful to anyone else
who needs to do the same thing, or something similar.

The scenario:
        Two ASE clusters needed to be moved to a new site,
        one node at a time. In each cluster, one node would
        stay at the old site while the other was moved to
        the new site. This would then be repeated with the
        remaining node, and the nodes integrated back into
        the original two clusters.

The configuration:
        2 x AlphaServer 8400 5/300-2 - cluster 1
        2 x AlphaServer 8200 5/300-2 - cluster 2
        4 x HSZ40 controllers

The solution:

Well, it didn't go as badly as I thought it would. The approach we
took ended up working quite smoothly. The first step was to split
each existing cluster into two separate nodes. One node was allocated
for removal, while the other remained at the original site to continue
the "live" operations. ASE was convinced that the other node had died
and hadn't come up (i.e. it was running in "failover" mode).

Before trying to split the cluster, all services were failed over onto
the node that was staying as the live "cluster". This prevents any
nasty failover situations on startup. We then performed full, and
I mean FULL, backups of all filesystems. Twice. / and /usr were backed
up after booting from CD-ROM, in single-user mode. This isn't strictly
necessary, but sometimes you can't guarantee a consistent / backup
without doing it (and it meant building the AdvFS domains from
scratch; fun). ALL the systems were then shut down and powered off.

To split the cluster, the Y-cable was removed from the back of the
KFTIA I/O module on the AS8400 and AS8200, and the bus was terminated
at that point. If you're paranoid, you should also remove the Y-cable
completely from the HSZ40 and terminate there as well, although this is
not a requirement (the Y-cable should keep the bus terminated).

(A rule of thumb when trying this sort of thing: have a bucketful (or
thereabouts) of FWD SCSI terminators on hand. I found out the hard way
and had to beg, borrow and steal some to get this thing going.)

You are now ready to move the nodes. That's the easy part.

One thing is absolutely vital at this point: UNDER NO
CIRCUMSTANCES SHOULD THE ASE CONFIGURATION BE CHANGED. This applies
to all of the nodes. Any change to the configuration will only be
applied on the node where it is made, not on the other node. This
can cause an inconsistency within ASE which, at the very least, will
stop it from working or, at worst, could potentially corrupt your
system. I've seen the former, which required a complete rebuild of
ASE (including all start/stop scripts, services, devices, the lot) to
fix, and I have been warned of the latter by a DEC engineer.

Putting the systems back together was fairly straightforward (once you
work out which cable goes where :) It should just be a matter of
reconnecting the cluster cabling and then powering on. If the ASE
configuration hasn't been changed, you should find that the cluster
will come up without too much trouble. (OK, OK, I'm skipping the part
where the KZPAC, aka SWXCR, decided to kill itself and wipe our root
partition, but that wasn't TOO much of a problem, just a pain in the
a*se. Remember the backups? :)

I hope this helps. If there are any specific questions about how this
was done, or any issues you may have with what happened, please don't
hesitate to get in contact. I'll try to be as helpful as possible
(time allowing).

Cheers!

                -jon.

--
Jon Morgan						<jmorgan_at_dircon.co.uk>
DEC Systems Specialist
JRI Europe Ltd
Received on Tue Jun 23 1998 - 13:19:16 NZST