Dear all,
I've a 2-node TruCluster V5.1 and ran into the following mess:
I tried to patch the cluster with t64v51as0002-20001204 and did the
following steps:
clu_upgrade check setup 1 -> O.K.
clu_upgrade setup 1 -> O.K.
clu_upgrade preinstall -> O.K.
Then, on the lead member (node 1) in single user mode:
bcheckrc
dupatch (in the untarred patch directory).
clu_upgrade postinstall -> O.K.
The next step was to roll member 2:
(single user mode, member 2):
bcheckrc
clu_upgrade roll
Here I got a huge number of failures:
grep: can't open ./usr/.smdb./OSFPAT00009000505.inv
grep: can't open ./usr/.smdb./OSFPAT00001500505.inv
grep: can't open ./usr/.smdb./OSFPAT00001900505.inv
grep: can't open ./usr/.smdb./OSFPAT00003500505.inv
grep: can't open ./usr/.smdb./OSFPAT00004100505.inv
grep: can't open ./usr/.smdb./OSFPAT00004200505.inv
grep: can't open ./usr/.smdb./OSFPAT00005000505.inv
grep: can't open ./usr/.smdb./OSFPAT00006400505.inv
grep: can't open ./usr/.smdb./OSFPAT00006500505.inv
grep: can't open ./usr/.smdb./OSFPAT00006800505.inv
grep: can't open ./usr/.smdb./OSFPAT00007900505.inv
grep: can't open ./usr/.smdb./OSFPAT00009800505.inv
...
(from clu_upgrade.log)
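(Just to document how I would check this - assuming /usr/.smdb. is the
usual setld inventory directory, which is where I understand those
.inv files should live:)

# on member 2, count the patch inventory files that are present
ls /usr/.smdb./OSFPAT*505.inv | wc -l
# and compare with the full list of patch inventories
ls /usr/.smdb./OSFPAT*.inv | wc -l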
The "505"-ending patches seems to be nowhere. As a side effect,
the roll wanted no kernel build and stoped very soon claiming that
the roll was finished. After the (needed) reboot, the member2 wasnīt
able to mount the cluster file systems beyond cluster root (stated
I/O error on all other file systems). I made a shutdown of both nodes
and started member 2 first into single user mode; bcheckrc and now it
got the file systems;( afterwards a boot of node 1 was also o.k.). Now it
seemed that this was now a general rule, member 2 must have booted
first to boot into the cluster.
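(To understand why the boot order suddenly matters, I would look at
the member votes and the quorum disk. As far as I know clu_get_info
-full lists the votes per member, and clu_quorum without arguments
displays the current quorum configuration - please correct me if the
exact options differ:)

# current member state and votes
clu_get_info -full
# expected votes and quorum disk configuration
clu_quorum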
So I decided to unroll everything:
(member 2, single user mode):
clu_upgrade undo roll -> O.K.
(member 1):
clu_upgrade undo postinstall -> O.K.
(single user mode, after bcheckrc):
dupatch and deletion of all patches
clu_upgrade undo preinstall -> O.K.
clu_upgrade undo setup 1 -> message:
Do you want to continue to undo this stage of the upgrade? [yes]:
*** Error ***
All members are NOT at the same Base software version.
*** Error ***
All members are NOT at the same TruCluster software version.
and it doesn't matter whether member 2 runs on tagged files or not
(I tried with clu_upgrade tagged disable/enable 2).
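(In case it helps, this is how I would compare the software versions
on the two members by hand - sizer -v for the base OS revision and
setld -i for the installed subsets; grepping for "base" is just my
assumption about how to pick out the OSFBASE/TCRBASE subsets:)

# run the same commands on each member and compare the output
sizer -v
setld -i | grep -i base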
From this point there is obviously no return possible.
So, unfortunately, I tried to patch again with a freshly untarred
patch kit:
clu_upgrade preinstall -> O.K.
(single user mode member 1)
dupatch -> everything was installed and the kernel was built.
B U T: after the reboot it was now member 1 that got the I/O errors
on the cluster file systems (not on the cluster root). Here I tried a
few kernels (the old one, genvmunix, several versions...) in single
user mode; every attempt (bcheckrc) to mount the cluster file systems
failed. Now (surely my fault) I thought it would be better to work
with the backup in
/var/.clu_upgrade/backup.member1.tar
and restored it from member 2 (the member 1 boot_partition was also
mounted - that worked every time). Now the situation is:
member 2 is running on tagged files;
member 1 (with any available kernel) can't boot successfully into
single user mode (nor into multi-user mode).
I can see from member 2 that the member 1 boot_partition is mounted,
but all I get from the boot sequence of member 1 are kernel messages
ending in:
clsm: initiated
vm_swap_init: swap is set to ...
CNX QDISK: Successfully claimed quorum disk, adding 1 vote
(and then member 1 hangs...)
Up to this point the kernel messages look quite normal for the
cluster, and clu_get_info claims the members are UP.
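(If it is of any use: from member 2 I would also check which member
is serving the cluster file systems - as far as I know cfsmgr without
arguments lists every mounted file system together with the member
that serves it:)

# which member serves each cluster file system
cfsmgr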
Any hints ...?
Regards,
Rudi Gabler