Thank you to Dan Goetzman and Harald Baumgartner.
Dan pointed out that the errors during the clu_upgrade roll are documented
in the release notes (yes, it's always good to read them beforehand).
I found that the main problem (member1 not booting into single-user mode)
was spurious: after the kernel started, it connected to the console via the
file system; somehow the console device was set not to 0 0 but to 62 3
(I don't know why).
The failure that member1 couldn't mount the rest of the cluster file systems
remained, so I started member1 first, applied the patch again, rolled
member2, and got to a clean status. I could then remove member1 from the
cluster and add it back, and everything works like a charm now.
Regards,
R.G.
-----Original Message-----
From: tru64-unix-managers-owner_at_ornl.gov
[mailto:tru64-unix-managers-owner_at_ornl.gov] On behalf of Rudolf Gabler
Sent: Thursday, 1 March 2001 08:29
To: Tru64-Unix-Managers (E-Mail)
Subject: Tru64 V5.1 patch 1 problem
Dear all,
I have a 2-node TruCluster V5.1 and ran into the following mess:
I tried to patch the cluster with t64v51as0002-20001204 and did the
following steps:
clu_upgrade check setup 1 -> O.K.
clu_upgrade setup 1 -> O.K.
clu_upgrade preinstall -> O.K.
Then, on the lead member (node 1), in single-user mode:
bcheckrc
dupatch (in the untarred patch directory)
clu_upgrade postinstall -> O.K.
The next step was to roll member 2 (single-user mode, member 2):
bcheckrc
clu_upgrade roll
Here I got a huge number of failures:
grep: can't open ./usr/.smdb./OSFPAT00009000505.inv
grep: can't open ./usr/.smdb./OSFPAT00001500505.inv
grep: can't open ./usr/.smdb./OSFPAT00001900505.inv
grep: can't open ./usr/.smdb./OSFPAT00003500505.inv
grep: can't open ./usr/.smdb./OSFPAT00004100505.inv
grep: can't open ./usr/.smdb./OSFPAT00004200505.inv
grep: can't open ./usr/.smdb./OSFPAT00005000505.inv
grep: can't open ./usr/.smdb./OSFPAT00006400505.inv
grep: can't open ./usr/.smdb./OSFPAT00006500505.inv
grep: can't open ./usr/.smdb./OSFPAT00006800505.inv
grep: can't open ./usr/.smdb./OSFPAT00007900505.inv
grep: can't open ./usr/.smdb./OSFPAT00009800505.inv
...
(from clu_upgrade.log)
The patches ending in "505" seem to be nowhere to be found. As a side
effect, the roll did not want to build a kernel and stopped very soon,
claiming that the roll was finished. After the (needed) reboot, member2
wasn't able to mount the cluster file systems beyond cluster root (it
reported I/O errors on all other file systems). I shut down both nodes
and started member 2 first into single-user mode; after bcheckrc it now
got the file systems (afterwards a boot of node 1 was also O.K.). It now
seemed that this was a general rule: member 2 had to boot first in order
to boot into the cluster.
So I decided to unroll everything:
(member 2, single user mode):
clu_upgrade undo roll -> O.K.
(member 1):
clu_upgrade undo postinstall -> O.K.
(single-user mode, after bcheckrc):
dupatch and deletion of all patches
clu_upgrade undo preinstall -> O.K.
clu_upgrade undo setup 1 -> message
Do you want to continue to undo this stage of the upgrade? [yes]:
*** Error ***
All members are NOT at the same Base software version.
*** Error ***
All members are NOT at the same TruCluster software version.
and it doesn't matter whether member 2 runs on tagged files or not
(I tried with clu_upgrade tagged disable/enable 2).
From this point there is obviously no return possible.
So unfortunately I tried to patch with a freshly untarred patch kit again:
clu_upgrade preinstall -> O.K.
(single-user mode, member 1):
dupatch -> everything was installed and the kernel was built.
B U T: after the reboot, member 1 now got the I/O errors on the cluster
file systems (not the root file system). Here I tried a few kernels (the
old one, genvmunix, several versions...): in single-user mode, any
attempt (bcheckrc) to mount the cluster file systems failed. Now (surely
my fault) I thought it would be better to work with the backup in
/var/.clu_upgrade/backup.member1.tar
and restored it from member2 (the member1 boot_partition was also
mounted; this worked every time). Now the situation is:
member2 is running on tagged files
member1 (with any available kernel) can't boot successfully into
single-user mode (nor into multi-user mode).
I see from member2 that the member1 boot_partition is mounted, but all I
get from the boot sequence of member1 are kernel messages ending in:
clsm: initiated
vm_swap_init: swap is set to ...
CNX QDISK: Successfully claimed quorum disk, adding 1 vote
(and then member 1 hangs...)
Up to this point all the kernel messages seem quite normal for the
cluster, and clu_get_info claims the members are UP.
Any hints ...?
Regards,
Rudi Gabler
Received on Mon Mar 05 2001 - 16:58:54 NZDT