SUMMARY: Tru64 V5.1 patch 1 problem

From: Rudolf Gabler <rug_at_usm.uni-muenchen.de>
Date: Mon, 05 Mar 2001 17:56:58 +0100

Thank you to

        Dan Goetzman and Harald Baumgartner

Dan pointed out that the errors during the clu_upgrade roll are documented
in the release notes (yes, it's always good to read them beforehand).

I found that the main problem (member1 not booting into single-user mode)
was a red herring: after the kernel starts, it connects to the console through
the file system; somehow the console device was not set to 0 0 (but to 62 3,
I don't know why).
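For reference, a hedged sketch of how one might check those device numbers on a member (assuming a standard Tru64 /dev layout; the console_majmin helper below is my own illustration, not a system tool):

```shell
# console_majmin LINE: pull the "major, minor" pair out of an `ls -lL`
# listing of a device node (illustrative helper, not a system command).
console_majmin() {
    # for device nodes, ls prints "major, minor" where the size
    # column normally appears (fields 5 and 6)
    echo "$1" | awk '{ sub(",", "", $5); print $5, $6 }'
}

# On the member one would run (as root, assuming /dev/console exists):
#   console_majmin "$(ls -lL /dev/console)"
# A correct console node prints "0 0"; the broken one here was "62 3".
# If it is wrong, it can be recreated with:
#   rm /dev/console && mknod /dev/console c 0 0
```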
The failure that member1 couldn't mount the rest of the cluster file systems
remained, so I started member1 first, applied the patch again, rolled member2,
and managed to reach a clean status. I could then remove member1 from the
cluster and add it back, and now everything works like a charm.

Regards,

R.G.

-----Original Message-----
From: tru64-unix-managers-owner_at_ornl.gov
[mailto:tru64-unix-managers-owner_at_ornl.gov] On behalf of Rudolf Gabler
Sent: Thursday, March 1, 2001 08:29
To: Tru64-Unix-Managers (E-Mail)
Subject: Tru64 V5.1 patch 1 problem


Dear all,

I have a 2-node TruCluster V5.1 and ran into the following mess:

I tried to patch the cluster with t64v51as0002-20001204 and performed the
following steps:

clu_upgrade check setup 1 ->O.K.
clu_upgrade setup 1 ->O.K.
clu_upgrade preinstall -> O.K.

then on the lead member (node 1) in single-user mode:
bcheckrc
dupatch (in the untarred patch directory)

clu_upgrade postinstall -> O.K.

The next step was to roll member 2:

(single user mode, member 2):
bcheckrc
clu_upgrade roll

Here I got a huge number of failures:

grep: can't open ./usr/.smdb./OSFPAT00009000505.inv
grep: can't open ./usr/.smdb./OSFPAT00001500505.inv
grep: can't open ./usr/.smdb./OSFPAT00001900505.inv
grep: can't open ./usr/.smdb./OSFPAT00003500505.inv
grep: can't open ./usr/.smdb./OSFPAT00004100505.inv
grep: can't open ./usr/.smdb./OSFPAT00004200505.inv
grep: can't open ./usr/.smdb./OSFPAT00005000505.inv
grep: can't open ./usr/.smdb./OSFPAT00006400505.inv
grep: can't open ./usr/.smdb./OSFPAT00006500505.inv
grep: can't open ./usr/.smdb./OSFPAT00006800505.inv
grep: can't open ./usr/.smdb./OSFPAT00007900505.inv
grep: can't open ./usr/.smdb./OSFPAT00009800505.inv
...
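A hedged sketch of how one might list which of those inventory files are actually absent (the directory /usr/.smdb. and the OSFPAT...505 naming come from the log output above; missing_invs is my own illustrative helper, not part of dupatch):

```shell
# missing_invs DIR NAME...: print the expected inventory files that do
# not exist under DIR (illustrative helper; on the rolling member DIR
# would be /usr/.smdb. and the names the OSFPAT IDs from the log).
missing_invs() {
    dir=$1; shift
    for name in "$@"; do
        [ -f "$dir/$name.inv" ] || echo "missing: $name.inv"
    done
}

# e.g. on the member, with IDs taken from the grep errors:
#   missing_invs /usr/.smdb. OSFPAT00009000505 OSFPAT00001500505
```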

(from clu_upgrade.log)

The patches ending in "505" seem to be nowhere. As a side effect,
the roll requested no kernel build and stopped very soon, claiming that
the roll was finished. After the (required) reboot, member2 was not
able to mount the cluster file systems beyond the cluster root (it
reported I/O errors on all other file systems). I shut down both nodes
and started member2 first into single-user mode; after bcheckrc it now
got the file systems (afterwards a boot of node 1 was also O.K.). It now
seemed to be a general rule that member2 had to boot first for the
cluster to come up.
So I decided to unroll everything:

(member 2, single user mode):

clu_upgrade undo roll -> O.K.

(member 1):
clu_upgrade undo postinstall -> O.K.
(single-user mode, after bcheckrc:
dupatch and deletion of all patches)

clu_upgrade undo preinstall -> O.K.

clu_upgrade undo setup 1 -> message:

Do you want to continue to undo this stage of the upgrade? [yes]:

*** Error ***
All members are NOT at the same Base software version.

*** Error ***
All members are NOT at the same TruCluster software version.

and it didn't matter whether member2 ran on tagged files or not (I tried
with clu_upgrade tagged disable/enable 2).
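The check that fails here compares the software versions each member reports. A trivial sketch of that comparison (my own illustration, not the actual clu_upgrade logic; on the cluster the inputs would come from something like `sizer -v` run on each member):

```shell
# versions_match V1 V2: mirror the "all members at the same software
# version" check that the undo refuses on (illustrative only).
versions_match() {
    if [ "$1" = "$2" ]; then
        echo "same"
    else
        echo "different"
    fi
}

# e.g. (assuming rsh between the members works):
#   versions_match "$(rsh member1 sizer -v)" "$(rsh member2 sizer -v)"
```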

From this point there is obviously no return possible.

So, unfortunately, I tried to patch with a freshly untarred patch kit again:
clu_upgrade preinstall -> O.K.
(single-user mode, member 1)
dupatch -> everything was installed and the kernel was built.

B U T: after the reboot, member1 now got the I/O errors on the
cluster file systems (not the root file system). I tried a few kernels
(the old one, several versions of genvmunix ...): in single-user mode,
every attempt (bcheckrc) to mount the cluster file systems failed. Then
(surely my fault) I thought it would be better to work with the backup in
/var/.clu_upgrade/backup.member1.tar
and restored it from member2 (the member1 boot_partition was also mounted;
this worked every time). Now the situation is:

member2 is running on tagged files;
member1 (with any available kernel) can't boot successfully into single-user
mode (nor into multi-user mode);
I see from member2 that the member1 boot_partition is mounted, but all I get
from the boot sequence of member1 are kernel messages ending in:

clsm: initiated
vm_swap_init: swap is set to ...
CNX QDISK: Successfully claimed quorum disk, adding 1 vote

(and then member1 hangs ...)

Up to this point the kernel messages seem to be quite normal for the
cluster, and clu_get_info claims both members are UP.


Any hints ...?


Regards,

Rudi Gabler
Received on Mon Mar 05 2001 - 16:58:54 NZDT

This archive was generated by hypermail 2.4.0 : Wed Nov 08 2023 - 11:53:41 NZDT