Hi,
Thanks to my support engineer Julio Marca at Compaq here in Sydney, who
managed to track down the fix in a couple of hours.
This is a known bug. If you use "init 0" or "shutdown -hs" to shut down a
node in the cluster, the "stop" section of the rc scripts is run. The problem
doesn't occur with "shutdown -h", which does not run the "stop" section of
the rc scripts.
In the script "/sbin/init.d/lsm", in the stop section, there is a mistake
which causes the "vold" daemons on the other nodes to go into a disabled
state.
The fix didn't make Patch Kit 0002, but will be in Patch Kit 0003.
Meanwhile, the fix is to edit "/sbin/init.d/lsm". On line 100:
    /sbin/voldctl stop 2>/dev/null
should read:
    /sbin/voldctl -k stop 2>/dev/null
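
For anyone who would rather script the edit than do it by hand on each
member, here is a minimal sketch of the same one-line change. The function
name patch_lsm_stop and the ".orig"/".new" file suffixes are my own
assumptions for illustration, not part of any Compaq tool; it matches the
command text rather than the line number, so check the result before you
rely on it.

```shell
# Hypothetical helper: apply the "voldctl -k stop" change to an rc script.
# Usage: patch_lsm_stop /sbin/init.d/lsm
patch_lsm_stop() {
    script=$1

    # Keep a copy of the unmodified script, unless one already exists.
    [ -f "$script.orig" ] || cp -p "$script" "$script.orig" || return 1

    # Rewrite only the exact command quoted above. Output goes to a
    # temporary file because not every sed can edit a file in place.
    sed 's|/sbin/voldctl stop 2>/dev/null|/sbin/voldctl -k stop 2>/dev/null|' \
        "$script" > "$script.new" || return 1

    # Copy the contents back so the script keeps its owner and mode.
    cat "$script.new" > "$script" && rm -f "$script.new"
}
```

Running it a second time changes nothing, since the patched line no longer
matches the pattern. Compare the script against the ".orig" copy with diff
afterwards to confirm that only the one line changed.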
thanks again to Julio.
gunther
My original posting:
>Hi,
>
>I'm experiencing a really strange problem.
>
>My setup:
>
>3 node cluster (2 nodes on a GS320, 1 node on another GS320)
>dual-HSG80 storage (multibus_failover enabled)
>dual-KGPSAs from each node
>dual-FX switches
>Tru64 UNIX 5.1 with Patch Kit 0002
>TruCluster 5.1
>
>The above setup has been working fine for 3 months now.
>
>The problem:
>
>In the last couple of days we've been having strange problems with
>CLSM.
>
>When I shut down one node, the 'vold' daemons on the remaining two
>nodes go into a disabled state. Volumes are OK, but no LSM configuration
>is viewable ('volprint' complains about no diskgroups imported).
>
>If I manually run 'lsmbstartup' on a remaining node, everything is
>fine on that node, but the other remaining node's 'vold' dies. If I
>then run 'lsmbstartup' on it - everything is fine too.
>
>If I boot the node that was shut down - everything remains OK. The
>other members' 'vold' daemons continue in an enabled state.
>
>If I shut down the whole cluster and boot again, it all comes up OK.
>
>I've tried shutting down the other members, and get the same effect.
>
>The only change in the last week has been the replacement of a failed
>KGPSA on one node (they each have dual KGPSAs) - which was done
>the morning of the day we detected the problem - but the problem
>may have surfaced earlier and we didn't notice it (we hadn't created
>any LSM volumes for about 10 days).
>
>Has anyone seen this kind of behaviour? I'm completely stumped at
>this stage.
>
>Thanks in advance,
>gunther
Received on Thu Apr 05 2001 - 04:31:53 NZST