Many thanks to:
Nikola Milutinovic
Steffi Laurentius
Rasal Kumarage
Brenden Phillips
Philip Ordinario
Octave Orgeron
Thomas M. Payerl
William H. Magill
Allan J Simeone
for their replies and suggestions.
There was, apparently, a discussion on this topic a while ago which I
missed. My apologies for not searching the archives. The conclusion
was that 'init 0' is likely the only reliable way to halt a system,
and this, indeed, seems to be the case for me. I tried the following:
1. NFS-mount remote disk with option 'hard'.
   Start NFS transfer (tar).
   Unplug the network connection.
   sync; sync; sync; halt
   The 'halt' process hung, but ran to completion
   when the network was reconnected.
2. Repeat 1. with 'soft' mount.
   The 'halt' still hung.
3. Repeat 1. with 'halt -q' (quick halt).
   The 'halt' succeeded, but on the subsequent reboot,
   fsck decided that the disks hadn't been umounted
   properly, and thus checked them all.
4. Repeat 1., but use 'init 0' instead of 'halt'.
   Bingo. The shutdown ran to completion, and
   on subsequent reboot, fsck was happy that the
   disks had been properly umounted, and thus
   didn't need to check them. Furthermore, this
   identical procedure works on SGI and Linux
   boxes, and can be issued remotely by
       rsh remotenode "init 0" < /dev/null &
   By backgrounding the command, a whole series of
   rsh's can be issued without fear of one of them
   stalling because a remote node is down.
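For anyone wanting to script this, here is a rough sketch of how the command host might loop over the other nodes. The host names, the HOSTS and RSH variables, the shutdown_all function and the --run switch are all my own placeholders, not anything from the replies; it simply repeats the backgrounded rsh from step 4 for each node.

```shell
#!/bin/sh
# Hypothetical host list; replace with the real UPS-supplied nodes.
HOSTS="${HOSTS:-alpha1 alpha2 alpha3}"
# Remote shell command; overridable so the loop can be dry-run with RSH=echo.
RSH="${RSH:-rsh}"

shutdown_all() {
    for node in $HOSTS; do
        # stdin from /dev/null plus backgrounding, as in step 4, so that
        # an unreachable node cannot stall the rest of the loop.
        $RSH "$node" "init 0" < /dev/null &
    done
}

# Only act when explicitly asked to, to avoid accidental shutdowns.
if [ "${1:-}" = "--run" ]; then
    shutdown_all
fi
```

Running it with RSH=echo and --run just prints one "node init 0" line per host instead of contacting it, which is a harmless way to check the host list. The function deliberately does not wait for the background jobs, so the command host is free to go on to its own 'init 0' afterwards.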
I'm now a very happy chappie.
Cheers,
Terry.
>Original message:
>Hi Managers,
>
>I'm trying to develop a reliable automatic shutdown procedure for a
>group of Alphas connected to a UPS. The command host gets the signal
>from the UPS that power has failed, and then proceeds to issue
>commands to the other hosts, using rsh, to shut them down.
>The command host then shuts itself down.
>This procedure generally works, but occasionally a host will
>fail to shut down properly; sometimes even the command host doesn't
>do so.
>
>I've had shutdown problems on and off ever since I've been using OSF,
>(6+ years) and I've never got to the bottom of it. I now use
>'sync; sync; sync; halt' after warning users of impending shutdown,
>which seems to succeed more often than 'shutdown -h' (is this dangerous?)
>but occasionally, even that hangs.
>I wondered whether the attempt of 'shutdown' to halt all processes
>was hanging due to some outstanding NFS transfer, since there may well
>be disks NFS-mounted by the UPS-supplied hosts which are attached to
>machines which may not be up at the time the shutdown command is issued.
>
>Does anyone know of a rock-solid method of achieving a guaranteed
>clean shutdown which will proceed without hanging?
>
>What are the definitive steps that take place during a 'shutdown -h'
>and a 'halt'? The man pages are a bit vague. For instance, at which point
>are disks umounted (if at all), and what happens if some process which
>won't die has a file open on one of the disks? Does the umount stall?
Received on Tue Dec 19 2000 - 14:49:06 NZDT