SUMMARY: shutdown problems

From: Horsnell T. <tsh_at_mrc-lmb.cam.ac.uk>
Date: Tue, 19 Dec 2000 14:47:57 +0000 (GMT)

Many thanks to:

Nikola Milutinovic
Steffi Laurentius
Rasal Kumarage
Brenden Phillips
Philip Ordinario
Octave Orgeron
Thomas M. Payerl
William H. Magill
Allan J Simeone

For their replies and suggestions.

There was, apparently, a discussion on this topic a while ago which I
missed. My apologies for not searching the archives. The conclusion
was that 'init 0' is probably the only reliable way to halt a system,
and this does indeed seem to be the case for me. I tried the following:

1. NFS-mount remote disk with option 'hard'.
   Start NFS transfer (tar).
   Unplug the network connection.
   sync; sync; sync; halt
   The 'halt' process hung, but ran to completion
   when the network was reconnected.

2. Repeat 1. with 'soft' mount
   The 'halt' still hung.

3. Repeat 1. with 'halt -q' (quick halt).
   The 'halt' succeeded, but on the subsequent reboot,
   fsck decided that the disks hadn't been umounted
   properly, and thus checked them all.

4. Repeat 1, but use 'init 0' instead of 'halt'.
   Bingo. The shutdown ran to completion, and
   on subsequent reboot, fsck was happy that
   disks had been properly umounted, and thus
   didn't need to check them. Furthermore, this
   identical procedure works on SGI and Linux
   boxes, and can be issued remotely by
 
   rsh remotenode "init 0" < /dev/null &

   By backgrounding the command, a whole series of
   rsh's can be issued without fear of one of them
   stalling because a remote node is down.
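
   As a rough sketch only, the backgrounded rsh's described above
   could be wrapped in a small function. The names RSH and
   shutdown_hosts are made up for illustration; they are not part
   of the procedure I actually ran.

```shell
#!/bin/sh
# Sketch: ask each remote host to run 'init 0' without waiting on any of them.
# RSH and shutdown_hosts are illustrative names, not from the original post.

RSH=${RSH:-rsh}    # remote shell command; can be overridden for a dry run

shutdown_hosts() {
    for host in "$@"; do
        # stdin comes from /dev/null and the command is backgrounded,
        # so a host that is down cannot stall the rest of the loop
        $RSH "$host" "init 0" < /dev/null &
    done
}
```

   It would be called with the list of hosts, e.g.
   'shutdown_hosts node1 node2 node3' (hypothetical host names),
   after which the command host issues its own 'init 0'.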

I'm now a very happy chappie.

Cheers,
Terry.


>Original message:
>Hi Managers,
>
>I'm trying to develop a reliable automatic shutdown procedure for a
>group of Alphas connected to a UPS. The command host gets the signal
>from the UPS that power has failed, and then proceeds to issue
>commands to the other hosts, using rsh, to shut them down.
>The command host then shuts itself down.
>This procedure generally works, but occasionally a host will
>fail to shut down properly; sometimes even the command host doesn't
>do so.
>
>I've had shutdown problems on and off ever since I've been using OSF,
>(6+ years) and I've never got to the bottom of it. I now use
>'sync; sync; sync; halt' after warning users of impending shutdown,
>which seems to succeed more often than 'shutdown -h' (is this dangerous?)
>but occasionally, even that hangs.
>I wondered whether the attempt of 'shutdown' to halt all processes
>was hanging due to some outstanding NFS transfer, since there may well
>be disks NFS-mounted by the UPS-supplied hosts which are attached to
>machines which may not be up at the time the shutdown command is issued.
>
>Does anyone know of a rock-solid method of achieving a guaranteed
>clean shutdown which will proceed without hanging?
>
>What are the definitive steps that take place during a 'shutdown -h'
>and a 'halt'? The man pages are a bit vague. For instance, at which point
>are disks umounted (if at all), and what happens if some process which
>won't die has a file open on one of the disks? Does the umount stall?
Received on Tue Dec 19 2000 - 14:49:06 NZDT