SUMM: Shutdown -r from multi-user mode hangs under DU 3.2D-1

From: Guy Dallaire <dallaire_at_total.net>
Date: Thu, 03 Apr 1997 09:46:30

Thanks to all who replied and sorry for the late summary.

Most of you mentioned that shutdown -r should not be used because it does
not go through the transition scripts. I already knew this. The problem I
had was that it HANGS at a blue screen when run from multi-user mode. It
only does this on my Alpha 2100s; on the little AlphaServer 400, it
runs just fine.

Anyway, I will stop using it and use 'init 0' or shut down to single user
mode before rebooting the system. I still wonder why the admin
documentation is always telling you to shutdown -r your system, though...

Here is the original post:

---------------------------------------------------------------------------
Hello,

I've noticed a problem with the 'shutdown -r' command in DU 3.2D-1. The
problem only occurs on my 2100s; it is OK on my AlphaServer 400 running
the same version of DU.

The only notable difference between the 2 machines (apart from capacity,
etc...) is that the 2100's are using ADVFS everywhere (root fs + all) and
the AlphaServer is using UFS everywhere.

I also noticed that the shutdown -h and/or shutdown -r commands do not seem
to run the rc0 shutdown scripts. Is this true? It just seems to broadcast
kill signals to all processes. This can be dangerous with an Oracle database
around.

From time to time, you install stuff or read admin books and they tell you
to shut down and restart the system with the 'shutdown -r' command. It just
does not work on my 2100s; I have to shut down, remount the filesystems,
sync twice and, from single user, shutdown -r.

Here is the symptom (assume you are running in the GUI from the start)

a) shutdown -r now

The system goes down to the blue screen, seems to hang, and nothing else
happens.

b) If you first go down to single user mode (shutdown now) and then issue
the shutdown -r now command, it jams. The only way I could reboot was to
shut down to single user, run bcheckrc, sync twice and shutdown -r now. I
think it is a problem with the shutdown command not being able to write to
wtmp or something like this, but it should not hang, should it ?

If anyone has a fix, please let me know.

Thanks !

The answers:

----------------------------------------------------------------------------

The only time I have had a problem with our 2100 not shutting down
was when I accidentally deleted the /var symbolic link in the /
directory.

On our system, /var is a symbolic link to /usr/var. When my pointer
was deleted, shutdown couldn't find /var/tmp and kept aborting.

After /var was recreated to point to /usr/var, shutdown worked fine.
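
The check described above can be sketched as a small pre-shutdown test.
This is only an illustration: check_dir_link is a hypothetical helper
name, and the /var -> /usr/var layout is the one George describes, which
may not match other systems.

```shell
#!/bin/sh
# check_dir_link PATH
# Succeeds only if PATH exists and resolves to a directory. The -d test
# follows symbolic links, so a deleted or dangling link fails.
# (check_dir_link is a hypothetical name, not a system command.)
check_dir_link() {
    path=$1
    if [ -d "$path" ]; then
        return 0
    fi
    echo "warning: $path is missing or does not resolve to a directory" >&2
    return 1
}
```

For example, "check_dir_link /var && check_dir_link /var/tmp" could be
run by hand before a shutdown to rule out this particular cause.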

Don't know if this helps at all.... (We are running 3.2c)
George Gallen
ggallen_at_slackinc.com

----------------------------------------------------------------------------

Off the top of my head, it sounds like there is an Oracle process
(or something else) that is waiting on I/O.

I suggest that you try shutting down Oracle first, then try the shutdown -r.

Btw, the kill behavior of shutdown is in the man page.

Ray

----------------------------------------------------------------------------

We have no problems on our 2100 or 2100A, now on 3.2g formerly on 3.2d-1.
Likely a configuration or firmware issue there.
If you get the blue screen, it probably has really shut down and just isn't
rebooting... do you have console set to graphics vs. serial?
If it were serial, it would stop on the blue screen and any
messages would show on the serial console.

>I also noticed that the shutdown -h and/or shutdown -r commands do not seem
>to run the rc0 shutdown scripts. Is this true ? It just seems to broadcast
>kill signals to all process. This can be dangerous with an Oracle Database
>Around.

I believe this is correct, and it certainly is dangerous.
We customized our system to start Oracle under 'init 4', changed
rc3.d to issue the appropriate stops from 4->3. And added a ua_boot
command (which is what the operators have sudo access to) to cleanly
shut down the systems (it does the quiescing of applications
prior to issuing the shutdown).

--- Removed some stuff from the message here ---

>From time to time, you install stuff or read admin books and they tell you
>to shutdown and restart the system with the 'shutdown -r' command.

Of course they don't tell you to quiesce your system first prior to
issuing the shutdown command, :-)

As I stated initially, we don't see this behaviour.... my initial
guess would be a firmware problem. I believe I did once hear of
something comparable with a 2100 which had used the NT ecu diskette
vs. the Unix. Surprisingly it booted, but their graphics console
basically didn't work at all (>>> commands only) since the vga
adapter is an eisa option. k

----------------------------------------------------------------------------

We had problems with some of our DECs, and one of the senior
DEC guys saw one of our admins doing a shutdown -r now and told him
that he should not do that! He should have been using init 0, which
runs the stops through init.d then returns you to the boot prompt - too
bad if you're not on the console - or an init s then a shutdown -r now.

Too bad they changed the functionality of shutdown so radically from
Ultrix without really telling us! :(

I hope this helps

Brenden

----------------------------------------------------------------------------

I don't know if I can be of much help with your larger problem, but I can
say that if you read the man page on shutdown, it will clearly state that
either the -r or -h options DO NOT invoke the shutdown scripts in the
appropriate transition directories. I have used the "init 0" command to
degrade the system - this command does, in fact, invoke the correct transition
directories shutdown scripts. In other flavours of UNIX, the -h and/or -r
shutdown options do function as expected. However, in Digital UNIX (really
an OSF requirement) they simply don't. Many users, recognizing this, will
write their own scripts to explicitly degrade/quiesce their databases prior
to performing the shutdown with either the -h or -r option.
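
A wrapper of the kind described above might look roughly like the
following sketch. STOP_CMD, FINAL_CMD and quiesce_then_shutdown are
hypothetical names; the real database stop command is site-specific and
is not given in this thread. The defaults only echo, so the sketch does
nothing to the system until the variables are set.

```shell
#!/bin/sh
# STOP_CMD and FINAL_CMD are placeholders (assumptions): set them to the
# site's own database stop command and to the desired shutdown command.
STOP_CMD=${STOP_CMD:-"echo would stop the database here"}
FINAL_CMD=${FINAL_CMD:-"echo would run: shutdown -r now"}

# quiesce_then_shutdown
# Run the application stop command first, and refuse to continue if it
# fails, so the shutdown's kill signals never hit a running database.
quiesce_then_shutdown() {
    if ! sh -c "$STOP_CMD"; then
        echo "application stop failed; not shutting down" >&2
        return 1
    fi
    sync    # flush filesystem buffers before the final command
    sh -c "$FINAL_CMD"
}
```

Aborting when the stop step fails is a design choice: it leaves the
operator to investigate rather than letting shutdown kill the database.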

Hope this helps with part of your problem.

Best wishes and good luck,

Merv Weis
e-mail: mervyn.weis_at_alf.mts.dec.com

------------------------------------------------------------------------------

This is true: shutdown does NOT execute the rc0.d scripts; it just
 sends a "broadcast kill" to all the processes (cf. man shutdown).
I think this is a very dirty way to stop a machine, and I get problems
 with Ingres because of that.
The alternative of using init 0 instead seems good, but you have to
 debug all the /sbin/init.d scripts, which have probably very rarely been
 used to stop a machine (for example, the inetd script can loop
 indefinitely trying to kill inetd).

I am currently hesitating between two choices:
1/ make a script which tries init 0 and after a certain delay sends
    a shutdown in the case where init 0 hangs.
2/ make a script which stops the sensitive software (databases) and
  after that sends a shutdown.
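
Choice 1/ could be sketched as a background watchdog that is armed
before the transition is started. The function names are hypothetical,
and whether a background job survives the move to run level 0 on
Digital UNIX is an assumption that would need checking on the real
machine.

```shell
#!/bin/sh
# arm_fallback DELAY CMD [ARGS...]
# Start a background job that waits DELAY seconds and then runs CMD.
# If the machine has already gone down by then, CMD never runs.
# (Hypothetical helper; not part of any system script.)
arm_fallback() {
    delay=$1
    shift
    ( sleep "$delay"; "$@" ) &
}

# try_then_fallback DELAY PRIMARY FALLBACK
# PRIMARY and FALLBACK are command strings, for example "init 0" and
# "shutdown -h now". The watchdog is armed first, then PRIMARY is run.
try_then_fallback() {
    arm_fallback "$1" sh -c "$3"
    sh -c "$2"
}
```

A call such as try_then_fallback 600 "init 0" "shutdown -h now" would
give the rc0 scripts ten minutes before the plain shutdown is sent; the
delay is an arbitrary example value.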

Remember also that neither rc3 nor rc0 write their output elsewhere
 than on the console so you can't catch what is wrong in your startup
 or stop of the system.
I have modified /sbin/rc0 and /sbin/rc3 so that they write all their
 messages to /var/log/rc3.log and /var/log/rc0.log. I also keep the
 previous log in /var/log/rcX.log.previous. I can send you those changes
 if you are interested.
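
The actual modifications are not shown in this thread. As a rough
sketch of the idea only, the logging could be done with two small
helpers like these; rotate_log and run_logged are hypothetical names,
and using tee to keep the messages on the console is an assumption.

```shell
#!/bin/sh
# rotate_log LOGFILE
# Keep the last run as LOGFILE.previous and start an empty LOGFILE.
rotate_log() {
    log=$1
    if [ -f "$log" ]; then
        mv -f "$log" "$log.previous"
    fi
    : > "$log"
}

# run_logged LOGFILE CMD [ARGS...]
# Run CMD with stdout and stderr copied to LOGFILE. tee keeps the
# messages visible on the console as well as in the file.
run_logged() {
    log=$1
    shift
    rotate_log "$log"
    "$@" 2>&1 | tee -a "$log"
}
```

Inside an rc script, the start/stop loop could then be wrapped as, for
example, run_logged /var/log/rc0.log followed by the original command.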

This doesn't answer all the questions, but I hope it helps.

----------------------------------------------------------------------------
Guy Dallaire
dallaire_at_total.net
"God only knows if god exists"
Received on Thu Apr 03 1997 - 17:58:18 NZST
