SUMMARY: Non-deterministic startup scripts

From: Graham Allan <allan_at_physics.umn.edu>
Date: Wed, 23 Feb 2000 15:57:14 -0600

I got a lot of replies on this one. Many people have had the same problem,
and some had ideas about its cause. Suggestions include:

1) use of nohup in the script. I think I have tried this in the past,
in other startup scripts, without total elimination of all strangeness;
but I shall look at it again.

2) The Tru64 environment is different enough from, say, SunOS or Solaris
that some third party software simply doesn't work correctly.

3) Specifically for Altavista search. William H. Magill says below that
it can have problems restarting if it was not previously shut down
normally. And indeed this is true: on a normal system reboot, AV is not
shut down correctly, as Tru64 doesn't run the init scripts at this time
(this is corrected in 5.0 I think?).

4) Even if I am doing something braindead here, it seems that enough
other people are having the same troubles that these init scripts could
stand some better "how to" documentation (I *did* check the Tru64
online manuals and they only describe where the scripts are and what
they do).

Graham

Actual replies are given below; I've tried to excise all email
addresses, etc, in the name of spam prevention.

------------------------------------------------------------------------
> Does anyone else experience erratic behaviour with system startup
> scripts in /sbin/rc3.d? On some of my systems, running 4.0F pk2, some
> daemons *claim* to be started (the message goes by on the console) but
> they don't continue to run.

Try starting the daemon using the nohup command.
-- 
Bob Sloane, University of Kansas Computer Center, Lawrence, KS, 66045
------------------------------------------------------------------------
Graham-
I simply changed all of the joind's to dhcpd in the one included
with the os (and I added a restart section which basically kills
and invokes dhcpd).  The only real difference between yours and
the system one is the rcmgr calls and the ones to see if a process
is already running ... this may provide enough slop in the timing
to get it up and running.
Also, the OS based one is S56dhcp.
Also, I know that some people have found that a nohup helps them
survive the init.d script (might help with the altavista startup).
Not sure if this helps or not ...
S
-----------------------------------------------------------------------
Sean O'Connell                                
Institute of Statistics and Decision Sciences 
------------------------------------------------------------------------
From: "Naccarato, Robert"
Did you check dhcpd's logs?  I think you have to have the packetfilter set
up for it to work.
------------------------------------------------------------------------
You will probably want to put a "nohup" in front of your command like:
     nohup /usr/sbin/dhcpd &
which will prevent it from hanging up.  Thanks,
Dave Niska
US Bank
St. Paul, MN
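
A minimal sketch of how that nohup call might sit inside a start script.
DAEMON, LOG, DAEMON_PID and start_daemon are illustrative names, not taken
from any script in this thread, and the redirections are an added assumption
so that the daemon does not hold the console open.

```sh
#!/bin/sh
# Sketch only: launch a daemon from an rc3.d-style script under nohup.
# DAEMON and LOG are placeholders; set them to match the local install.
DAEMON=${DAEMON:-/usr/sbin/dhcpd}
LOG=${LOG:-/dev/null}

start_daemon() {
    # nohup runs the command with SIGHUP ignored. Redirecting stdin,
    # stdout and stderr keeps nohup from creating nohup.out and detaches
    # the daemon from the console.
    nohup "$DAEMON" "$@" </dev/null >>"$LOG" 2>&1 &
    # pid of the process we launched; a daemon that forks itself into the
    # background will end up running under a different pid.
    DAEMON_PID=$!
    echo "Started $DAEMON"
}

case "${1:-}" in
start)
    start_daemon
    ;;
esac
```

Extra arguments to start_daemon are passed through to the daemon, so any
command-line options can stay in the case branch.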
------------------------------------------------------------------------
> Does anyone else experience erratic behaviour with system startup
> scripts in /sbin/rc3.d? On some of my systems, running 4.0F pk2, some
> daemons *claim* to be started (the message goes by on the console) but
> they don't continue to run. It doesn't seem to happen with system-provided
> scripts, but I don't see anything wrong with the scripts which fail. For
> example the script we have to start the ISC dhcp 2.0 server is:
We've been seeing similar things since at least 4.0B if not before, and still 
see it at 4.0D (PK5 I believe).  I'm not sure I'd call it non-deterministic, 
though.  It seems to happen every time for startup scripts that try to setuid 
to a different user, either via an 'su <username> -c blah blah blah' command 
in the script itself (such as our license-manager scripts do) or by doing it 
internally via a system call (as our AltaVista software appears to do).  The 
same scripts seem always to succeed when we run them manually as root once 
the system is up.  We've finally given up and just inserted commands into the 
startup scripts that email us to remind us to start them manually (sigh...).  
Sorry I can't offer any solutions, but if you hear of any way to fix it, we'd 
love to hear about it.
--
Bob Jones					Sr. Systems Manager
------------------------------------------------------------------------
As I've posted in response to similar queries before (see the list archives), when init runs
the scripts in the rc3.d directory, the child scripts are sent HUP signals (two, if I recall
correctly). HUP is traditionally used to tell Unix daemons to reread their config files, but of
course a programmer could use it for anything. If the programmer did not provide a HUP handler,
the default response is to terminate, and that is what you see.
The programs work when run manually because you don't send a HUP after starting them. :-)
Solutions: a) rewrite the applications to handle HUPs gracefully,
        or b) use nohup at the point in the start script where you actually start the program.
Note that some public code does handle this stuff right; e.g., Samba handles HUP fine.
It is bizarre that the Digital search engine doesn't.
Oisin McGuinness
Sumitomo Bank Capital Markets
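
The effect described above can be reproduced by hand. This is a sketch, with
sleep standing in for a daemon that has no HUP handler; the variable names are
made up for the illustration. The second copy is started with HUP ignored,
which is essentially what nohup arranges before running its command.

```sh
#!/bin/sh
# Two identical stand-in "daemons"; sleep does not install a HUP handler.
sleep 10 &                          # default disposition: HUP terminates it
plain=$!
( trap '' HUP; exec sleep 10 ) &    # an ignored signal stays ignored after exec
guarded=$!

sleep 1                             # let the subshell set its trap and exec
kill -HUP "$plain" "$guarded"
sleep 1                             # let the signals be delivered

# kill -0 delivers nothing; it only reports whether the pid still exists.
if kill -0 "$guarded" 2>/dev/null; then guarded_alive=yes; else guarded_alive=no; fi

# wait returns a status above 128 when the process was killed by a signal.
if wait "$plain"; then plain_status=0; else plain_status=$?; fi

echo "plain exit status: $plain_status, HUP-ignoring copy alive: $guarded_alive"
kill "$guarded" 2>/dev/null         # clean up the survivor
```

If the shell running this was itself started with HUP ignored (under nohup,
for instance), both copies inherit that and both survive.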
------------------------------------------------------------------------
YES!
I've been fighting this with my backup verification
script for months. The problem showed up after installing
pk3. I've given the Compaq Services UNIX Expert Team
a heads up that it's a problem, but haven't logged an
"official" call until I can determine where the problem
is. It's difficult to diagnose since it requires a reboot,
and, as you say, there's not much debugging at that level.
Alan Davis
------------------------------------------------------------------------
>    case "$1" in
>    'start')
>       if /usr/sbin/dhcpd; then
>          echo "Started dhcpd"
>       else
>          echo "Couldn't start dhcpd"
>       fi
>       ;;
>
While the script will run without error, one assumes that the missing
line is a typo...
All it does is test for the existence of a file and then print out
the words "Started dhcpd" - without starting the daemon.
Offhand I don't believe that we have modified the AV startup script,
but I don't know. AV can have problems restarting if it was not shut down
normally but instead crashed (i.e., CPU panic or other hardware crash).
This is how we drive the startup script as we have multiple instances of
AV running.
=======================<cut here>==========================================
#!/bin/sh
# ---------------------------------------------------------------------*
#       Make links in /sbin/rc3.d and rc2.d                             
#                                                                       
#   ln -s /usr/local/sbin/init.d/altavista /sbin/rc3.d/S96altavista             
#   ln -s /usr/local/sbin/init.d/altavista /sbin/rc2.d/K96altavista             
#   ln -s /usr/local/sbin/init.d/altavista /sbin/rc0.d/K96altavista             
#
# When you install any new AV index, an artifact of the AV install
# is to copy a new startup script /sbin/init.d/altavista but this script
# only starts the new index being built and wipes out any existing startups
# Copy this local altavista into /sbin/init.d/altavista
#
#   cp /usr/local/sbin/init.d/altavista /sbin/init.d/altavista               
# ---------------------------------------------------------------------*
## startup for Enterprise AltaVista Beta2
MODE=s          ## default mode is startup
if [ $# -gt 0 ]; then
  case "$1"
  in
    start) MODE=s;;
    stop) MODE=k;;
    *) echo "$0: unknown option: $1"; exit 1;;
  esac
fi
( cd /usr/local/altavista/pennweb ; ./avsetup -$MODE )
( cd /usr/local/altavista/oncolink ; ./avsetup -$MODE )
( cd /usr/local/altavista/pennweb2 ; ./avsetup -$MODE )
( cd /usr/local/altavista/computing ; ./avsetup -$MODE )
( cd /usr/local/altavista/special ; ./avsetup -$MODE )
=======================<cut here>==========================================
-- 
                ===<Tru64 UNIX-SIG Chair>===
                     www.tru64unix.org
T.T.F.N.
William H. Magill                          Senior Systems Administrator
Information Services and Computing (ISC)   University of Pennsylvania
------------------------------------------------------------------------
To which I replied
>   No, this does work, really. Rather than checking for existence of a
>   file, it runs the daemon and checks the exit code.
------------------------------------------------------------------------
2nd followup:
------------------------------------------------------------------------
If you are getting an exit code, that implies that the daemon has
terminated.
If the daemon continues to run, you won't get an exit code. (And the next
script in the series won't run.)
And in all probability, you will always get a "successful execution"
exit code from the "daemon" anyway. They rarely set an exit code that says
"oops, something screwed up" unless the program terminates abnormally.
I don't know which dhcp daemon you are running, but the classic problem
with a Sun oriented daemon is that the Tru64 environment is different.
I'm not a programmer, so I may not be explaining this correctly, but...
a Tru64 daemon has no parent, and as I recall, you therefore can't simply
fork a process and expect it to run after the "parent" init script
terminates. (ie the parent is NOT "1" /sbin/init, but the rc script.)
I've seen this problem before with people who are used to writing Daemons
that run in the Solaris or SunOS environment. They don't work under OSF/1.
There is something you have to do differently to convince a process that it
is going to run as a daemon.
-- 
                ===<Tru64 UNIX-SIG Chair>===
                     www.tru64unix.org
T.T.F.N.
William H. Magill                          Senior Systems Administrator
Information Services and Computing (ISC)   University of Pennsylvania
Received on Wed Feb 23 2000 - 21:58:06 NZDT
