Question
> I have a system with a lot of different processes (most of them are
> developed in C and C++).
>
> So far I have a script for each process that I run every fifth minute to
> see if they are up or not (with a job scheduler),
> and then send me an alarm if a process is down.
>
> We have SNMP on a few of them but if the process just dies we don't get any
> traps.
>
> How can I easily get notified when a process dies without these scripts?
(Dr. Tom Blinn, 603-884-0646 [tpb_at_doctor.zk3.dec.com])
This is UNIX. There is no general mechanism for a randomly selected
process to get notified when some other random process (that is not a
child of the first process) exits (dies). I would not be surprised if
there are some systems where this can be made to happen as a standard
kernel service, but UNIX isn't one of them.
There are some things other than your script that you could try, but
in general, what you're doing is what you need to be doing. Short of
having one process start up everything else and get notifications on
child process exits, there is no other way.
(Dyer, Steve J. [Steve.Dyer_at_alcoa.com])
One way would be to write a master (parent) process that starts all these
processes and includes a signal handler to handle the signals that occur
when these processes die. The parent process could just start the child
processes, then go into a hold state (via sleep), then break out of the hold
state when one of the child processes dies.
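A minimal sketch of such a parent in C, assuming the children are ordinary
programs that can be found on PATH. It blocks in waitpid() rather than
sleeping and being woken by a signal handler; the effect is the same, but
there is less signal handling to get wrong. The names start_child,
wait_for_death and notify are made up for illustration, and notify only
prints to stderr where a real setup would send mail or an SNMP trap.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and exec one command. Returns the child's pid, or -1 if fork fails. */
pid_t start_child(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }
    return pid;
}

/* Block until any child terminates. Stores the raw wait status in *status
 * and returns the child's pid, or -1 once no children are left. */
pid_t wait_for_death(int *status)
{
    pid_t pid;
    do {
        pid = waitpid(-1, status, 0);
    } while (pid == -1 && errno == EINTR);   /* retry if a signal interrupted us */
    return pid;
}

/* Hypothetical notification hook: here it just writes to stderr. */
void notify(pid_t pid, int status)
{
    if (WIFEXITED(status))
        fprintf(stderr, "pid %ld exited with status %d\n",
                (long)pid, WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        fprintf(stderr, "pid %ld killed by signal %d\n",
                (long)pid, WTERMSIG(status));
}
```

A main() built on this would call start_child() once per program, then loop
on wait_for_death(), calling notify() for each pid returned until it gets -1.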
(system administration account [sysadmin_at_astro.su.se])
Sounds rather similar to what Dan Bernstein does in his daemontools package.
Maybe you should have a look at that.
http://cr.yp.to/daemontools.html .
I'm thinking specifically of the "supervise" command, but you may find the
entire package useful. Also, this tool is designed to promptly restart any
service that has died; but you can probably work a notification mechanism
into it. Or cull the logs after the fact.
That said, if what you want is really instant notification rather than
automated recovery, perhaps you need to have the processes started by a
parent that handles SIGCHLD and sends you a message for each signal it gets.
Or you could program an alarm handler within each process, to periodically
send you a heartbeat of some kind, and have the receiver raise a flag
whenever no heartbeat was received in the last N seconds.
Which design is best really depends on your application.
Regards Klas Erlandsson
Received on Mon Nov 20 2000 - 14:45:50 NZDT