Hi,
I had originally asked for help in writing a script that would notify me via
e-mail if a process was not found to be running. I received lots of answers
-- many thanks to all of you! I sifted through them all and came up
with the following approach:
export PSCHED_ID=$1
...
# Count the PSRUN<id> processes. grep -v grep drops the grep commands
# themselves from the ps listing; grep -c prints a count, not the lines.
if [[ `ps -ef | grep -v grep | grep -c "PSRUN$PSCHED_ID"` -eq 0 ]]
then
echo "Failed to restart the refresh process scheduler!"
# Keep each string on one line: a newline inside a mail subject breaks it.
MSSG="ATTENTION! Failed to restart the $PSCHED_ID Process Scheduler at $RESTART_TIME!"
SUBJ="ATTENTION! Failed to restart the $PSCHED_ID Process Scheduler at $RESTART_TIME!"
echo "$MSSG"
echo "$SUBJ"
mailx -v -s "${SUBJ}" acohen_at_cognex.com <<-!
${MSSG}
!
else
MSSG="ATTENTION! The $PSCHED_ID Process Scheduler was restarted at $RESTART_TIME!"
SUBJ="ATTENTION! The $PSCHED_ID Process Scheduler was restarted at $RESTART_TIME!"
echo "$MSSG"
echo "$SUBJ"
mailx -v -s "${SUBJ}" acohen_at_cognex.com <<-!
${MSSG}
!
fi
The key points for me that I was missing were:
1) enclose the ps pipeline in backticks (`...`) so that its output, a
number, is substituted into the test;
2) the exit status of a pipeline is the exit status of the last command in
the pipeline;
3) grep -c prints a count of the matching lines (one per process) instead
of the entire ps -ef output lines.
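The three points can be tried out in isolation with this minimal sketch. It is written for POSIX sh, so it should also run under ksh; the PSRUN prefix and the PSCHED_ID argument come from the script above, and the fallback value DEMO is only a placeholder.

```sh
#!/bin/sh
# PSCHED_ID and the PSRUN prefix come from the script above;
# DEMO is a placeholder used when no argument is given.
PSCHED_ID=${1:-DEMO}

# 1) Command substitution: the backticks are replaced by the pipeline's
#    standard output. (ksh and POSIX sh also accept $(...) for this.)
# 3) grep -c prints the number of matching lines, not the lines
#    themselves, so count holds a plain integer.
count=`ps -ef | grep -v grep | grep -c "PSRUN$PSCHED_ID"`
echo "matching processes: $count"

# 2) A pipeline's exit status is that of its last command. grep exits 0
#    when something matched and 1 when nothing did, so the pipeline can
#    also be used directly as the if condition.
if ps -ef | grep -v grep | grep "PSRUN$PSCHED_ID" >/dev/null
then
    echo "PSRUN$PSCHED_ID is running"
else
    echo "PSRUN$PSCHED_ID is not running"
fi
```

Testing the pipeline's exit status directly skips the numeric comparison altogether; which form to use is mostly a matter of taste.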
There were many variations on a theme, but most were along these lines. The
only significant differences were:
==================
As an alternative, why don't you save the PID when the process starts,
then run something like
kill -0 $(cat /var/run/xxx.pid) 2>/dev/null || mail ...
This should be cheaper than ps + grep, and also more reliable (you
won't be fooled by identically-named processes that do other things).
You have to run the kill as root or as the process owner, though (don't
worry: kill -0 sends no signal, it only checks that the process exists).
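A minimal sketch of the PID-file check from this reply. It assumes the daemon (or whatever starts it) writes its PID to a file; the path below is the reply's /var/run/xxx.pid placeholder, not a real file, and is_running is a hypothetical helper name. The echo marks where the mailx call from the script above would go.

```sh
#!/bin/sh
# Hypothetical PID file, assumed to be written by the daemon at start-up.
PIDFILE=${PIDFILE:-/var/run/xxx.pid}

# is_running FILE: succeed if FILE holds the PID of a live process.
# kill -0 sends no signal; it only checks that the process exists and
# that we may signal it (root or the process owner).
is_running() {
    [ -r "$1" ] || return 1
    pid=`cat "$1"`
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null
}

if is_running "$PIDFILE"
then
    echo "process from $PIDFILE is running"
else
    # Replace this echo with the mailx call from the script above.
    echo "ATTENTION! process from $PIDFILE is not running"
fi
```

One caveat: kill -0 also fails when the caller lacks permission to signal the process, so running this as an unprivileged user can raise false alarms, which is why the reply says to run it as root or as the owner.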
Also, it bypasses your ksh coding dilemma. (I believe there is a way
to do what you want, but I'm not in the mood for experimenting. Maybe
pipe the result through "wc -l" and send mail if that prints 0?)
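The wc -l idea would look roughly like this (a sketch, reusing PSCHED_ID and the PSRUN prefix from the script above, with DEMO as a placeholder default). Some wc implementations pad the count with spaces, so tr strips them before the numeric test.

```sh
#!/bin/sh
# PSCHED_ID and the PSRUN prefix come from the script above.
PSCHED_ID=${1:-DEMO}

# wc -l counts the matching ps lines; tr removes any padding spaces.
count=`ps -ef | grep -v grep | grep "PSRUN$PSCHED_ID" | wc -l | tr -d ' '`

if [ "$count" -eq 0 ]
then
    # Replace this echo with the mailx call from the script above.
    echo "ATTENTION! PSRUN$PSCHED_ID is not running"
fi
```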
If you can modify the daemon to periodically touch a file, you could
send the message based on whether the timestamp on that file is out
of date. (The find command can be used for this, which is especially nice
if you have a lot of such processes.) That way you know not only that
the process is running, but also that it's running _normally_.
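A sketch of the heartbeat check, assuming each daemon touches its own *.hb file in one directory at least every MAX_AGE_MIN minutes. The directory, the .hb suffix and both variable names are hypothetical. It relies on find's -mmin test, which GNU and BSD find provide but some older finds lack; a find that only has -mtime counts whole days, which is usually too coarse for this.

```sh
#!/bin/sh
# Hypothetical layout: one heartbeat file per daemon in this directory.
HEARTBEAT_DIR=${HEARTBEAT_DIR:-/var/run/heartbeat}
MAX_AGE_MIN=${MAX_AGE_MIN:-10}

# stale_files DIR: print every *.hb file in DIR that has NOT been
# modified within the last MAX_AGE_MIN minutes (needs find -mmin).
stale_files() {
    find "$1" -type f -name '*.hb' -mmin +"$MAX_AGE_MIN" -print
}

stale=$(stale_files "$HEARTBEAT_DIR" 2>/dev/null)
if [ -n "$stale" ]
then
    # Replace these echos with the mailx call from the script above.
    echo "ATTENTION! stale heartbeat files:"
    echo "$stale"
fi
```

Because one find covers the whole directory, adding another monitored process only means having it touch one more file.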
=========================
and
=========================
You may want to take a look at /usr/share/lib/shell/Wait. It's what a number
of OS utilities use to wait on a PID to exit before doing something. In your
case you would get the PID of your critical process and Wait on it. When it
exits, Wait will trigger the restart.
====================
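The interface of /usr/share/lib/shell/Wait isn't described here, so rather than guess at it, this sketch swaps in a plain polling loop built on the kill -0 test from the first reply. It is not what Wait does internally, just one way to get the same wait-then-act behavior. Polling is needed because the shell's built-in wait only works on the shell's own child processes, not on a daemon started elsewhere. wait_for_exit and restart_cmd are hypothetical names.

```sh
#!/bin/sh
# wait_for_exit PID [INTERVAL]: return once PID no longer exists,
# checking every INTERVAL seconds (default 30). kill -0 sends no signal;
# run as root or the process owner, or a live process looks "gone".
wait_for_exit() {
    while kill -0 "$1" 2>/dev/null
    do
        sleep "${2:-30}"
    done
}

# Usage (restart_cmd is a hypothetical placeholder):
#   wait_for_exit "$(cat /var/run/xxx.pid)" && restart_cmd
```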
I did a 'man wait' and, while it looked more 'robust' and probably a better
way to go, it was a bit too much for me right now; maybe in the 'next
release'. ;-}
Thanks again everybody!
Andy Cohen
Database Systems Administrator
Cognex Corporation
1 Vision Drive
Natick, MA 01760-2059
voice: 508/650-3079
pager: 877-654-0447
cell: 508/294-0863
fax: 508/650-3337
Received on Mon Mar 19 2001 - 17:21:29 NZST