SUMMARY: k shell help from Andy Cohen on 2001-03-20 (tru64-unix-managers)

From: Andy Cohen <acohen_at_cognex.com>
Date: Mon, 19 Mar 2001 12:20:03 -0500

Hi,

I had originally asked for help in writing a script that would notify me via
e-mail if a process was not found to be running. I received lots of answers
-- many thanks to all of you! I sifted through them all and came up
with the following approach:

export PSCHED_ID=$1
...
        if [[ `ps -ef | grep -v grep | grep -c "PSRUN$PSCHED_ID"` -eq 0 ]]
        then
                echo "Failed to restart the refresh process scheduler!"
                MSSG="ATTENTION! Failed to restart the $PSCHED_ID Process Scheduler at $RESTART_TIME!"
                SUBJ="ATTENTION! Failed to restart the $PSCHED_ID Process Scheduler at $RESTART_TIME!"
                echo "$MSSG"
                echo "$SUBJ"
                # Plain << here-document: body and terminator start in column 1
                # (<<- strips leading tabs only, not spaces).
                mailx -v -s "${SUBJ}" acohen_at_cognex.com <<!
${MSSG}
!
        else
                MSSG="ATTENTION! The $PSCHED_ID Process Scheduler was restarted at $RESTART_TIME!"
                SUBJ="ATTENTION! The $PSCHED_ID Process Scheduler was restarted at $RESTART_TIME!"
                echo "$MSSG"
                echo "$SUBJ"
                mailx -v -s "${SUBJ}" acohen_at_cognex.com <<!
${MSSG}
!
        fi

The key points for me that I was missing were:
1) enclose the ps command in backward ticks: `
2) the return value of a pipeline is the return value of the last command in
the pipeline
3) grep -c prints a count of the matching lines (here, matching processes)
instead of printing each matching line of the ps -ef output.
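
To make the three points concrete, here is a minimal sketch. It assumes a
fixed sample in place of live ps output so the result is reproducible; the
names sample_ps and count_matches and the PSRUNPROD/PSRUNTEST/PSRUNDEV
process names are made up for illustration.

```shell
# Sample lines standing in for `ps -ef` output (hypothetical processes).
sample_ps() {
    printf '%s\n' \
        'psoft   101   1  0 09:00 ?     00:00:01 PSRUNPROD' \
        'psoft   102   1  0 09:00 ?     00:00:01 PSRUNTEST' \
        'acohen  200  50  0 09:01 pts/1 00:00:00 grep PSRUNPROD'
}

# count_matches PATTERN: print how many non-grep lines contain PATTERN.
# The function's exit status is that of the last command in the pipeline
# (grep -c), which is non-zero when the count is 0.
count_matches() {
    sample_ps | grep -v grep | grep -c "$1"
}

# Point 1: backticks capture what the pipeline prints.
# Point 3: with -c, what it prints is a count, not the matching lines.
count=`count_matches PSRUNPROD`
echo "count=$count"

# Point 2: the pipeline's status is grep -c's status, so it can be
# tested directly without comparing the count.
if count_matches PSRUNDEV >/dev/null
then
    echo "PSRUNDEV: found"
else
    echo "PSRUNDEV: not found"
fi
```

With the sample above this prints "count=1" and "PSRUNDEV: not found". In
a real script, sample_ps would be replaced by ps -ef.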

There were many variations on a theme but most were of this nature. The
only significant differences were:

==================
As an alternative, why don't you save the PID when the process starts,
then run a
        kill -0 $(cat /var/run/xxx.pid) 2>/dev/null || mail ...
? Should be cheaper than ps + grep, and also more reliable (you
won't be fooled by identically-named processes that do other things).
Got to run the kill as root or as the process owner, though (don't
worry: with -0 it's quite safe).
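
A minimal sketch of the PID-file check described above. The function name
is_running is hypothetical, and the sketch assumes the process writes its
PID to a file at startup. Note that kill -0 also fails when the process
exists but belongs to another user, which is why the check has to run as
root or as the process owner.

```shell
# is_running PIDFILE: succeed if the PID stored in PIDFILE is a live
# process that we are allowed to signal.
is_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1       # no readable PID file: not running
    pid=`cat "$pidfile"`
    case $pid in
        ''|*[!0-9]*) return 1 ;;        # empty or non-numeric contents
    esac
    # Signal 0 sends nothing; it only checks existence and permission.
    kill -0 "$pid" 2>/dev/null
}

# Possible use from cron (message and address are placeholders):
#   is_running /var/run/xxx.pid ||
#       echo "process is down" | mailx -s "process is down" user@example.com
```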

Also, it bypasses your ksh coding dilemma. (I believe there is a way
to do what you want, but am not in a mood for experimenting. Maybe
piping the result through "wc -l" and sending mail if that prints 0?)

If you can modify the daemon to periodically touch a file, you could
send the message based on whether the timestamp on that file is out
of date. (The find command can be used for this. Especially nice if
you have a lot of such processes.) That way you know not only that
the process is running, but also that it's running _normally_.
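
One way the timestamp check might look, as a sketch only. It assumes the
daemon touches a heartbeat file at least once between checks, and it uses
find -newer against a marker file that the checker itself touches on each
run. The function name heartbeat_fresh and both file arguments are
hypothetical.

```shell
# heartbeat_fresh HEARTBEAT MARKER
# Succeed if HEARTBEAT was modified after MARKER, i.e. since the previous
# check. MARKER is re-touched on every call to become the next baseline.
heartbeat_fresh() {
    hb=$1
    marker=$2
    [ -f "$hb" ] || return 1            # no heartbeat file at all: stale
    if [ ! -f "$marker" ]
    then
        touch "$marker"                 # first run: nothing to compare yet
        return 0
    fi
    # find prints the heartbeat path only if it is newer than the marker.
    fresh=`find "$hb" -newer "$marker" -print`
    touch "$marker"
    [ -n "$fresh" ]
}
```

Run from cron at an interval comfortably longer than the daemon's touch
interval, a failing heartbeat_fresh means the daemon has not updated its
file since the last check, so the alert mail can be sent at that point.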
=========================

and

=========================
You may want to take a look at /usr/share/lib/shell/Wait. It's what a number
of OS utilities use to wait on a PID to exit before doing something. In your
case you would get the PID of your critical process and Wait on it. When it
exits Wait will trigger the restart.
====================

I did a 'man wait' and while it looked more 'robust' and probably a better
way to go it was a bit too much for me right now; maybe in the 'next
release'. ;-}

Thanks again everybody!

Andy Cohen
Database Systems Administrator
Cognex Corporation
1 Vision Drive
Natick, MA 01760-2059
voice: 508/650-3079
pager: 877-654-0447
cell: 508/294-0863
  fax: 508/650-3337
Received on Mon Mar 19 2001 - 17:21:29 NZST
