SUMMARY: user cron jobs in a disk service

From: Scott Mutchler <smutchler_at_gfs.com>
Date: Wed, 26 Aug 1998 15:39:48 -0400

Thanks to all who responded:

Steven <johnson_at_bayflash.stpt.usf.edu>
"Randall R. Cable" <randy.cable_at_mci.com>
chakru <kcpani_at_cob.cummins.com>
"Matt J. L. Goebel" <goebel_at_emunix.emich.edu>
"Bivins, Jeff" <BIVINS_at_nebeng.otis.com>
Chris Jankowski - mailing lists <chrisj_at_lagoon.meo.dec.com>


Your replies fell into four categories, as follows:

(1) Set the jobs up to run on each node, but include logic to test whether the disk service is on the node: if it is, run the job; if not, exit without running it. Variations on this theme included (a) logic in the crontab entry and (b) logic in the script run from cron. Either way, the test can check for a flag file or some other indicator that the disk service is on the node (see the sketch after this list).

(2) Another suggestion was to dump (crontab -l username) and reload cron entries as part of the stop and start action scripts for disk service failover. The caution there is to be sure to dump the files to a fileset that is part of the disk service so they can be reloaded on the node starting the disk service.

(3) Another suggestion was to run the jobs on both nodes and ignore the failures from the node that does not have the service.

(4) The last suggestion was to set up /usr/spool/cron/crontabs as a symbolic link that is part of an NFS share.
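
As a sketch of variation (1)(b) above, here is a guard at the top of a script run from cron, using the ASE flag-file convention described in Matt Goebel's reply below (the service name and the job itself are placeholders, not anyone's actual setup):

        #!/bin/sh
        # Run only on the node that currently hosts the disk service.
        FLAG=/var/ase/tmp/dbsvc_IS_RUNNING    # placeholder service name
        [ -f "$FLAG" ] || exit 0              # not our node; exit quietly
        # ... the real work (database loads, reports, etc.) goes here ...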

My original post was:

We have two 4100s running DU 4.0d (with patch kit 2) and TruCluster 1.5 (with patch kit 2). We have a disk service that runs a database (Progress) and makes use of four AdvFS filesets. We have user accounts with home directories in one of those filesets that need to run cron jobs to take action against the database, such as loading data and running reports.

When the disk service moves to a node, cron on that node needs to "wake up" and see that it now has jobs to do for these users. The cron on the other node needs to wake up and see that it should no longer be running those jobs. We have discovered that cron is different from most daemons in that kill -1 does not cause it to re-examine its files (in /var/spool/cron/crontabs). We are handling this in our failover scripts (the user-defined stop action and start action scripts).

We also tried the variation of completely killing and restarting cron. This works well once the nodes are booted. However, when a node boots, it runs the stop action script. Our stop action script would look for a cron to kill, adjust the crontab files, and then start a new cron. The problem comes in at /sbin/rc3.d/S57cron, which in turn starts up another cron daemon, leaving two crons running.
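
A rough sketch of that kill-and-restart step (reconstructed for this summary rather than copied from our scripts; the cron path and the ps match may need adjusting on your systems):

        # Bounce cron so it re-reads /var/spool/cron/crontabs,
        # since kill -1 does not make it re-examine them.
        PID=`ps -e | awk '$NF == "cron" { print $1 }'`
        [ -n "$PID" ] && kill $PID
        /usr/sbin/cron
        # Caveat: at boot, /sbin/rc3.d/S57cron starts another cron,
        # so guard against ending up with two daemons.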

So my question: how have others of you handled cron jobs in your clusters that need to move with a disk service?

Thank you

Scott Mutchler
Gordon Food Service Marketplace
smutchler_at_gfs.com

------
Detailed responses from each person follow.


You have an interesting approach to handling this cron dilemma. I may
explore your option. Here is how we handle it:


/usr/spool/cron/crontabs is symbolically linked to an NFS share that both
nodes can see at all times.
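
The link setup itself is not shown; it might look like this (the NFS path is an assumption, not the actual share):

        # On each node: replace the local crontabs directory with a
        # symlink into an NFS share that both nodes mount.
        mv /usr/spool/cron/crontabs /usr/spool/cron/crontabs.local
        ln -s /nfs/cluster/crontabs /usr/spool/cron/crontabs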

We have scripts that edit the cron file on one node. Once the cron file
is written out, both nodes see the refreshed copy; that is, both nodes
have exactly the same cron table at all times.

The processes that run from cron then check to see if the ASE service is
available on that node. If Node A has the service, the process will
continue unimpeded. When Node B checks for the ASE service, it will fail
and gracefully terminate.
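
The check itself is not shown; one minimal way to express it (the mount point and job path are hypothetical) is to test that the service's fileset is mounted locally:

        #!/bin/sh
        # Run the real job only if the service's fileset is mounted
        # on this node; otherwise terminate gracefully, as on Node B.
        if /sbin/mount | grep ' /progress/data ' > /dev/null; then
            exec /usr/local/bin/nightly_job    # hypothetical job
        fi
        exit 0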

_________________________________________________________________________
Steven Johnson johnson_at_stpt.usf.edu
http://www.bsn.usf.edu/~sjohnson (813) 413-2286
_________________________________________________________________________

Scott,
We simply run the same cron job on both nodes and ignore the failed cron
job entries from the node not associated with the disk service...

Hope this helps?

peace!
rrc


I use the crontab command in the following way. My ASE start script contains the following lines.

                        # Dump the user's current crontab, append the
                        # service-specific entries, and load the result.
                        crontab -l > /tmp/cronnow
                        cat /usr/asecron/croncommon >> /tmp/cronnow
                        crontab /tmp/cronnow

My ASE stop script looks like this.

                        # Save the combined table for reference, then load
                        # the normal (service-not-running) entries.
                        crontab -l > /tmp/cronnow
                        crontab /usr/asecron/cronnorm

The croncommon file holds the crontab entries to be run when the service is running on the system. The cronnorm file holds the entries to be run when the service is not running on the system.

This works pretty well, and you can change croncommon and cronnorm at any time without touching the ASE start and stop scripts.

See whether this helps you.

                                    Regards, Chakru


Hello,

  I took the dirty way out: in each script with cron calls, I check for the
following file: '/var/ase/tmp/"$SERVICE_NAME"_IS_RUNNING'. If it exists,
then the service is running; if not, it isn't. I'm running ASE 1.4, and
I hope they haven't changed that in 1.5.
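
In crontab-entry form (variation (a) in the summary above), the same flag file can be tested inline; the schedule, service name, and job path here are placeholders:

        # Run the 2 a.m. load only where the service's flag file exists.
        0 2 * * * [ -f /var/ase/tmp/dbsvc_IS_RUNNING ] && /usr/local/bin/nightly_load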

Matt Goebel

Hello,

You might try to look at this problem from a different perspective. The cron
services may not need to fail over. You might be able to check for the
existence of the files on the machine before starting the jobs. If the files
do not exist, exit the script that is being called by cron.
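
A sketch of that guard, with a hypothetical Progress database path standing in for the real files:

        #!/bin/sh
        # If the database files are not visible here, the disk service
        # is on the other node; exit before doing any work.
        [ -f /db/sports.db ] || exit 0    # hypothetical database file
        # ... load data / run reports against the database ...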

Jeff

Scott,

What about having those cron jobs run on all members all the time, with
logic that checks whether the required service is running locally and then
takes the appropriate action?

This is very simple and reasonably foolproof except for the race conditions
during failover.

Regards,

Chris

 +-+-+-+-+-+-+-+ Chris Jankowski - Open Systems Cons.- chris_at_lagoon.meo.dec.com
 |d|i|g|i|t|a|l| Digital Equipment Corporation (Australia) tel.+61 3 92753622
 +-+-+-+-+-+-+-+ 564 St. Kilda Rd, Melbourne 3004, AUSTRALIA fax +61 3 92753453

Entities should not be multiplied needlessly. - William of Ockham


Again, thanks to everyone.

Scott Mutchler
Gordon Food Service Marketplace