Backup dies, but only from cron

From: Paul A Sand <pas_at_unh.edu>
Date: Mon, 08 Mar 1999 12:02:45 -0500

Hi --

Please excuse my long-winded explication of this problem. I'm totally
in the dark here, so I'm not sure which odd facts might turn out to be
relevant.


Background
----------

I run backups of our Network Appliance file server nightly. A cron job
on our Tru64 Unix machine runs a Perl script, which in turn runs an rsh
to the file server's dump program, which dumps to the file server's
local DLT drive. The Perl script watches the output from the command,
waiting for the rsh to finish, then rewinds the tape, does various
bookkeeping chores, and exits itself.
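
In rough outline, the wrapper logic can be sketched as below. This is an
illustration in Python, not the actual Perl script: the names
run_and_watch and describe_exit are made up for the sketch, and the real
rsh/dump command line is not shown. The point is only the shape of the
flow: start the command, read its output until it closes, then record
how the child ended before moving on to the rewind.

```python
import signal
import subprocess
import sys


def run_and_watch(cmd):
    """Run cmd, collecting its combined stdout/stderr line by line.

    Returns (returncode, lines). A negative returncode means the child
    was killed by that signal number (Python's subprocess convention).
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    lines = []
    for line in proc.stdout:  # blocks until the child closes its output
        lines.append(line.rstrip("\n"))
    proc.stdout.close()
    returncode = proc.wait()
    return returncode, lines


def describe_exit(returncode):
    """Turn a return code into a log-friendly string."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = str(-returncode)  # signal number with no symbolic name
        return "killed by signal %s" % name
    return "exited with status %d" % returncode
```

Logging describe_exit(returncode) with a timestamp before the rewind step
would at least show whether the local rsh process exited on its own or
was killed by a signal.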

The Tru64 Unix machine is currently running 4.0D with patch #002 installed.

Full backups are done weekly, incrementals the other six nights of the week.

This worked fine for a while, until...

Problem
-------

At some point in the past six months or so, this stopped working for the
FULL backups. The backup will run uneventfully on the file server for a
few hours but then (without any error message anywhere I can find), the
rsh exits early, reason unknown. The Perl script then tries to
continue, but the tape rewind fails with an error message about the
drive being offline. I wind up with a partial dump on the tape.
Unsatisfactory.

There's plenty of room on the tape. Incrementals continue to work fine.

The BIG mystery is that the fulls only seem to fail when CRON is in the
picture. If I run the Perl script from the command line, things seem
to work fine. <Please insert a mental picture of me beating my head
against my keyboard here.>
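
One way to narrow down a cron-vs.-command-line difference would be to
record the process settings in both contexts and compare them. The
following is a minimal sketch in Python, for illustration only: snapshot
and diff_snapshots are hypothetical names, the choice of what to record
(environment, tty attachment, a few resource limits, SIGHUP disposition)
is a guess at the usual suspects rather than a diagnosis, and none of it
has been tried on the Tru64 machine.

```python
import os
import resource
import signal


def snapshot():
    """Collect settings that commonly differ between cron and a login shell."""
    snap = {}
    for name, value in os.environ.items():
        snap["env." + name] = value
    # Is each standard descriptor attached to a terminal?
    snap["stdin.isatty"] = os.isatty(0)
    snap["stdout.isatty"] = os.isatty(1)
    snap["stderr.isatty"] = os.isatty(2)
    # (soft, hard) resource limits; skip any the platform does not define.
    for limit in ("RLIMIT_CPU", "RLIMIT_FSIZE", "RLIMIT_DATA", "RLIMIT_NOFILE"):
        if hasattr(resource, limit):
            snap["rlimit." + limit] = resource.getrlimit(getattr(resource, limit))
    # Whether SIGHUP was inherited as "ignored" from the parent.
    snap["sighup.ignored"] = signal.getsignal(signal.SIGHUP) == signal.SIG_IGN
    return snap


def diff_snapshots(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs.

    A key missing from one side is reported with None on that side.
    """
    changed = {}
    for key in set(a) | set(b):
        if a.get(key) != b.get(key):
            changed[key] = (a.get(key), b.get(key))
    return changed
```

Running snapshot() once from a cron entry and once from an interactive
shell, saving both results, and passing them to diff_snapshots would list
exactly which of these settings differ between the two.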

Complicating Detail
-------------------

Embarrassingly, I'm not sure when this problem started (don't ask), and
(therefore) I'm also unsure whether it could have been caused by (a) an
increase in disk usage on the file server, hence more time taken for
the backup, hence we ran up against some sort of pre-existing limit; or
(b) an OS upgrade/patch making something on the Digital Unix side work
differently.

Bottom Line
-----------

I guess the primary question is: Is this problem really caused by some
sort of Cron-vs.-Command Line difference on the Tru64 Unix side, and
(if so) what is it and can it be worked around?

Or am I missing something and the problem actually lies somewhere else?

-- 
-- Paul A. Sand                 | The reason this is so upsetting is
-- University of New Hampshire  | that all this crap worked
-- pas_at_unh.edu                  | before...
-- http://pubpages.unh.edu/~pas |     (Jeff Gunn)
Received on Mon Mar 08 1999 - 17:05:31 NZDT
