A couple of weeks ago I posted this:
I have had an ongoing problem with my backups not completing. I'm doing
backups at 4:30 AM, with the system still up. This machine is our web /
DNS / mail server, so there are no shell users logged in.
Below is the script that I run every night as root via cron, followed by
a sample of the errors I'm getting. I've tried cleaning the tape
drive. Is there something wrong with my script?
---------- Script I use for backup ----------
#!/bin/sh
# full_nightly_backup.sh - back up all three filesystems to tape
cd /
# Send stdout / stderr to be mailed to the administrator each night
echo "Beginning backup for:"
date
# Commit disk writes
/usr/sbin/sync
# Bring tape drive online and rewind tape
/usr/bin/mt -f /dev/nrmt0h online
/usr/bin/mt -f /dev/nrmt0h rewind
# Back up the /, /usr and /usr2 filesystems
/usr/sbin/dump -0uf /dev/nrmt0h /
/usr/sbin/dump -0uf /dev/nrmt0h /usr
/usr/sbin/dump -0uf /dev/nrmt0h /usr2
# Take tape drive offline
/usr/bin/mt -f /dev/nrmt0h rewoffl
# Audit
echo "Backup completed for:"
date
---------- Results of backup ----------
Beginning backup for:
Fri Feb 14 03:45:00 EST 1997
dump: Dumping from host mufasa.herald-mail.com
dump: Date of this level 0 dump: Fri Feb 14 03:45:22 1997 EST
dump: Date of last level 0 dump: the start of the epoch
dump: Dumping /dev/rrz0a (/) to /dev/nrmt0h
dump: Mapping (Pass I) [regular files]
dump: Mapping (Pass II) [directories]
dump: Estimate: 35023 tape blocks on 0.08 volume(s)
dump: Dumping (Pass III) [directories]
dump: Volume 1, tape # 0001, begins with blocks from i-node 2
dump: Dumping (Pass IV) [regular files]
dump: 1.43% done -- finished in 00:03
dump: Actual: 35087 tape blocks on 1 volume(s)
dump: Feet remaining on tape: 2119
dump: Volumes used: 1
dump: Level 0 dump on Fri Feb 14 03:45:22 1997 EST
dump: Dump completed at Fri Feb 14 03:46:20 1997 EST
dump: Dumping from host mufasa.herald-mail.com
dump: Date of this level 0 dump: Fri Feb 14 03:46:20 1997 EST
dump: Date of last level 0 dump: the start of the epoch
dump: Dumping /dev/rrz0g (/usr) to /dev/nrmt0h
dump: Mapping (Pass I) [regular files]
dump: Mapping (Pass II) [directories]
dump: Estimate: 491413 tape blocks on 1.10 volume(s)
dump: Dumping (Pass III) [directories]
dump: Volume 1, tape # 0001, begins with blocks from i-node 2
dump: 0.10% done -- finished in 00:49
dump: Dumping (Pass IV) [regular files]
dump: 35.95% done -- finished in 00:08
dump: Write error -- wanted to write: 10240, only wrote: -1
slave_work(): write(): I/O error
dump: Write error - /dev/nrmt0h, volume 1, 1261 feet -- cannot recover
dump: Cannot fopen /dev/tty for reading
query(): fopen(): No such device or address
dump: Cannot remove shared memory
remove_shared_memory(): shmctl(): Invalid argument
dump: SIGTERM received -- Try rewriting
dump: Unexpected signal -- cannot recover
dump: Dumping from host mufasa.herald-mail.com
dump: Date of this level 0 dump: Fri Feb 14 03:53:17 1997 EST
dump: Date of last level 0 dump: the start of the epoch
dump: Dumping /dev/rrz0h (/usr2) to /dev/nrmt0h
dump: Mapping (Pass I) [regular files]
dump: Mapping (Pass II) [directories]
dump: Estimate: 56418 tape blocks on 0.13 volume(s)
dump: Dumping (Pass III) [directories]
dump: Volume 1, tape # 0001, begins with blocks from i-node 2
dump: Dumping (Pass IV) [regular files]
dump: 0.89% done -- finished in 00:03
dump: Write error -- wanted to write: 10240, only wrote: -1
slave_work(): write(): I/O error
dump: Write error - /dev/nrmt0h, volume 1, 20 feet -- cannot recover
dump: Cannot fopen /dev/tty for reading
query(): fopen(): No such device or address
dump: Cannot remove shared memory
remove_shared_memory(): shmctl(): Invalid argument
dump: SIGTERM received -- Try rewriting
dump: Unexpected signal -- cannot recover
/dev/nrmt0h: No such device or address
Backup completed for:
Fri Feb 14 03:54:12 EST 1997
------ Here are the responses I got -----
From Irene A. Shilikhina <irene_at_alpha.iae.nsk.su>:
Your tape seems to be of insufficient length. Look at the 25th line of your
listing (Estimate: 491413 tape blocks on 1.10 volume(s)).
(According to the manual and the back of the tape, I should be able to
fit 4 gigabytes on a 120-meter tape, which is what I'm using. I'm only
trying to back up about 550 MB.)
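Irene's reading rests on dump's "1.10 volume(s)" estimate, but the log itself hints that this figure is measured against a tape length dump assumes rather than the actual cartridge. The / dump wrote 35087 blocks and left 2119 feet remaining; if dump started from a nominal 2300-foot tape (an assumption, though it is the traditional BSD dump default), that dump used 181 feet, and scaling by block count to /usr reproduces the 1.10 figure:

```sh
#!/bin/sh
# Back-of-the-envelope check of dump's "1.10 volume(s)" estimate for /usr,
# using figures from the log above.
# ASSUMPTION: dump is measuring against a nominal 2300-foot tape; that
# length is not reported anywhere in the log.
ASSUMED_TAPE_FEET=2300

root_blocks=35087       # "Actual: 35087 tape blocks" for /
root_feet_left=2119     # "Feet remaining on tape: 2119" after /
usr_blocks_est=491413   # "Estimate: 491413 tape blocks" for /usr

# Footage dump charged for the / dump, if it started from the assumed length.
root_feet_used=$((ASSUMED_TAPE_FEET - root_feet_left))

# Scale by block count to get dump's notional footage for /usr.
usr_feet_est=$((usr_blocks_est * root_feet_used / root_blocks))

# Volumes needed, times 100, in integer arithmetic (110 means 1.10).
usr_volumes_x100=$((usr_feet_est * 100 / ASSUMED_TAPE_FEET))

echo "/ used about ${root_feet_used} of an assumed ${ASSUMED_TAPE_FEET} feet"
printf '/usr needs about %d feet, or %d.%02d volume(s)\n' \
    "$usr_feet_est" $((usr_volumes_x100 / 100)) $((usr_volumes_x100 % 100))
```

If that assumption holds, the 1.10 estimate says little about whether the data actually fits on a 120-meter cartridge.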
From Brian Sheehan <sheehan_at_scripps.edu>:
I got the same "shared memory" errors you reported, on an almost
identical setup. Am I right in guessing that this setup worked a
couple of times before going insane? That's what happened to me. In my
case it was fixed simply by making sure that the last write to tape
used the "rewind-after-use" device. I also had to do one dump to the
rewind device to get everything working. I know it sounds
crazy, but it's been fine ever since (about 18 months now).
(I tried it, and it didn't work.)
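Brian's workaround amounts to sending the last dump of the night through the rewind-after-use device rather than the no-rewind one. A sketch of that dump section, assuming /dev/rmt0h is the rewind counterpart of /dev/nrmt0h on this system (the leading n marks the no-rewind device; check the entries in /dev before relying on it). dump_with_final_rewind is a hypothetical helper name, and MT and DUMP are overridable only so the command sequence can be dry-run:

```sh
#!/bin/sh
# Sketch of Brian's workaround: every dump but the last goes through the
# no-rewind device, and the last one through the rewind-after-use device.
# ASSUMPTION: /dev/rmt0h is the rewind counterpart of /dev/nrmt0h here.
MT=${MT:-/usr/bin/mt}
DUMP=${DUMP:-/usr/sbin/dump}
NOREWIND=/dev/nrmt0h
REWIND=/dev/rmt0h

# Hypothetical helper; arguments are the filesystems to dump, in order.
dump_with_final_rewind() {
    [ $# -gt 0 ] || return 1
    "$MT" -f "$NOREWIND" rewind || return $?
    # All but the last filesystem: no-rewind, so dumps follow one another.
    while [ $# -gt 1 ]; do
        "$DUMP" -0uf "$NOREWIND" "$1" || return $?
        shift
    done
    # Last filesystem: the tape rewinds when dump closes the device.
    "$DUMP" -0uf "$REWIND" "$1" || return $?
    # Already rewound; rewoffl now just takes the drive offline.
    "$MT" -f "$REWIND" rewoffl
}
```

In the nightly script this would replace the rewind, the three dump lines and the final rewoffl with a single call: `dump_with_final_rewind / /usr /usr2`.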
From Rich Kulawiec <rsk_at_itw.com>:
It doesn't look like there is -- it looks like either (a) the tape
is full or (b) it's a cheap tape and is generating write() errors.
I'd say that it's worth doing a back-of-the-envelope calculation
based on the size/density of your tape to see if (a) is possible;
if not, then it's probably (b) and you should give it a shot with
the best tapes you can buy -- in the case of 8mm, that'd be Exabyte.
I'm not sure who makes the best 4mm, but I've had good luck with
Denon data-grade tapes.
(I've tried all kinds of tapes from all kinds of manufacturers, in
various lengths. Even the tape that came with the drive doesn't work.)
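Rich's back-of-the-envelope calculation is short when done with dump's own estimates from the log. This sketch assumes one dump tape block is 1 KB (consistent with the 10240-byte writes in the error messages being 10 blocks per record) and takes the 4 GB figure from the tape label at face value as 4,000,000,000 bytes uncompressed:

```sh
#!/bin/sh
# Rough capacity check, using dump's per-filesystem estimates from the log.
# ASSUMPTIONS: one dump "tape block" is 1 KB (1024 bytes), and the
# cartridge holds 4,000,000,000 bytes uncompressed, as the label says.
root_kb=35023    # "Estimate: 35023 tape blocks" for /
usr_kb=491413    # "Estimate: 491413 tape blocks" for /usr
usr2_kb=56418    # "Estimate: 56418 tape blocks" for /usr2

total_kb=$((root_kb + usr_kb + usr2_kb))
total_mb=$((total_kb / 1024))
capacity_kb=3906250   # 4,000,000,000 bytes / 1024
percent_full=$((total_kb * 100 / capacity_kb))

echo "Data to dump: ${total_kb} KB (about ${total_mb} MB)"
echo "Tape capacity: ${capacity_kb} KB; the dumps would fill about ${percent_full}%"
```

The total, roughly 569 MB, matches the "about 550 MB" above and would use only about 15% of the cartridge, so (a) looks unlikely and the write errors themselves become the main suspect.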
From alan_at_nabeth.cxo.dec.com:
Generally speaking, I/O errors are exactly what they claim
to be: a failure of an input/output request to complete.
Check the system error log to see if it offers any clue to
what the I/O errors are. Before V4, use the uerf(8) command
with the option "-o full" to get the full error log listing.
On V4 and later, use dia(8), which may also have an "-o full"
option.
The other errors are the result of dump trying to clean
up after the I/O error. It tries to query the user by
writing to and reading from /dev/tty, which cron doesn't
provide, and the cleanup code for that failure could
have bugs in it. But it all started with the I/O errors.
(Wow. I ran uerf and found a whole slew of CAM SCSI errors. Because
this message is so long already, I'm going to post a separate message
asking about them.)
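Since everything after the first write error is dump failing to clean up, and the original script goes on to the next dump regardless, one low-risk change is to stop at the first failed dump so the mailed report points at the filesystem where the trouble started. A minimal sketch, assuming dump(8) exits with a non-zero status when it aborts (as BSD-derived dumps do); dump_fs and backup_all are hypothetical helper names:

```sh
#!/bin/sh
# Hypothetical helpers: stop the nightly run at the first failed dump
# instead of carrying on with the next filesystem.
# ASSUMPTION: dump exits with a non-zero status when it aborts.
DUMP=${DUMP:-/usr/sbin/dump}
TAPE=${TAPE:-/dev/nrmt0h}

# Dump one filesystem at level 0; on failure, report and return dump's status.
dump_fs() {
    "$DUMP" -0uf "$TAPE" "$1"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "dump of $1 failed with exit status $status" >&2
    fi
    return "$status"
}

# Dump each filesystem in order, stopping at the first failure.
backup_all() {
    for fs in "$@"; do
        dump_fs "$fs" || return $?
    done
}
```

In the nightly script, the three dump lines would become `backup_all / /usr /usr2 || echo "Backup FAILED: check the error log"`, and on a pre-V4 system alan's `uerf -o full` is the place to look next.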
Thanks to all who helped.
- Derrick
Received on Mon Mar 03 1997 - 16:32:20 NZDT