SUMMARY: dump not working (still broken)

From: Derrick Miller <phuture_at_bigdog.fred.net>
Date: Mon, 3 Mar 1997 09:34:38 -0500 (EST)

A couple weeks ago I posted this:

I have had an ongoing problem with my backups not completing. I'm doing
backups at 4:30 AM, with the system still up. This machine is our web /
dns / mail server so there are no shell users logged in.

Following is the script that I run every night as root, via cron, and a
sample of the type of error I'm getting. I've tried cleaning the tape
drive. Is there something wrong with my script?

---------- Script I use for backup ----------

# full_nightly_backup.sh - back up all three filesystems to tape

cd /

# Spit out stdout / stderr to be mailed to administrator each night

echo "Beginning backup for:"
date

# Commit disk writes

/usr/sbin/sync

# Bring tape drive online and rewind tape

/usr/bin/mt -f /dev/nrmt0h online
/usr/bin/mt -f /dev/nrmt0h rewind

# Backup / and /usr and /usr2 filesystems

/usr/sbin/dump -0uf /dev/nrmt0h /
/usr/sbin/dump -0uf /dev/nrmt0h /usr
/usr/sbin/dump -0uf /dev/nrmt0h /usr2

# Take tape drive offline

/usr/bin/mt -f /dev/nrmt0h rewoffl

# Audi

echo "Backup completed for:"
date
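
One possible hardening of the script above, offered as a sketch rather than a fix for the write errors themselves: stop at the first failed dump instead of starting further dumps on a tape that has already returned an I/O error. It assumes dump exits with a non-zero status when it aborts, which has not been verified for this system. The DUMP and TAPE variables and the backup_all function are hypothetical names introduced for illustration; the defaults mirror the paths used in the script.

```sh
#!/bin/sh
# Sketch only. Assumes dump returns non-zero after "cannot recover".
# DUMP and TAPE are overridable so the logic can be tried without a drive.

TAPE=${TAPE:-/dev/nrmt0h}
DUMP=${DUMP:-/usr/sbin/dump}

# Dump each filesystem given as an argument, in order.
# Return 1 as soon as one dump fails; return 0 if all succeed.
backup_all() {
    for fs in "$@"; do
        echo "Dumping $fs"
        "$DUMP" -0uf "$TAPE" "$fs" || {
            echo "dump of $fs failed; skipping remaining filesystems" >&2
            return 1
        }
    done
    return 0
}

# Example invocation, commented out so the sketch has no side effects:
# backup_all / /usr /usr2
# status=$?
# /usr/bin/mt -f "$TAPE" rewoffl
# exit $status
```

With this structure the mailed output would end at the first failing filesystem, so the original error is not buried under a second identical failure.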

---------- Results of backup ----------

Beginning backup for:
Fri Feb 14 03:45:00 EST 1997
dump: Dumping from host mufasa.herald-mail.com
dump: Date of this level 0 dump: Fri Feb 14 03:45:22 1997 EST
dump: Date of last level 0 dump: the start of the epoch
dump: Dumping /dev/rrz0a (/) to /dev/nrmt0h
dump: Mapping (Pass I) [regular files]
dump: Mapping (Pass II) [directories]
dump: Estimate: 35023 tape blocks on 0.08 volume(s)
dump: Dumping (Pass III) [directories]
dump: Volume 1, tape # 0001, begins with blocks from i-node 2
dump: Dumping (Pass IV) [regular files]
dump: 1.43% done -- finished in 00:03
dump: Actual: 35087 tape blocks on 1 volume(s)
dump: Feet remaining on tape: 2119
dump: Volumes used: 1
dump: Level 0 dump on Fri Feb 14 03:45:22 1997 EST
dump: Dump completed at Fri Feb 14 03:46:20 1997 EST
dump: Dumping from host mufasa.herald-mail.com
dump: Date of this level 0 dump: Fri Feb 14 03:46:20 1997 EST
dump: Date of last level 0 dump: the start of the epoch
dump: Dumping /dev/rrz0g (/usr) to /dev/nrmt0h
dump: Mapping (Pass I) [regular files]
dump: Mapping (Pass II) [directories]
dump: Estimate: 491413 tape blocks on 1.10 volume(s)
dump: Dumping (Pass III) [directories]
dump: Volume 1, tape # 0001, begins with blocks from i-node 2
dump: 0.10% done -- finished in 00:49
dump: Dumping (Pass IV) [regular files]
dump: 35.95% done -- finished in 00:08
dump: Write error -- wanted to write: 10240, only wrote: -1
slave_work(): write(): I/O error
dump: Write error - /dev/nrmt0h, volume 1, 1261 feet -- cannot recover
dump: Cannot fopen /dev/tty for reading
query(): fopen(): No such device or address
dump: Cannot remove shared memory
remove_shared_memory(): shmctl(): Invalid argument
dump: SIGTERM received -- Try rewriting
dump: Unexpected signal -- cannot recover
dump: Dumping from host mufasa.herald-mail.com
dump: Date of this level 0 dump: Fri Feb 14 03:53:17 1997 EST
dump: Date of last level 0 dump: the start of the epoch
dump: Dumping /dev/rrz0h (/usr2) to /dev/nrmt0h
dump: Mapping (Pass I) [regular files]
dump: Mapping (Pass II) [directories]
dump: Estimate: 56418 tape blocks on 0.13 volume(s)
dump: Dumping (Pass III) [directories]
dump: Volume 1, tape # 0001, begins with blocks from i-node 2
dump: Dumping (Pass IV) [regular files]
dump: 0.89% done -- finished in 00:03
dump: Write error -- wanted to write: 10240, only wrote: -1
slave_work(): write(): I/O error
dump: Write error - /dev/nrmt0h, volume 1, 20 feet -- cannot recover
dump: Cannot fopen /dev/tty for reading
query(): fopen(): No such device or address
dump: Cannot remove shared memory
remove_shared_memory(): shmctl(): Invalid argument
dump: SIGTERM received -- Try rewriting
dump: Unexpected signal -- cannot recover
/dev/nrmt0h: No such device or address
Backup completed for:
Fri Feb 14 03:54:12 EST 1997

------ Here are the responses I got -----

From Irene A. Shilikhina <irene_at_alpha.iae.nsk.su>:

Your tape seems to be of insufficient length. Look at the 25th line of your
listing (Estimate: 491413 tape blocks on 1.10 volume(s)).

(According to the manual and the back of the tape I should be able to
fit 4 gigabytes on a 120M tape, which is what I'm using. I'm only
trying to back up about 550 MB)

From Brian Sheehan <sheehan_at_scripps.edu>:

I got the same "shared memory" errors you reported for an almost
identical setup. If I'm guessing right this setup of yours worked a
couple of times before going insane? That's what happened to me. In my
case it was fixed by simply making sure the last time I wrote to tape I
used the "rewind-after-use" device. I also had to do a dump to the
rewind device one time to get everything working. I know it sounds
crazy but it's been fine ever since (about 18 months now).

(I tried it and it didn't work)

From Rich Kulawiec <rsk_at_itw.com>:

It doesn't look like there is -- it looks like either (a) the tape
is full or (b) it's a cheap tape and is generating write() errors.
I'd say that it's worth doing a back-of-the-envelope calculation
based on the size/density of your tape to see if (a) is possible;
if not, then it's probably (b) and you should give it a shot with
the best tapes you can buy -- in the case of 8mm, that'd be Exabyte.
I'm not sure who makes the best 4mm, but I've had good luck with
Denon data-grade tapes.

(I've tried all kinds of tapes from all kinds of manufacturers, in
various lengths. Even the tape that came with the drive doesn't work.)
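
The back-of-the-envelope calculation suggested above can be sketched from dump's own "Estimate:" lines. This assumes one dump tape block is 1024 bytes, which is consistent with the 10240-byte writes in the error messages but is not confirmed by the listing. BLOCK_BYTES and estimate_mb are hypothetical names used only for this illustration.

```sh
#!/bin/sh
# Sketch only. Assumption: one dump "tape block" is 1024 bytes.

BLOCK_BYTES=1024

# Sum the block counts given as arguments and print the total in
# whole megabytes (integer division, so the result is rounded down).
estimate_mb() {
    total=0
    for blocks in "$@"; do
        total=$((total + blocks))
    done
    echo $((total * BLOCK_BYTES / 1048576))
}

# Estimates reported for /, /usr and /usr2 in the listing above
estimate_mb 35023 491413 56418
```

Under that assumption the three estimates add up to 569 MB, close to the "about 550 MB" figure quoted earlier and well under the 4 gigabytes the tape is said to hold. The "1.10 volume(s)" figure probably reflects whatever tape length and density dump assumes by default rather than the actual tape.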

From alan_at_nabeth.cxo.dec.com:

        Generally speaking I/O errors are exactly what they claim
        to be; failure for an input/output request to complete.
        Check the system error log to see if it will offer any clue
        what the I/O errors are. Before V4 use the uerf(8) command
        with the option "-o full" to get the full error log listing.
        On V4 and later use dia(8), which may also have an "-o full"
        option.

        The other errors are the result of dump trying to clean
        up from the I/O error. It tries to query the user by
        writing and reading from /dev/tty, which cron doesn't
        provide, and the cleanup code for that failure could
        have bugs in it. But it all started with the I/O errors.

(Wow. I ran uerf and found a whole slew of CAM SCSI errors. Because
this message is so long already I'm going to post a separate message
asking about them.)
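
Since the /dev/tty, shared memory and SIGTERM messages all follow from the first write error, it can help to pull that first error out of the nightly output before turning to the error log with "uerf -o full". The following is a minimal sketch; first_write_error is a hypothetical helper name, and it assumes the mailed output has been saved to a file.

```sh
#!/bin/sh
# Sketch only. Print the first "Write error" line, with its line number,
# from a saved copy of the backup output. Prints nothing if none is found.
first_write_error() {
    grep -n 'Write error' "$1" | head -n 1
}

# Example (hypothetical path to a saved copy of the cron mail):
# first_write_error /tmp/backup.log
```

The timestamp of the dump that produced that line gives a starting point for matching entries in the system error log.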

Thanks to all who helped.
- Derrick
Received on Mon Mar 03 1997 - 16:32:20 NZDT
