SUMMARY: dump problem from Geoff Mellor on 1996-04-18 (tru64-unix-managers)

From: Geoff Mellor <grm_at_star.le.ac.uk>
Date: Wed, 17 Apr 1996 14:56:41 +0100 (BST)

Sorry about the delay with this summary but it has taken some time to try
out the various suggestions and get a solution.

Thanks to all who replied, and especially to David Gadbois, who correctly
identified the SCSI cabling as suspect. I replaced all the SCSI cables
with uprated versions fitted with toroidal rings, and this solved the problem.

My original post:

I have an Elite 9 disk and a DLT drive on an AlphaStation. Each time
I attempt to dump this disk to the DLT drive, I get the errors listed
below. All other local partitions dump fine (though admittedly none
are 9 GB in size), as do remote 9 GB drives. The 9 GB drive in
question can be dumped without problem to a remote Exabyte drive.

I have fsck'd the drive and no errors were found. Does anyone have
any idea what's up?

# dump 0ucf /dev/rmt0h data2
dump: Dumping from host ltaxp1
dump: Date of this level 0 dump: Wed Feb 07 19:00:10 1996 GMT
dump: Date of last level 0 dump: the start of the epoch
dump: Dumping /dev/rrz1c (/data2) to /dev/nrmt1h
dump: Mapping (Pass I) [regular files]
dump: Mapping (Pass II) [directories]
dump: Estimate: 5455759 tape blocks on 134.55 volume(s)
dump: Dumping (Pass III) [directories]
dump: Volume 1, tape # 0001, begins with blocks from i-node 2
dump: 0.01% done -- finished in 00:00
dump: Dumping (Pass IV) [regular files]
dump: Write error -- wanted to write: 10240, only wrote: -1
slave_work(): write(): I/O error
dump: Write error - /dev/nrmt1h, volume 1, 8992 feet -- cannot recover
dump: Cannot fopen /dev/tty for reading
query(): fopen(): No such device or address
dump: Cannot remove shared memory
remove_shared_memory(): shmctl(): Invalid argument
dump: SIGTERM received -- Try rewriting
dump: Unexpected signal -- cannot recover

Note:

1) I have tried various combinations of dump parameters involving
   blocking factor/size/density to no avail.
2) The dump fails at a different point on the tape each time - it is
   not always 8992 feet as in the above example.
3) I have tried different tapes with no success.

                   -----------------------

Thanks to:

David Gadbois gadbois_at_cyc.com
Dave Golden golden_at_invincible.com
Charlie McCarty charlie_mccarty_at_cargill.com
Alan Rollow alan_at_nabeth.cxo.dec.com
Bob Capps Bob.Capps_at_pscmail.ps.net
Knut Hellebo Knut.Hellebo_at_nho.hydro.com

The replies in full:

If you get traffic-related heisenbugs, it is almost certainly a
cabling and/or termination problem. Make sure you are following all
the SCSI rules. If that is not the problem, it may be a device
firmware problem -- the Elite 9s have gone through a bunch of firmware
revisions, and I know DEC requires at least some revision level (14 or
17, I think) for the drives to work with a RAID controller.

--David Gadbois

-------------------------------------------------------------------------

Check the size of your /tmp area. I have heard reports of problems
restoring files if /tmp is too small; perhaps there is also a problem
dumping with a small /tmp.

good luck,

Dave

--
Dave Golden                             golden_at_invincible.com
Invincible Technologies Corporation
-------------------------------------------------------------------------
It was my understanding that "dump" was limited to 2 Gb.  Try "vdump".
Charlie McCarty                          (612) 742 6430
Cargill Research                         (612) 742 7909  fax
-------------------------------------------------------------------------
Have you considered that the I/O error it complains about is a real
I/O error?  Use the uerf(8) command to print the error log and
look for SCSI errors on the device getting the I/O error.  Be
sure to use the option "-o full" to get all the SCSI information
related to the error.  The symptom sounds strange, since EOT should
be recognized by dump, and with the juggling you've described the
I/O error seems awfully persistent, but the error log should offer a
clue.
Alan Rollow <alan_at_nabeth.cxo.dec.com>
----------------------------------------------------------------------------
     I had a problem of a similar nature.  I haven't summarized yet, but
     basically I had a dump script that was failing and reporting back at
     least these messages that you are getting:
     
     dump: Cannot fopen /dev/tty for reading
     query(): fopen(): No such device or address
     dump: Cannot remove shared memory
     remove_shared_memory(): shmctl(): Invalid argument
     dump: SIGTERM received -- Try rewriting
     dump: Unexpected signal -- cannot recover
     
     It turned out that someone had accidentally started a second cron
     job, so another dump script was running and tying up the drive.
     
     Just noticing that you call dump with the device /dev/rmt0h, yet dump 
     reports that it is using /dev/nrmt0h?  Since I only use /dev/nrmt0h 
     for my device, I don't see anything different.  I do make sure that I 
     do an "mt -f /dev/rmt0h rewind" command before starting.
     
     Hope this helps.
     
     Bob Capps
     Perot Systems
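
The situation Bob describes, two cron jobs competing for one tape
drive, can be guarded against with a simple lock. The following is
only a minimal sketch, not something taken from any of the replies:
the function name run_exclusive, the lock path /tmp/dump.lock, and
the exit status 75 are all assumptions chosen for illustration. It
relies on mkdir being atomic, so only one caller can create the lock
directory.

```shell
#!/bin/sh
# Hypothetical lock location; any directory path on a local
# filesystem that cron jobs can write to would do.
LOCKDIR="${LOCKDIR:-/tmp/dump.lock}"

# run_exclusive CMD [ARGS...]
# Runs CMD only if the lock directory can be created. mkdir either
# creates the directory or fails, so two jobs started at the same
# moment cannot both get past this point.
run_exclusive() {
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "lock $LOCKDIR exists; another dump may be running" >&2
        return 75   # arbitrary choice (EX_TEMPFAIL in sysexits.h)
    fi
    "$@"
    rc=$?
    # Release the lock whether or not the command succeeded.
    rmdir "$LOCKDIR"
    return $rc
}
```

A dump script run from cron could then wrap its dump command, for
example:

    run_exclusive dump 0ucf /dev/rmt0h /data2

If the script is killed before it reaches rmdir, the lock directory
is left behind and has to be removed by hand before the next run.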
----------------------------------------------------------------------------
As far as I know, there's a patch to dump (DEC Unix 3.2) available from
Digital. You might try ftp.service.digital.com (or maybe even
ftp.digital.de).
Knut Helleboe

Received on Wed Apr 17 1996 - 18:42:00 NZST
