SUMMARY: Determining if DAT has I/O error and/or SCSI cable length problem? from Richard Bemrose on 1999-05-07 (tru64-unix-managers)

From: Richard Bemrose <rb237_at_phy.cam.ac.uk>
Date: Fri, 07 May 1999 12:21:56 +0100 (BST)

Hello fellow admins,

I must first thank the following people for their quick and informative replies:
      "Dr. Alan Rollow" <alan_at_nabeth.cxo.dec.com>
      "Joe Fletcher" <joe_at_meng.ucl.ac.uk>
      "William H. Magill" <magill_at_isc.upenn.edu>
      "Jim Fitzmaurice" <jpfitz_at_fnal.gov>
      "Dan Kirkpatrick" <dkirk_at_suhep.phy.syr.edu>
      "Dr. Tom Blinn, 603-884-0646" <tpb_at_doctor.zk3.dec.com>

In my original post I asked which DAT/SCSI/tape problem generates the
following errors (on our AlphaStation 1000, DU 4.0D patch kit #3):
      Hard Error Detected
      DEC TLZ07 (C)DEC553C
      Active CCB at time of error
      CCB request completed with an error
      Error, exception, or abnormal condition
      MEDIUM ERROR - Nonrecoverable medium error
      cam_logger: CAM_ERROR packet
      cam_logger: bus 0 target 5 lun 0
      ctape_iodone

The general opinion was that it is most likely a problem with the quality of
the data on the DAT itself and not the length of the internal SCSI cable.
This data was written on a DAT unit outside our institution and therefore
cannot be verified. Thanks to doctors Alan Rollow and Tom Blinn for providing
detailed information on analysing the above error (see below for details).
Within the optional "System Exercisers" subset is a utility called "tapex"
that can be used to do a variety of tape integrity tests:
     # /usr/field/tapex -E -v
Our replacement DAT unit passed all tests. Nevertheless, we were able to extract
the data using another DAT unit. (Thanks to Joe Fletcher, who offered to read
the DAT tape.)

Finally, William H. Magill and Dan Kirkpatrick commented on SCSI cable length
(although that does not seem to be the cause here).

"William H. Magill" <magill_at_isc.upenn.edu> wrote:
I don't know the differences between the 1000 and 1000A; however, the 1000A
has Fast/Wide SCSI at the connector on the rear. The internal bus on the
1000A is fairly long. If you use a 2 meter cable to connect a StorageWorks
shelf, drives past slot 4 (I think) are now too far away and won't work
correctly. Drives up to slot 4 work fine. Replace the 2 meter cable with a
1.5 meter cable and everything is fine.

--------------------------------------------------------------------
"Dr. Alan Rollow" <alan_at_nabeth.cxo.dec.com> wrote:

SCSI errors typically come in two flavors; device errors or protocol
errors. When the error log entry contains Request Sense Data from
the device you can be pretty sure it was a device error instead of
a protocol error (*). In your case, the media error is pretty
clearly a device error, but you should format the error log to
get the detail. DECevent is preferable, but you can also use uerf(8).
If you use uerf(8), use the option "-o full" to get the full
entry listing.

Media errors are probably due to the media. But you could have
a tape written so poorly that any drive reading it will claim
it is a media error. Of course, if no drive can read it, then
it may as well be a bad tape. The sense data may offer a bit more
information than "media error", but you may need the programming
documentation for the drive to translate the ASC/ASCQ codes
that are in the sense data. Many of the values have standard
meaning and uerf/DECevent can translate those, but many values
are vendor specific and you need the vendor's documentation to
translate the value.
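
As a rough illustration of what "translating the sense data" involves, here
is a minimal Python sketch that pulls the sense key and ASC/ASCQ out of
fixed-format SCSI Request Sense data. It is not how uerf or DECevent work
internally. The function name decode_sense, the very short lookup table and
the sample bytes are all assumptions made for the example; real values must
come from the formatted error log entry and, for vendor-specific codes, from
the drive's documentation.

```python
# Sketch only: decode fixed-format SCSI sense data (response code 0x70/0x71).
# decode_sense and the tables below are illustrative, not part of any DEC tool.

SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0xB: "ABORTED COMMAND",
}

# A deliberately tiny subset of the standard ASC/ASCQ assignments.
STANDARD_ASC_ASCQ = {
    (0x0C, 0x00): "Write error",
    (0x11, 0x00): "Unrecovered read error",
    (0x3A, 0x00): "Medium not present",
    (0x47, 0x00): "SCSI parity error",
}


def decode_sense(sense: bytes) -> dict:
    """Return the sense key and ASC/ASCQ from fixed-format sense data."""
    if len(sense) < 14:
        raise ValueError("need at least 14 bytes of fixed-format sense data")
    response_code = sense[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")

    key = sense[2] & 0x0F              # low nibble of byte 2
    asc, ascq = sense[12], sense[13]   # additional sense code and qualifier

    # Values 0x80 and above are reserved for the vendor.
    if asc >= 0x80 or ascq >= 0x80:
        meaning = "vendor specific (see the drive's documentation)"
    else:
        meaning = STANDARD_ASC_ASCQ.get(
            (asc, ascq), "standard code not in this short table"
        )

    return {
        "sense_key": SENSE_KEYS.get(key, "other (0x%X)" % key),
        "asc": asc,
        "ascq": ascq,
        "meaning": meaning,
    }


if __name__ == "__main__":
    # Hypothetical sample bytes, NOT taken from the error log in this thread.
    sample = bytes([0x70, 0, 0x03, 0, 0, 0, 0, 0x0A,
                    0, 0, 0, 0, 0x11, 0x00, 0, 0, 0, 0])
    print(decode_sense(sample))
```

A sense key of 3 (MEDIUM ERROR) in data like this is what sits behind a
"MEDIUM ERROR" line in the log; the ASC/ASCQ pair is the extra detail the
sense data may offer.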

One way to double check for media errors is to try reading the
tape on different drives. If none can read it, the media is
probably at fault. You may also want to read other data written
by that drive. If only the one piece of media is bad, it seems
pretty clear where the fault was. If all the media written
by that drive was bad, then it was more likely a drive problem.
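
The cross-check above can be written down as a small decision helper. This
is only a sketch of the reasoning, in Python. The function name assess, its
arguments and the result labels are invented for the illustration, and the
"readable on some drive" branch is an inference that the text above does not
state.

```python
# Sketch only: the tape/drive cross-check expressed as a decision function.
# assess() and its result labels are hypothetical names for this example.
from typing import Sequence


def assess(suspect_reads: Sequence[bool], sibling_reads: Sequence[bool]) -> str:
    """Classify the likely fault.

    suspect_reads: one entry per drive tried, True if that drive could
                   read the suspect tape.
    sibling_reads: one entry per other tape written by the same drive,
                   True if that tape could be read.
    """
    if not suspect_reads:
        return "NEED_MORE_DATA"        # the tape has not been tried anywhere

    if any(suspect_reads):
        # Inference, not stated in the text: the recording is usable, so the
        # drive that failed (or a marginal recording) is the suspect.
        return "READER_OR_MARGINAL"

    # From here on, no drive could read the suspect tape.
    if sibling_reads and not any(sibling_reads):
        return "WRITER_DRIVE"          # everything that drive wrote is bad
    if sibling_reads and all(sibling_reads):
        return "THIS_TAPE"             # only this one piece of media is bad
    return "MEDIA_PROBABLE"            # no or mixed evidence from other tapes


if __name__ == "__main__":
    # Two drives failed on the tape; two other tapes from the writer read fine.
    print(assess([False, False], [True, True]))
```

The inputs are just the observations one would collect by hand: which
drives read the tape, and whether other tapes from the same writer read
cleanly.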

(*) Not always, but most of the time. It could be the device
detected a protocol error instead of the host... The content
of the sense data will offer a clue.

--------------------------------------------------------------------
"Dr. Tom Blinn, 603-884-0646" <tpb_at_doctor.zk3.dec.com> wrote:

In the optional "System Exercisers" subset is a utility called "tapex" that
can be used to do a variety of tape integrity tests. For instance, you can
use it to do a read-only scan of a tape, reporting the maximum record and
block sizes and the number of files. Or you can use it to perform a test of
the tape media where it writes a known data pattern end to end, then rewinds
and reads back the media.

A "MEDIUM ERROR - Nonrecoverable medium error" is almost always returned by
the tape drive itself, and is NOT an "out of spec" cable length problem; the
cable length problem is usually reported by the CAM subsystem itself, not by
a particular device, although it can be observed while trying to access some
specific device. But the error you are seeing is because the tape drive
cannot read the media.

--------------------------------------------------------------------

Regards,
Rich

/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\
/_/ Richard A Bemrose /_\ Polymers and Colloids Group \_\
/_/ email: rb237_at_phy.cam.ac.uk /_\ Cavendish Laboratory \_\
/_/ Tel: +44 (0)1223 337 267 /_\ University of Cambridge \_\
/_/ Fax: +44 (0)1223 337 000 /_\ Madingley Road \_\
/_/ Mobile: +44 (0)410 168 873 / \ Cambridge, CB3 0HE, UK \_\
/_/_/_/_/_/_/ http://www.poco.phy.cam.ac.uk/~rb237 \_\_\_\_\_\_\_\
             "Life is everything and nothing all at once"
              -- Billy Corgan, Smashing Pumpkins
Received on Fri May 07 1999 - 11:24:37 NZST
