On 22 Dec 1997 I wrote about problems with my DAT drive:
I read tapes that hold a sequence of some 900 blocked files, each of which
may have a different block size. The tape is read by a sequence of "dd"
commands such as the following:
dd if=/dev/nrmt0h of=pd.instdir ibs=04752 cbs=0132 conv=unblock
dd if=/dev/nrmt0h of=ipd001.obsdir ibs=00640 cbs=0040 conv=unblock
dd if=/dev/nrmt0h of=ipd001.pdhkd000 ibs=30836
dd if=/dev/nrmt0h of=ipd001.pdeng000 ibs=31992
dd if=/dev/nrmt0h of=ipd001.p1cal000 ibs=02176
etc.
If I execute a "longish" sequence of such "dd" commands, there is a chance
that after a while one of them fails with an I/O error.
There are no errors if I position the tape manually and execute a single
dd command by hand. There are also no errors if I run batches of dd commands
separated by mt fsf commands.
Is there anything that could make dd commands issued in quick succession
fail (some timeout?)
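
To narrow down where in a long sequence the failure occurs, the dd commands can be driven from a list and stopped at the first error. This is only a minimal sketch: the list format ("name ibs [cbs]" per line), the TAPE and RUN variables and the function names are assumptions made for illustration, not the procedure actually in use.

```shell
#!/bin/sh
# Sketch only: TAPE, RUN, the list format and the function names are
# assumptions for illustration.
TAPE=${TAPE:-/dev/nrmt0h}   # no-rewind device, as in the commands above

# Print the dd command for one tape file.
# $1 = output name, $2 = input block size, $3 = optional record length (cbs)
dd_command() {
    if [ -n "$3" ]; then
        echo "dd if=$TAPE of=$1 ibs=$2 cbs=$3 conv=unblock"
    else
        echo "dd if=$TAPE of=$1 ibs=$2"
    fi
}

# Read "name ibs [cbs]" lines from stdin and run one dd per line.
# Stops at the first failure and reports the position in the sequence.
# Set RUN=echo for a dry run that only prints the commands.
read_tape() {
    n=0
    while read -r name ibs cbs; do
        n=$((n + 1))
        cmd=$(dd_command "$name" "$ibs" "$cbs")
        # dd gets /dev/null as stdin so it cannot consume the list.
        if ! $RUN $cmd </dev/null; then
            echo "file $n ($name) failed: $cmd" >&2
            return 1
        fi
    done
}
```

With a list file (here a hypothetical tape.list) the whole tape would be read with "read_tape < tape.list"; the message on stderr then gives the ordinal of the first file that failed, which can be compared between runs.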
------------
I now have some further information (I received three replies and am partly
responding to them here as well). First, some facts:
- The procedure which constructs the sequence of dd has been in use for
more than 2 years without problems
- The particular tapes were produced at the same site during an interval
of 4 months. The site distributes data tapes worldwide and reports no
media problems in the interval.
- Moreover tapes can be FULLY analysed with /usr/field/tapex -w -m without
errors of any sort
Next, a description of what DEC did.
- On Dec 18 they replaced our old TLZ06 drive, which we had had for some
4-5 years. That was BEFORE the problem reported here. The old drive had
a very hard fault, but before failing it never showed the problem
reported here.
- The old TLZ06 had firmware revision 491A.
The replacement installed on Dec 18 had firmware revision 4BQE.
- On Dec 22 DEC came again and replaced the drive with a third one
(firmware revision 4BQE again).
They also replaced the KPZAA-AA SCSI controller to which it was
attached.
- They left me to do some tests over Xmas.
I read some of my tapes several times:
(a) on the original machine (Alpha 200) with the DAT drive on the
second SCSI bus, where it belongs, but with no other devices
on the bus (the bus normally also carries a half-inch tape and a
CD-ROM; the disks are on the first bus).
Here I systematically got errors.
(b) on a different machine (Alpha 255) with the DAT drive on the
only bus, together with the disks.
Here I got no errors on the same tapes.
(c) I also received a new tape and read that on the Alpha 255, and
here I got an error (same sort: the long dd sequence failed, shorter
ones were OK), but I did no further tests.
Now I put my questions in the same form in which I submitted them to DEC
support.
(A): can the firmware revision have something to do with it?
(B): is somebody able to decode the long uerf error entries I am enclosing?
Two respondents suggested doing that, but I do not have the relevant SCSI
documentation.
I enclose ONE report of the problem on the Alpha 200 and one of the
problem on the Alpha 255 (this one looks somewhat different; could it be
some sort of interference with the disks, such as one of the respondents
reported on their own system?)
In both cases I give only the first records after the occurrence, in
reverse order (uerf -R).
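
As a partial step towards (B), the CAM_CDB_IO field of the enclosed entries can be unpacked by hand. The sketch below assumes that uerf prints the CDB buffer as one little-endian hex word, so that the first CDB byte is the last pair of hex digits; this assumption is supported by the decoded length matching the DXFER_LEN field of the same entry, but it is not taken from any documentation. The function names are made up for illustration. Opcode 08h is READ(6), whose bytes 2-4 carry the transfer length and whose byte 1 bit 0 is the FIXED flag.

```shell
#!/bin/sh
# Sketch only: assumes uerf prints the CDB as one little-endian hex word,
# i.e. the first CDB byte is the LAST two hex digits of the string.

# Print the first 6 CDB bytes in wire order, e.g. "08 00 00 7F E0 00".
cdb_bytes() {
    hex=${1#x}                      # drop the leading "x" used by uerf
    out=
    i=0
    while [ "$i" -lt 6 ]; do
        byte=${hex#"${hex%??}"}     # last two hex digits
        hex=${hex%??}               # remove them from the string
        out="$out${out:+ }$byte"
        i=$((i + 1))
    done
    echo "$out"
}

# Decode a READ(6) CDB: byte 0 opcode, byte 1 flags, bytes 2-4 length.
decode_read6() {
    set -- $(cdb_bytes "$1")
    [ "$1" = 08 ] || { echo "not a READ(6) CDB" >&2; return 1; }
    fixed=$(( 0x$2 & 1 ))           # 0 = variable-length read, length in bytes
    len=$(( 0x$3$4$5 ))
    echo "READ(6) fixed=$fixed length=$len"
}
```

Under that assumption, "decode_read6 x00000000000000E07F000008" (the Alpha 200 entry) gives a variable-length read of 32736 bytes, and "decode_read6 x000000000000009C7C000008" (the Alpha 255 entries) a read of 31900 bytes, in both cases equal to the DXFER_LEN and RESID fields, i.e. no data was transferred by the failing command.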
TIA
----------------------------------------------------------------------------
Lucio Chiappetti - IFCTR/CNR - via Bassini 15 - I-20133 Milano (Italy)
----------------------------------------------------------------------------
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
**typical case on Alpha 200 **
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
********************************* ENTRY 890.
*********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 112.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Tue Dec 23 14:50:49 1997
OCCURRED ON SYSTEM kronos
SYSTEM ID x0002000D CPU TYPE: DEC 7000
SYSTYPE x00000000
----- UNIT INFORMATION -----
CLASS x0001 TAPE
SUBSYSTEM x0000 DISK
BUS # x0001
x0060 LUN x0
TARGET x4
----- CAM STRING -----
ROUTINE NAME ctape_iodone
----- CAM STRING -----
ERROR TYPE Hard Error Detected
----- CAM STRING -----
DEVICE NAME DEC TLZ06
----- CAM STRING -----
Active CCB at time of error
----- CAM STRING -----
CCB request completed with an error
ERROR - os_std, os_type = 11, std_type = 10
----- ENT_CCB_SCSIIO -----
*MY ADDR x05FBDB28
CCB LENGTH x00C0
FUNC CODE x01
CAM_STATUS x0084 CAM_REQ_CMP_ERR
AUTOSNS_VALID
PATH ID 1.
TARGET ID 4.
TARGET LUN 0.
CAM FLAGS x00000040
CAM_DIR_IN
*PDRV_PTR x05FBD828
*NEXT_CCB x00000000
*REQ_MAP x05FAA300
VOID (*CAM_CBFCNP)() x00520740
*DATA_PTR x0003C008
DXFER_LEN x00007FE0
*SENSE_PTR x05FBD850
SENSE_LEN x40
CDB_LEN x06
SGLIST_CNT x0000
CAM_SCSI_STATUS x0002 SCSI_STAT_CHECK_CONDITION
SENSE_RESID x2E
RESID x00007FE0
CAM_CDB_IO x00000000000000E07F000008
CAM_TIMEOUT x0000012F
MSGB_LEN x0000
VU_FLAGS x0000
TAG_ACTION x00
----- CAM STRING -----
Error, exception, or abnormal
_condition
----- CAM STRING -----
HARDWARE ERROR - Nonrecoverable
_hardware error
----- ENT_SENSE_DATA -----
ERROR CODE x0070 CODE x70
SEGMENT x00
SENSE KEY x0004 HARDWARE ERR
INFO BYTE 3 x00
INFO BYTE 2 x00
INFO BYTE 1 x00
INFO BYTE 0 x00
ADDITION LEN x0A
CMD SPECIFIC 3 x00
CMD SPECIFIC 2 x00
CMD SPECIFIC 1 x00
CMD SPECIFIC 0 x00
ASC x44
ASQ x80
FRU x00
SENSE SPECIFIC x000060
ADDITIONAL SENSE
0000: 00000000 00000000 00000000 00000000 *................*
0010: 00000000 00000000 00000000 00000000 *................*
0020: 00000000 00000000 00000000 00000000 *................*
0030: 7E250000 00005E3C 00000000 00000000 *..%~<^..........*
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
** last case on Alpha 255 **
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
********************************* ENTRY 259.
*********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 4.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Fri Jan 2 11:08:34 1998
OCCURRED ON SYSTEM poseidon
SYSTEM ID x0006000D CPU TYPE: DEC 7000
SYSTYPE x00000000
----- UNIT INFORMATION -----
CLASS x0001 TAPE
SUBSYSTEM x0000 DISK
BUS # x0000
x0028 LUN x0
TARGET x5
----- CAM STRING -----
ROUTINE NAME ctape_iodone
----- CAM STRING -----
ERROR TYPE Hard Error Detected
----- CAM STRING -----
DEVICE NAME DEC TLZ06
----- CAM STRING -----
Active CCB at time of error
----- CAM STRING -----
CCB request completed with an error
ERROR - os_std, os_type = 11, std_type = 10
----- ENT_CCB_SCSIIO -----
*MY ADDR x02655B28
CCB LENGTH x00C0
FUNC CODE x01
CAM_STATUS x0084 CAM_REQ_CMP_ERR
AUTOSNS_VALID
PATH ID 0.
TARGET ID 5.
TARGET LUN 0.
CAM FLAGS x00000040
CAM_DIR_IN
*PDRV_PTR x02655828
*NEXT_CCB x00000000
*REQ_MAP x09F6A300
VOID (*CAM_CBFCNP)() x00526150
*DATA_PTR x00028008
DXFER_LEN x00007C9C
*SENSE_PTR x02655850
SENSE_LEN x40
CDB_LEN x06
SGLIST_CNT x0000
CAM_SCSI_STATUS x0002 SCSI_STAT_CHECK_CONDITION
SENSE_RESID x2E
RESID x00007C9C
CAM_CDB_IO x000000000000009C7C000008
CAM_TIMEOUT x0000012F
MSGB_LEN x0000
VU_FLAGS x0000
TAG_ACTION x00
----- CAM STRING -----
Error, exception, or abnormal
_condition
----- CAM STRING -----
HARDWARE ERROR - Nonrecoverable
_hardware error
----- ENT_SENSE_DATA -----
ERROR CODE x0070 CODE x70
SEGMENT x00
SENSE KEY x0004 HARDWARE ERR
INFO BYTE 3 x00
INFO BYTE 2 x00
INFO BYTE 1 x00
INFO BYTE 0 x00
ADDITION LEN x0A
CMD SPECIFIC 3 x00
CMD SPECIFIC 2 x00
CMD SPECIFIC 1 x00
CMD SPECIFIC 0 x00
ASC x44
ASQ x00
FRU x00
SENSE SPECIFIC x000011
ADDITIONAL SENSE
0000: 00000000 00000000 00000000 00000000 *................*
0010: 00000000 00000000 00000000 00000000 *................*
0020: 00000000 00000000 00000000 00000000 *................*
0030: 7E250000 00005E3C 00000000 00000000 *..%~<^..........*
********************************* ENTRY 260.
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 3.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Fri Jan 2 11:08:34 1998
OCCURRED ON SYSTEM poseidon
SYSTEM ID x0006000D CPU TYPE: DEC 7000
SYSTYPE x00000000
----- UNIT INFORMATION -----
CLASS x0001 TAPE
SUBSYSTEM x0000 DISK
BUS # x0000
x0028 LUN x0
TARGET x5
----- CAM STRING -----
ROUTINE NAME ctape_iodone
----- CAM STRING -----
Unexpected CCB status
----- CAM STRING -----
ERROR TYPE Hard Error Detected
----- CAM STRING -----
DEVICE NAME DEC TLZ06
----- CAM STRING -----
Active CCB at time of error
----- CAM STRING -----
BUS free
ERROR - os_std, os_type = 11, std_type = 10
----- ENT_CCB_SCSIIO -----
*MY ADDR x09F88728
CCB LENGTH x00C0
FUNC CODE x01
CAM_STATUS x0053 CAM_UNEXP_BUSFREE
SIM QFRZN
PATH ID 0.
TARGET ID 5.
TARGET LUN 0.
CAM FLAGS x00000040
CAM_DIR_IN
*PDRV_PTR x09F88428
*NEXT_CCB x00000000
*REQ_MAP x09F6A300
VOID (*CAM_CBFCNP)() x00526150
*DATA_PTR x00032008
DXFER_LEN x00007C9C
*SENSE_PTR x09F88450
SENSE_LEN x40
CDB_LEN x06
SGLIST_CNT x0000
CAM_SCSI_STATUS x0000 SCSI_STAT_GOOD
SENSE_RESID x00
RESID x00007C9C
CAM_CDB_IO x000000000000009C7C000008
CAM_TIMEOUT x0000012F
MSGB_LEN x0000
VU_FLAGS x0000
TAG_ACTION x00
********************************* ENTRY 261.
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 2.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Fri Jan 2 11:08:34 1998
OCCURRED ON SYSTEM poseidon
SYSTEM ID x0006000D CPU TYPE: DEC 7000
SYSTYPE x00000000
----- UNIT INFORMATION -----
CLASS x0022 DEC SIM
SUBSYSTEM x0000 DISK
BUS # x0000
x0028 LUN x0
TARGET x5
----- CAM STRING -----
ROUTINE NAME sm_unexpected
----- CAM STRING -----
Unexpected bus free
----- UNSUPPORTED ENTRY -----
CAM ENTRY x0000040E SIM_WS
********************************* ENTRY 262.
*********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 1.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Fri Jan 2 11:05:31 1998
OCCURRED ON SYSTEM poseidon
SYSTEM ID x0006000D CPU TYPE: DEC 7000
SYSTYPE x00000000
----- UNIT INFORMATION -----
CLASS x0022 DEC SIM
SUBSYSTEM x0000 DISK
BUS # x0000
x0028 LUN x0
TARGET x5
----- CAM STRING -----
ROUTINE NAME ss_abort_done
----- CAM STRING -----
SCSI abort has been performed
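
The ENT_SENSE_DATA blocks above can be summarised in the same spirit. This is a minimal sketch that covers only the values appearing in these entries: sense key 4h is HARDWARE ERROR, ASC 44h with qualifier 00h is INTERNAL TARGET FAILURE in the standard SCSI tables, and qualifiers from 80h upwards are vendor-specific, so the 80h seen on the Alpha 200 (and the SENSE SPECIFIC bytes) would need the drive vendor's documentation. The function name and argument order are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: covers just the values seen in the two sense blocks above.
# Arguments are the hex digits printed by uerf, without the leading "x":
#   $1 = sense key, $2 = ASC, $3 = ASCQ (printed as "ASQ" by uerf)

sense_summary() {
    key=$1 asc=$2 ascq=$3

    case $((0x$key)) in
        4) keyname="HARDWARE ERROR" ;;
        *) keyname="sense key ${key}h (not covered by this sketch)" ;;
    esac

    if [ $((0x$asc)) -eq $((0x44)) ] && [ $((0x$ascq)) -eq 0 ]; then
        detail="INTERNAL TARGET FAILURE"
    elif [ $((0x$ascq)) -ge $((0x80)) ]; then
        # Qualifiers 80h-FFh are reserved for the vendor.
        detail="ASC ${asc}h with vendor-specific qualifier ${ascq}h"
    else
        detail="ASC ${asc}h ASCQ ${ascq}h (not covered by this sketch)"
    fi

    echo "$keyname: $detail"
}
```

For the Alpha 200 entry this would be called as "sense_summary 04 44 80", and for the Alpha 255 entry as "sense_summary 04 44 00". Both point at the drive reporting an internal failure rather than a media error, though the sketch cannot say more than the standard tables do.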
Received on Thu Jan 08 1998 - 18:48:21 NZDT