Errors on DAT TLZ06 and dd

From: Lucio Chiappetti <lucio_at_ifctr.mi.cnr.it>
Date: Mon, 22 Dec 1997 09:19:51 +0100 (MET)

Last week I had problems with a DAT TLZ06 drive attached to the second SCSI
bus of our Alpha 200/100 (DU 3.2). I was getting lots of errors (I tried at
least 5 different tapes, all of which could be read on another unit attached
to a Sun) and called DEC maintenance. They came and replaced the drive last
Friday.

On Friday evening and Sunday afternoon I tried reading one of my tapes, and
I'm still getting errors, sporadically and under peculiar conditions.

The tape is a sequence of some 900 blocked files, each of which may have a
different block size. The tape is read by a sequence of "dd" commands such as
the following:

dd if=/dev/nrmt0h of=pd.instdir ibs=04752 cbs=0132 conv=unblock
dd if=/dev/nrmt0h of=ipd001.obsdir ibs=00640 cbs=0040 conv=unblock
dd if=/dev/nrmt0h of=ipd001.pdhkd000 ibs=30836
dd if=/dev/nrmt0h of=ipd001.pdeng000 ibs=31992
dd if=/dev/nrmt0h of=ipd001.p1cal000 ibs=02176
etc.
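
For readers unfamiliar with the cbs/conv=unblock pair used above, here is a
small sketch on an ordinary disk file (no tape involved). The file names
records.tmp and records.txt are made up for the illustration. With
conv=unblock, "dd" treats the input as fixed-length records of cbs bytes,
strips the trailing blanks of each record and appends a newline.

```shell
# Two 8-byte fixed-length records, padded with trailing spaces
# (file names are hypothetical, chosen only for this illustration).
printf 'alpha   beta    ' > records.tmp

# Convert the fixed-length records into newline-terminated lines.
dd if=records.tmp of=records.txt cbs=8 conv=unblock 2>/dev/null

# Show the result: one line per record, trailing blanks removed.
cat records.txt
rm -f records.tmp
```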

The procedure which constructs the sequence has been in use for a long time
and has been thoroughly tested.

The tape has been FULLY analysed with /usr/field/tapex -w -m and gave no
errors whatsoever (no hard errors, and also no wrong records, missing tape
marks or other problems sometimes found on other tapes of the same origin).

Also, if I take my file of "dd" commands and execute them by hand one by one,
they work. If instead I execute a "longish" sequence of "dd" commands, there
is a chance that after a while one of them fails with an I/O error.
If I then clear everything and manually re-issue the offending command alone,
it works.
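
To narrow down whether the spacing between commands matters, one possible
approach is sketched below. It assumes a POSIX-style shell and a file with one
"dd" command per line; the function name run_cmd_list and the PAUSE variable
are invented for this illustration and are not part of any existing tool. It
runs the commands one at a time, waits between them, and stops at the first
failure, reporting which command failed so that the tape can be repositioned
before retrying.

```shell
# run_cmd_list FILE: run each non-empty line of FILE as a shell command,
# sleeping PAUSE seconds (default 5) after each successful command and
# stopping at the first failure. Name and PAUSE are hypothetical.
run_cmd_list() {
    list=$1
    pause=${PAUSE:-5}
    n=0
    # The list is read on descriptor 3 so the commands keep their own stdin.
    while IFS= read -r cmd <&3; do
        [ -z "$cmd" ] && continue
        n=$((n + 1))
        if ! sh -c "$cmd"; then
            echo "command $n failed: $cmd" >&2
            return 1
        fi
        sleep "$pause"
    done 3<"$list"
    echo "all $n commands completed"
}
```

It could be called as "PAUSE=10 run_cmd_list ddlist" (ddlist being a made-up
file name). If the failures disappear with a longer pause, that would point
towards a timing problem rather than the medium.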

At the moment I'm reading my tape by "classes of files": this way I have
sequences of a small number of "dd" commands, followed by an "mt fsf" skip of
a few files, followed by some more "dd" commands, and so on. This way I have
been successful in reading the tape, but in the past the full procedure used
to work.
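
The "classes of files" workaround can be sketched roughly as follows. This is
only an illustration: the function name plan_class and the TAPE_DEV variable
are invented here, and it assumes that each "dd" command in the list
corresponds to exactly one tape file, so that a run of N unwanted files can be
replaced by a single "mt fsf N". The function only prints the resulting
commands; it does not touch the tape.

```shell
# plan_class PATTERN FILE: from a list of dd commands (one per tape file),
# keep the lines matching PATTERN and replace each run of non-matching
# lines by one "mt fsf" skip. Function name and TAPE_DEV are hypothetical.
plan_class() {
    pattern=$1
    list=$2
    awk -v pat="$pattern" -v dev="${TAPE_DEV:-/dev/nrmt0h}" '
        NF == 0 { next }                 # ignore blank lines
        $0 ~ pat {
            # Emit the pending skip, if any, before the wanted file.
            if (skip > 0) printf "mt -f %s fsf %d\n", dev, skip
            skip = 0
            print
            next
        }
        { skip++ }                       # unwanted file: count it for the skip
    ' "$list"
}
```

For example, "plan_class pdeng ddlist" (ddlist again being a made-up file
name) would print only the "dd" commands for the pdeng files, separated by the
"mt fsf" skips needed to step over the others.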

uerf shows errors of one of the following two forms:

OS EVENT TYPE 199. CAM SCSI
ERROR TYPE Hard Error Detected
---- CAM STRING -----
                                        MEDIUM ERROR - Nonrecoverable medium
                                         _error

or

OS EVENT TYPE 199. CAM SCSI
ERROR TYPE Hard Error Detected
----- CAM STRING -----
                                        HARDWARE ERROR - Nonrecoverable
                                         _hardware error

Is there anything that could cause "dd" commands issued in too close
succession to fail (some timeout?)? Should I suspect that they brought me a
bad tape drive, or could it be a SCSI bus problem?

----------------------------------------------------------------------------
Lucio Chiappetti - IFCTR/CNR - via Bassini 15 - I-20133 Milano (Italy)
----------------------------------------------------------------------------
Fuscim donca de Miragn E tornem a sta scio' in Bregn
Che i fachign e i cortesagn Magl' insema no stagn begn
Drizza la', compa' Tapogn (Rabisch, II 41, 96-99)
----------------------------------------------------------------------------
For more info : http://www.ifctr.mi.cnr.it/~lucio/personal.html
----------------------------------------------------------------------------
Received on Mon Dec 22 1997 - 09:20:25 NZDT
