This is a follow-up to my previous mail when ...
On Thu, 13 Nov 1997, Lucio Chiappetti wrote:
> We have an Alpha 200/100 to which we have attached some disks recycled from
> our previous Ultrix systems (four RZ58 in two cabinets). The CPU is under
> maintenance contract, but the old disks are not.
[because of the excessive maintenance fees for this disk model]
[message that appeared during reboot:]
> cam_logger CAM_ERROR packet bus 0 target 2 lun 0
> ss_perform_timeout
> (repeated 3-4 times)
> Reached max abort count, scheduled bus reset
Since then I have left the BA42 cabinet with the two disks rz2 and rz3 (both
RZ58) switched off over the weekend. This morning I checked uerf -R and found a
burst of "event type 199 CAM SCSI" errors on the day the problem occurred, and
a couple on the day before. All of them were related to the 'rz2' disk.
This morning I rebooted the machine and verified again that:
- a "show device" at ROM level lists the disks
- the boot sequence lists the disks
Then I booted single-user and ran /sbin/bcheckrc. This time I got errors
ALSO ON THE OTHER DISK IN THE SAME CABINET:
CAM error packet bus 0 target 3
cdisk_check_sense
Medium error - bad block 1373633
Hard error detected
DEC RZ58
Active CCB at time of error (what does this mean ?)
Medium error not recovered
...
rrz3c cannot read blk .... run fsck manually
This was followed by the usual sequence of errors on the other disk, rz2.
I ran a full fsck on rz3, and it reported many more errors. I ran it with
-y to answer yes to all "repair" questions.
This did not help. At the next boot rz3 again gave the same sort of errors,
and another fsck -y had no effect.
We very carefully checked all the cabling inside and outside the cabinet,
replaced the external SCSI cable with a different one, removed one disk
at a time from the BA42 cabinet, tried them at different SCSI addresses,
and even tried a spare BA42.
Nothing helped. We are fairly sure we can rule out a problem in the SCSI
controller on the CPU or on the bus (originally we had the chain rz0
(internal) --> BA42 with rz2+rz3 --> BA42 with rz1+rz5), and also in the
cabinet (which in any case is very simple, just a power supply and a bunch
of cables).
It is extremely curious that two disks inside the same cabinet failed at
more or less the same time!!!
However, the symptoms differ. When we opened the cabinet we found one disk
warm and the other cool. It turned out that "rz2" (the one that gives
repeated timeouts during boot) is the cool one; it probably does not even
spin up to operating speed. The warm one is "rz3", which reports bad sectors
but is visible to the system.
How warm should an RZ58 disk be during normal operation?
We also tried the following:
- run newfs on rz3. No errors during newfs, but a long series of
  sector errors in the subsequent fsck.
  Does newfs do a full format?
- remove the controllers (or at least what we think are the controllers,
  the electronics board underneath each RZ58) and swap them (we thought
  rz2 had a motor problem and a good controller, and rz3 perhaps a bad
  controller)
- rz3 with the swapped controller still gives sector errors, and newfs
  behaves as above.
- finally we ran "scu" and issued a "format" command to rz3 (in case
  newfs does not do a full format). We were unsure which options to
  use, so we tried "format defects all", "format", "format defects
  primary" and "format defects none". In all cases it runs for quite a
  while and terminates with
    format unit failed EIO (5) i/o error
    sense key 0x3 MEDIUM ERROR non recovered
    sense code/qualif 0x32,0 no defect spare location
- "test selftest" and "test memory" from scu pass; "test drive" and
  "test controller" are apparently unsupported (??) and give
    SCSI SEND_DIAGNOSTIC failed
    EIO (5) i/o error
    sense key 0x5 illegal request
    illegal request or CDB parameter
    sense code qualifier 0x24
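For anyone decoding messages like the ones above, here is a minimal sketch of how the sense key and additional sense code (ASC/ASCQ) are laid out in SCSI fixed-format sense data: sense key in the low nibble of byte 2, ASC in byte 12, ASCQ in byte 13. The helper name decode_fixed_sense and the sample buffers are hypothetical, and the lookup tables list only the codes quoted in this thread, not the full SCSI tables. Note that in these terms the 0x24 reported above is the ASC (INVALID FIELD IN CDB), and 0x32/0x00 means the drive has run out of spare sectors for reassigning defects.

```python
from typing import Optional

# Sense key names (low nibble of byte 2 in fixed-format sense data).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0xB: "ABORTED COMMAND",
}

# Only a few ASC/ASCQ pairs, chosen to match the errors quoted above;
# this is NOT the complete SCSI table.
ASC_ASCQ = {
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x24, 0x00): "INVALID FIELD IN CDB",
    (0x32, 0x00): "NO DEFECT SPARE LOCATION AVAILABLE",
}


def decode_fixed_sense(sense: bytes) -> dict:
    """Decode fixed-format SCSI sense data (response code 0x70 or 0x71)."""
    if len(sense) < 14:
        raise ValueError("fixed-format sense data needs at least 14 bytes")
    response_code = sense[0] & 0x7F  # mask off the VALID bit
    if response_code not in (0x70, 0x71):
        raise ValueError(f"unexpected response code 0x{response_code:02x}")
    key = sense[2] & 0x0F
    asc, ascq = sense[12], sense[13]
    # Bytes 3-6 (INFORMATION) are meaningful only when the VALID bit is set;
    # for medium errors on disks they usually carry the failing block address.
    information: Optional[int] = None
    if sense[0] & 0x80:
        information = int.from_bytes(sense[3:7], "big")
    return {
        "sense_key": key,
        "sense_key_name": SENSE_KEYS.get(key, "UNKNOWN"),
        "asc": asc,
        "ascq": ascq,
        "asc_text": ASC_ASCQ.get((asc, ascq), "UNKNOWN"),
        "information": information,
    }


if __name__ == "__main__":
    # Hypothetical buffer shaped like the scu format failure (key 0x3, 0x32/0x00).
    format_failure = bytes([0x70, 0, 0x03, 0, 0, 0, 0, 0x0A, 0, 0, 0, 0, 0x32, 0x00])
    print(decode_fixed_sense(format_failure))
```

The tables are deliberately small; extending them from the SCSI specification is straightforward, since the decoding logic itself does not change.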
Should we call it a day, consider BOTH disks unrecoverable (and buy newer
ones with the money we did not spend on maintenance)?
Or is there anything else we can do?
----------------------------------------------------------------------------
Lucio Chiappetti - IFCTR/CNR - via Bassini 15 - I-20133 Milano (Italy)
----------------------------------------------------------------------------
Fuscim donca de Miragn E tornem a sta scio' in Bregn
Che i fachign e i cortesagn Magl' insema no stagn begn
Drizza la', compa' Tapogn (Rabisch, II 41, 96-99)
----------------------------------------------------------------------------
For more info :
http://www.ifctr.mi.cnr.it/~lucio/personal.html
----------------------------------------------------------------------------
Received on Mon Nov 17 1997 - 18:08:21 NZDT