One of my machines here is an AlphaStation 500/333, which is giving some
unusual problems with a Micropolis fast/wide SCSI disk. I wonder if
anyone here might have any ideas as to a solution (other than "buy
approved Digital disks"!).
The SCSI bus has these devices attached (taken from uerf for
convenience):
rz0 at scsi0 target 0 lun 0 (LID=0)
    (DEC RZ28D (C) DEC 0008) (Wide16)
rz2 at scsi0 target 2 lun 0 (LID=1)
    (MICROP 3391WS x43h) (Wide16)
tz3 at scsi0 target 3 lun 0 (LID=2)
    (ARCHIVE Python 28849-XXX 4.CM)
changer at scsi0 target 3 lun 1 (LID=3)
    (ARCHIVE Python 28849-XXX 4.CM)
rz4 at scsi0 target 4 lun 0 (LID=4)
    (DEC RRD45 (C) DEC 1645)
And I am running Digital UNIX V4.0B (Rev. 564), firmware version 6.4-3.
Most relevant-seeming patches from the duv40bas00005-19970926 kit are
installed.
The tape device is external and terminated, but the same behaviour is
seen with it absent. The RZ28 is the system disk.
What happens is, the machine will run normally for some amount of time
(3 hours... 5 days...) then I get a kernel panic and crash. The error
log contains many of these events:
EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 34.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Thu Nov 20 17:09:51 1997
OCCURRED ON SYSTEM mnhepw
SYSTEM ID x0005000F
SYSTYPE x00000000
----- UNIT INFORMATION -----
CLASS x0000 DISK
SUBSYSTEM x0000 DISK
BUS # x0000
x0010 LUN x0
TARGET x2
And the crash dump says (hope I'm picking out the important part here):
Hard Error Detected
MICROP 3391WS ^X3391WS
Active CCB at time of error
Command timed out
cam_logger: CAM_ERROR packet
cam_logger: bus 0 target 2 lun 0
cdisk_complete
Retries Exhausted
Hard Error Detected
MICROP 3391WS ^X3391WS
Active CCB at time of error
Command timed out
AdvFS I/O error:
Volume: /dev/rz2g
Tag: 0xfffffff7.0000
Page: 450
Block: 7614528
Block count: 32
Type of operation: Write
Error: 5
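As a side note, the fields in the AdvFS message above can be pulled apart with a short script. This is only a sketch under stated assumptions: it assumes 512-byte disk blocks, and that "Error: 5" is the usual errno value EIO (I/O error); neither is confirmed by the log itself. The function names are made up for illustration.

```python
import errno
import re

# Text copied from the AdvFS I/O error message in the crash dump.
ADVFS_ERROR = """\
AdvFS I/O error:
Volume: /dev/rz2g
Tag: 0xfffffff7.0000
Page: 450
Block: 7614528
Block count: 32
Type of operation: Write
Error: 5
"""

SECTOR_BYTES = 512  # assumption: 512-byte disk blocks


def parse_advfs_error(text):
    """Return the 'Key: value' lines of the message as a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Lines with nothing after the colon (the header) are skipped.
        match = re.match(r"^([^:]+):\s*(\S.*)$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


def summarize(fields):
    """Convert block numbers to byte values and name the errno code."""
    block = int(fields["Block"])
    count = int(fields["Block count"])
    err = int(fields["Error"])
    return {
        "byte_offset": block * SECTOR_BYTES,
        "length_bytes": count * SECTOR_BYTES,
        # Assumption: the AdvFS error number is a standard errno value.
        "errno_name": errno.errorcode.get(err, "unknown"),
    }


if __name__ == "__main__":
    info = summarize(parse_advfs_error(ADVFS_ERROR))
    for key, value in info.items():
        print(f"{key}: {value}")
```

Under those assumptions the failed request is a 16 KB write a little under 4 GB into the partition, which fits the timed-out write reported by the CAM layer.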
OK, this clearly points at the Micropolis disk as the culprit, but I
don't suspect a hardware fault in the disk, as the same behaviour was
also seen with a different Micropolis disk (3243WS; a 4G disk instead of
9G). Both disks also share another symptom: after the crash, "show
devices" at the console doesn't list the Micropolis disk, and it doesn't
return until a power-cycle.
With a (non-wide) Seagate disk dropped in instead of a Micropolis,
everything works OK.
I think that probably tells the whole story - but is there likely to be
any solution, or is this just a bad hardware mismatch?
Thanks very much for any insight/hints/answers,
Graham Allan
University of Minnesota
Received on Fri Nov 21 1997 - 01:41:52 NZDT