I have been researching this via AltaVista and DejaNews, and it seems
others are having trouble with drives dropping offline on 2100s, but
most of those reports involve RZ28/29 drives on RAID controllers and
point to firmware problems.
I have a 2100 4/275 that had:
  (DEC RZ28 (C) DEC 442C)
  (SEAGATE ST15230N 0168)
  (SEAGATE ST15230N 0638)
The 0168 Hawk was always dropping offline, requiring a power cycle to get
it functional again (pulling it from the StorageWorks rack and plugging it
back in). The 0638 drive seems never to have dropped offline. I had the
0168 drive's firmware upgraded, but in the interim replaced it with a
  (FUJITSU M2954S-512 0142) (7200RPM 4GB)
connected to the external SCSI connector.
Now this Fujitsu drive is dropping offline too, sometimes requiring a
power cycle of the drive case to come back online (though usually a
scu -f /dev/rrz3c reset device
is enough to bring it back from the dead).
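For anyone who wants to script the recovery step, here is a minimal sketch
that wraps the scu command quoted above. The function names are
hypothetical, and the only scu syntax used is the exact command line from
this message; it defaults to a dry run that just prints the command.

```python
import subprocess

def build_reset_command(raw_device):
    """Return the argv list for the scu reset quoted in the message."""
    return ["scu", "-f", raw_device, "reset", "device"]

def reset_device(raw_device, dry_run=True):
    """Print the reset command; execute it only when dry_run is False."""
    argv = build_reset_command(raw_device)
    print(" ".join(argv))
    if dry_run:
        return 0
    # Assumption: run as root on a host where scu is installed.
    return subprocess.call(argv)

if __name__ == "__main__":
    reset_device("/dev/rrz3c")  # dry run: prints the command only
```

Passing dry_run=False would actually issue the reset, so that is best left
for a device you already know is hung.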
We also have another 2100 system with a CONNER 4GB (4107) drive that is
continually going offline (with data corruption when it does).
Since the problem seems so widespread (not tied to one specific disk
type or machine), I'm wondering if it might be a DUNIX or SCSI controller
problem. (Of course I could just be having bad luck with bad disks, but I
don't think so.)
Is anyone aware of any known problems and resolutions that might pertain
to what I'm seeing? If the cause is overly optimistic timing in the SCSI
subsystem, is there something I can change in the DDR database to make it
more lenient?
Here's a more in-depth uerf listing:
------------------------------------------------------------------------------
OPERATING SYSTEM DEC OSF/1
SYSTEM ID x00060009 CPU TYPE: DEC 2100
Digital UNIX V4.0A (Rev. 464); Thu Jan 9 09:08:40 MST 1997
physical memory = 512.00 megabytes.
Firmware revision: 4.6
PALcode: OSF version 1.45
AlphaServer 2100 4/275
cpu 0 EV-45 4mb b-cache
cpu 1 EV-45 4mb b-cache
cpu 2 EV-45 4mb b-cache
cpu 3 EV-45 4mb b-cache
psiop0 at pci0 slot 1
Loading SIOP: script 1000e00, reg 81000000, data 405a0de8
scsi0 at psiop0 slot 0
rz0 at scsi0 target 0 lun 0 (LID=0)
  (DEC RZ28 (C) DEC 442C)
rz1 at scsi0 target 1 lun 0 (LID=1)
  (SEAGATE ST15230N 0638)
rz2 at scsi0 target 2 lun 0 (LID=2)
  (SEAGATE ST15230N 0638)
rz3 at scsi0 target 3 lun 0 (LID=3)
  (FUJITSU M2954S-512 0142)
rz5 at scsi0 target 5 lun 0 (LID=4)
  (DEC RRD43 (C) DEC 1084)
rz6 at scsi0 target 6 lun 0 (LID=5)
  (DEC RRD43 (C) DEC 1084)
------------------------------------------------------------------------------
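To compare firmware revisions across machines, the device lines in a
listing like the one above can be pulled apart mechanically. The following
is only a sketch: it assumes each device line has been joined with its
inquiry-string continuation line, treats the first word of the inquiry
string as the vendor and the last as the firmware revision, and uses
hypothetical names throughout.

```python
import re

# Sample lines copied from the listing, continuation lines joined.
BOOT_LINES = """\
rz0 at scsi0 target 0 lun 0 (LID=0) (DEC RZ28 (C) DEC 442C)
rz1 at scsi0 target 1 lun 0 (LID=1) (SEAGATE ST15230N 0638)
rz2 at scsi0 target 2 lun 0 (LID=2) (SEAGATE ST15230N 0638)
rz3 at scsi0 target 3 lun 0 (LID=3) (FUJITSU M2954S-512 0142)
"""

PATTERN = re.compile(
    r"^(?P<dev>rz\d+) at (?P<bus>scsi\d+) target (?P<target>\d+) "
    r"lun (?P<lun>\d+) \(LID=\d+\) \((?P<inquiry>.*)\)$"
)

def parse_devices(text):
    """Return one dict per device line that matches PATTERN."""
    devices = []
    for line in text.splitlines():
        m = PATTERN.match(line.strip())
        if not m:
            continue  # skip anything that is not a device line
        # Assumption: "<vendor> <product...> <firmware>" layout.
        vendor, rest = m.group("inquiry").split(None, 1)
        product, firmware = rest.rsplit(None, 1)
        devices.append({
            "dev": m.group("dev"),
            "target": int(m.group("target")),
            "vendor": vendor,
            "product": product,
            "firmware": firmware,
        })
    return devices

if __name__ == "__main__":
    for d in parse_devices(BOOT_LINES):
        print(d["dev"], d["vendor"], d["product"], d["firmware"])
```

Run against listings from several systems, this makes it easy to spot the
same drive model at different firmware revisions.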
When the Fuji went offline, the following uerf entry was logged when I ran
scu show edt lun 0
------------------------------------------------------------------------------
EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
CLASS x0022 DEC SIM
SUBSYSTEM x0000 DISK
BUS # x0000
x0018 LUN x0
TARGET x3
ROUTINE NAME as_finish
Autosense failed
CAM ENTRY x0000040E SIM_WS
ERROR TYPE Soft Error Detected (recovered)
------------------------------------------------------------------------------
This error message occurred prior to that:
------------------------------------------------------------------------------
EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
CLASS x0000 DISK
SUBSYSTEM x0000 DISK
BUS # x0000
x0018 LUN x0
TARGET x3
ROUTINE NAME cdisk_complete
Cmd Timeout - retrying
ERROR TYPE Soft Error Detected (recovered)
DEVICE NAME FUJITSU M2954S-512 .M2954S-512
Active CCB at time of error
Command timed out
ERROR - os_std, os_type = 11, std_type = 10
----- ENT_CCB_SCSIIO -----
*MY ADDR x1FE2B580
CCB LENGTH x00C0
FUNC CODE x01
CAM_STATUS x000B CAM_CMD_TIMEOUT
PATH ID 0.
TARGET ID 3.
TARGET LUN 0.
CAM FLAGS x00000482
CAM_QUEUE_ENABLE
CAM_DIR_OUT
CAM_SIM_QFRZDIS
*PDRV_PTR x1FE2B228
*NEXT_CCB x00000000
*REQ_MAP x062CA400
VOID (*CAM_CBFCNP)() x004811B0
*DATA_PTR xA07F4000
DXFER_LEN x00010000
*SENSE_PTR x1FE2B250
SENSE_LEN x40
CDB_LEN x06
SGLIST_CNT x0000
CAM_SCSI_STATUS x0000 SCSI_STAT_GOOD
SENSE_RESID x00
RESID x00010000
CAM_CDB_IO x000000000000008090C9010A
CAM_TIMEOUT x0000003C
MSGB_LEN x0000
VU_FLAGS x4000
TAG_ACTION x20
------------------------------------------------------------------------------
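The CCB dump above can be decoded a little further. The sketch below
assumes uerf prints the CAM_CDB_IO buffer as one number with the last byte
first, so the 6-byte CDB is read from the right-hand end; it also assumes
512-byte blocks. Under those assumptions the command is a WRITE(6) whose
length matches DXFER_LEN, which agrees with the CAM_DIR_OUT flag. The
function name is hypothetical.

```python
# Two 6-byte opcodes from the SCSI command set; others are left unnamed.
OPCODES = {0x08: "READ(6)", 0x0A: "WRITE(6)"}

def decode_cdb6(hex_field, cdb_len=6):
    """Decode a 6-byte CDB from a uerf CAM_CDB_IO hex string."""
    raw = bytes.fromhex(hex_field)
    # Assumption: the field is printed last byte first, so reverse it.
    cdb = raw[::-1][:cdb_len]
    opcode = cdb[0]
    # A 6-byte CDB carries a 21-bit logical block address.
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    # A transfer length of 0 means 256 blocks in a 6-byte CDB.
    blocks = cdb[4] or 256
    return {
        "opcode": opcode,
        "name": OPCODES.get(opcode, "unknown"),
        "lba": lba,
        "blocks": blocks,
        "control": cdb[5],
    }

if __name__ == "__main__":
    cdb = decode_cdb6("000000000000008090C9010A")
    print(cdb["name"], "lba", cdb["lba"], "blocks", cdb["blocks"])
    # Values taken from the dump: DXFER_LEN, RESID, CAM_TIMEOUT.
    dxfer_len, resid, timeout = 0x00010000, 0x00010000, 0x3C
    print("bytes requested:", dxfer_len, "bytes not transferred:", resid)
    print("timeout value:", timeout)
```

Read this way, the request is a 128-block write (65536 bytes at 512 bytes
per block, the same as DXFER_LEN), and RESID equal to DXFER_LEN suggests no
data moved before the command timed out. CAM_TIMEOUT x3C is 60, presumably
seconds.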
--stephen
--
Stephen Dowdy - Systems Administrator - CS Dept - Univ of Colorado, Boulder
dowdy_at_cs.colorado.edu - 303-492-6196 - http://www.cs.colorado.edu/~dowdy/
"Team Spam Forever" (A division of Beatrice) { NO cold Sales Calls !!! }
Received on Thu Jan 30 1997 - 19:26:36 NZDT