Hi,
We have an AlphaServer 2100 4/275 with a KZPSC RAID controller. The
Digital UNIX version is 3.2F (Rev. 69.73).
A RAID 5 set of five RZ28 disks is attached externally through the
KZPSC controller. The array holds 2 AdvFS filesets and the database
(Oracle), which is accessed through a raw device.
This installation worked properly under OSF/1 V3.0B for about 2
years. Four or five months ago it began crashing randomly (the crashes
were related to AdvFS). Our first decision was to migrate to DU 3.2F.
Two weeks ago, 3.2F also crashed with AdvFS-related problems. A
segment of the "messages" file is below:
Oct 12 00:10:26 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3104
Oct 12 00:10:27 alpha21 vmunix: vd 1 blk 745840 blkCnt 128
Oct 12 00:10:27 alpha21 vmunix: write error = 5
Oct 12 00:10:28 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3552
Oct 12 00:10:28 alpha21 vmunix: vd 1 blk 1131616 blkCnt 128
Oct 12 00:10:28 alpha21 vmunix: read error = 5
Oct 12 00:10:28 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001 tag 0x00003dd5.8001u page 2967
Oct 12 00:10:29 alpha21 vmunix: vd 1 blk 2536128 blkCnt 128
Oct 12 00:10:29 alpha21 vmunix: write error = 5
Oct 12 00:10:29 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.fffffffe.0000 tag 0xfffffff7.0000u page 337
Oct 12 00:10:29 alpha21 vmunix: vd 1 blk 6496 blkCnt 80
Oct 12 00:10:29 alpha21 vmunix: write error = 5
Oct 12 00:10:29 alpha21 vmunix:
Oct 12 00:10:29 alpha21 vmunix: bs_osf_complete: metadata write failed
Oct 12 00:10:30 alpha21 vmunix: AdvFS Domain Panic; Domain local_domain Id 0x2df0fa52.000d0bc0
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001 tag 0x00003dd5.8001u page 2983
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 2536384 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001 tag 0x00003dd5.8001u page 2975
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 2536256 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001 tag 0x00003dd5.8001u page 2959
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 2536000 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3112
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 745968 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3544
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 1131488 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: read error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001 tag 0x00003dd5.8001u page 2992
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 681424 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: read error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3120
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 746096 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3128
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 746224 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3136
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 746352 blkCnt 128
Oct 12 00:10:31 alpha21 vmunix: write error = 5
Oct 12 00:10:31 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.fffffffe.0000 tag 0xfffffff7.0000u page 129
Oct 12 00:10:31 alpha21 vmunix: vd 1 blk 2432 blkCnt 16
Oct 12 00:10:31 alpha21 vmunix: write error = 5
Oct 12 00:10:31 alpha21 vmunix:
Oct 12 00:10:31 alpha21 vmunix: bs_osf_complete: metadata write failed
Oct 12 00:10:31 alpha21 vmunix: AdvFS Domain Panic; Domain home_domain Id 0x2df0fa3f.0003d000
Oct 12 00:13:07 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000001.8001u page 426
Oct 12 00:13:07 alpha21 vmunix: vd 1 blk 1595936 blkCnt 96
Oct 12 00:13:07 alpha21 vmunix: read error = 5
Oct 12 00:13:07 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000006.8001u page 0
Oct 12 00:13:08 alpha21 vmunix: vd 1 blk 8720 blkCnt 16
Oct 12 00:13:08 alpha21 vmunix: read error = 5
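To make the pattern in the excerpt above easier to see, the errors can be tallied per AdvFS domain and per operation. What follows is only a minimal sketch in Python, assuming the log text is held in a string; the names RECORD, PANIC and summarize are made up for illustration and are not part of any Digital UNIX tool. The error value 5 in the log corresponds to errno EIO, the generic I/O error.

```python
import re
from collections import Counter

# One AdvFS I/O error record spans three syslog lines (setId, vd/blk, error).
# \s+ and the lazy .*? tolerate mail line wrapping and the syslog prefixes
# that sit between the parts of a record.
RECORD = re.compile(
    r"advfs I/O error: setId\s+0x([0-9a-f]+\.[0-9a-f]+)\.\S+"
    r"\s+tag\s+\S+\s+page\s+(\d+)"
    r".*?vd\s+(\d+)\s+blk\s+(\d+)\s+blkCnt\s+(\d+)"
    r".*?(read|write) error = (\d+)",
    re.DOTALL,
)

# The domain panic line gives the human-readable name for a domain id.
PANIC = re.compile(
    r"AdvFS Domain Panic; Domain\s+(\S+)\s+Id\s+0x([0-9a-f]+\.[0-9a-f]+)"
)


def summarize(log_text):
    """Count I/O errors keyed by (domain name or id, 'read' or 'write')."""
    names = {dom_id: name for name, dom_id in PANIC.findall(log_text)}
    counts = Counter()
    for dom_id, _page, _vd, _blk, _cnt, op, _err in RECORD.findall(log_text):
        # Fall back to the raw id if no panic line named this domain.
        counts[(names.get(dom_id, dom_id), op)] += 1
    return counts
```

Called as summarize(open("messages").read()), with "messages" standing in for wherever the log is saved, it returns a Counter that can be printed directly. Matching on the first two fields of setId is an assumption based on the panic lines in the excerpt, where the domain Id equals that prefix.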
Additionally, "uerf" reports the following problems at the time of
the crash:
uerf version 4.2-011
(122)
********************************* ENTRY 1. *********************************
----- EVENT INFORMATION -----
EVENT CLASS OPERATIONAL EVENT
OS EVENT TYPE 300. SYSTEM STARTUP
SEQUENCE NUMBER 0.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Wed Oct 23 14:37:07 1996
OCCURRED ON SYSTEM alpha21
SYSTEM ID x00060009 CPU TYPE: DEC 2100
SYSTYPE x00000000
MESSAGE PCXAL keyboard, language English (American)
Alpha boot: available memory from 0x11dc000 to 0x1fffe000
Digital UNIX V3.2F (Rev. 69.73); Thu Oct 10 19:18:19 GMT-0500 1996
physical memory = 512.00 megabytes.
available memory = 494.23 megabytes.
using 1958 buffers containing 15.29 megabytes of memory
Master cpu at slot 0.
Firmware revision: 4.5
PALcode: OSF version 1.45
ibus0 at nexus
AlphaServer 2100 4/275
cpu 0 EV-45 4mb b-cache
cpu 1 EV-45 4mb b-cache
gpc0 at ibus0
pci0 at ibus0 slot 0
tu0: DECchip 21040-AA: Revision: 2.3
tu0 at pci0 slot 0
tu0: DEC TULIP Ethernet Interface, hardware address: 08-00-2B-E2-6A-42
tu0: console mode: selecting UTP (10BaseT) port: no link
psiop0 at pci0 slot 1
Loading SIOP: script 1001f00, reg 81222000, data 100de20
scsi0 at psiop0 slot 0
rz0 at scsi0 bus 0 target 0 lun 0 (DEC RZ28 (C) DEC 442D)
rz3 at scsi0 bus 0 target 3 lun 0 (DEC RZ28 (C) DEC D41C)
rz6 at scsi0 bus 0 target 6 lun 0 (DEC RRD43 (C) DEC 1084)
tz5 at scsi0 bus 0 target 5 lun 0 (DEC TLZ6 (C)DEC 0491)
eisa0 at pci0
ace0 at eisa0
ace1 at eisa0
lp0 at eisa0
fdi0 at eisa0
fd0 at fdi0 unit 0
dns0 at eisa0
dns0: Digital WAN Device Driver Interface
dns1: Digital WAN Device Driver Interface
dns1 at eisa0
dns2: Digital WAN Device Driver Interface
dns3: Digital WAN Device Driver Interface
vga0 at eisa0
1024x768 (QVision )
fta0 DEC CRE DEFEA FDDI Module, Hardware Revision 2
fta0 at eisa0
fta0: DMA Available.
fta0: DEC CRE DEFEA (PDQ) FDDI Interface, Hardware address: 08-00-2B-B7-27-FE
fta0: Firmware rev: 2.46
Initializing xcr0. Please wait.
Initializing xcr0. Please wait.
Initializing xcr0. Please wait.
Initializing xcr0. Please wait.
Initializing xcr0. Please wait.
xcr0 at eisa0
re0 at xcr0 unit 0 (unit status = ONLINE, raid level = 5)
pza0 at pci0 slot 7
pza0 firmware version: DEC P01 A10
scsi1 at pza0 slot 0
pza1 at pci0 slot 8
pza1 firmware version: DEC P01 A10
scsi2 at pza1 slot 0
lvm0: configured.
lvm1: configured.
dli: configured
SuperLAT. Copyright 1993 Meridian Technology Corp. All rights reserved.
x25_access: configured
wandd_base: configured
wandd_lapb: configured
wan_utilities: configured
ctf_base: configured
Node ID is 08-00-2b-b7-27-fe (from device fta0)
dna_netman: configured
dna_dli: configured
********************************* ENTRY 2. *********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 198. ASTRO CONTROLLER
SEQUENCE NUMBER 3.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Wed Oct 23 14:25:53 1996
OCCURRED ON SYSTEM alpha21
SYSTEM ID x00060009 CPU TYPE: DEC 2100
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000
----- UNIT INFORMATION -----
CLASS x0000 DISK
SUBSYSTEM x0000 DISK
BUS # x0000
----- CAM STRING -----
ROUTINE NAME xcr_e_restart
----- CAM STRING -----
Can't restart Controller
----- CAM STRING -----
ERROR TYPE Hard Error Detected
********************************* ENTRY 3. *********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 198. ASTRO CONTROLLER
SEQUENCE NUMBER 2.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Wed Oct 23 14:25:47 1996
OCCURRED ON SYSTEM alpha21
SYSTEM ID x00060009 CPU TYPE: DEC 2100
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000
----- UNIT INFORMATION -----
CLASS x0000 DISK
SUBSYSTEM x0000 DISK
BUS # x0000
----- CAM STRING -----
ROUTINE NAME xcr_cmd_timeout
----- CAM STRING -----
Controller has stopped responding
----- CAM STRING -----
ERROR TYPE Hard Error Detected
----- CAM STRING -----
Controller Softc at time of error
----- ENT_XCR_SOFTC -----
*SC_BUS_NAME xFFFFFC00006A20E0
SC_CNTRL_NUM x0000000000000000
SC_CNTRL_TYPE x006A2AC000000000
*SC_CTRL xFFFFFC00006A2AC0
SC_IOHANDLE x000003A000008000
SC_FLAGS x00000002
SC_REG_OFF x00000C90
SC_MAX_ACT x0000003C
SC_SPEC_ACT x00000004
SC_CMDS_ACT x00000003
*SC_ACT_FLINK xFFFFFC001FE556B8
*SC_ACT_BLINK xFFFFFC001FE55A50
SC_CMDS_PENDING x00000000
*SC_PEND_FLINK xFFFFFC001FE55050
*SC_PEND_BLINK xFFFFFC001FE55050
*SC_FREE_FLINK xFFFFFC001FE559B0
*SC_FREE_BLINK xFFFFFC001FE55848
SC_FREE_CMD_SLOTS x0000003D
********************************* ENTRY 4. *********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 198. ASTRO CONTROLLER
SEQUENCE NUMBER 1.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Wed Oct 23 14:25:26 1996
OCCURRED ON SYSTEM alpha21
SYSTEM ID x00060009 CPU TYPE: DEC 2100
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000
----- UNIT INFORMATION -----
CLASS x0000 DISK
SUBSYSTEM x0000 DISK
BUS # x0000
----- CAM STRING -----
ROUTINE NAME xcrintr
----- CAM STRING -----
No interrupt bit set
----- CAM STRING -----
ERROR TYPE Hard Error Detected
----- CAM STRING -----
Controller Softc at time of error
----- ENT_XCR_SOFTC -----
*SC_BUS_NAME xFFFFFC00006A20E0
SC_CNTRL_NUM x0000000000000000
SC_CNTRL_TYPE x006A2AC000000000
*SC_CTRL xFFFFFC00006A2AC0
SC_IOHANDLE x000003A000008000
SC_FLAGS x00000000
SC_REG_OFF x00000C90
SC_MAX_ACT x0000003C
SC_SPEC_ACT x00000004
SC_CMDS_ACT x00000001
*SC_ACT_FLINK xFFFFFC001FE556B8
*SC_ACT_BLINK xFFFFFC001FE556B8
SC_CMDS_PENDING x00000000
*SC_PEND_FLINK xFFFFFC001FE55050
*SC_PEND_BLINK xFFFFFC001FE55050
*SC_FREE_FLINK xFFFFFC001FE55938
*SC_FREE_BLINK xFFFFFC001FE55848
SC_FREE_CMD_SLOTS x0000003F
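Note that uerf lists the entries above newest first. Read in time order, they are: xcrintr "No interrupt bit set" (14:25:26), xcr_cmd_timeout "Controller has stopped responding" (14:25:47), xcr_e_restart "Can't restart Controller" (14:25:53), and then the system startup (14:37:07). A minimal Python sketch of that reordering follows, assuming the report text is held in a string; the function name timeline and the regular expressions are made up for illustration and rely only on the field labels visible in the output above.

```python
import re
from datetime import datetime

# Field labels as they appear in the uerf output above.
ENTRY_SPLIT = re.compile(r"\*+\s*ENTRY\s+(\d+)\.")
WHEN = re.compile(
    r"OCCURRED/LOGGED ON\s+(\w{3} \w{3}\s+\d+ \d\d:\d\d:\d\d \d{4})"
)
ROUTINE = re.compile(r"ROUTINE NAME\s+(\S+)")
EVENT = re.compile(r"OS EVENT TYPE\s+\d+\.\s+(.+)")


def timeline(report):
    """Return (timestamp, entry number, label) tuples, oldest first."""
    # split() with one capture group yields
    # [preamble, number, body, number, body, ...]
    parts = ENTRY_SPLIT.split(report)
    events = []
    for num, body in zip(parts[1::2], parts[2::2]):
        when = WHEN.search(body)
        if not when:
            continue  # skip entries that carry no timestamp
        stamp = datetime.strptime(
            " ".join(when.group(1).split()), "%a %b %d %H:%M:%S %Y"
        )
        routine = ROUTINE.search(body)
        event = EVENT.search(body)
        # Prefer the driver routine name; fall back to the event type.
        if routine:
            label = routine.group(1)
        elif event:
            label = event.group(1).strip()
        else:
            label = "?"
        events.append((stamp, int(num), label))
    return sorted(events)
```

Printing each tuple from timeline(report) gives the sequence described above, which makes it easier to see that the controller stopped answering and could not be restarted before the reboot.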
I know it looks like an obvious hardware problem, but everything has
been replaced: controller, cables, connectors, etc.
If anyone knows of a similar problem and how to solve it, please
let me know. Of course, I'll summarize.
Regards
JAN
Received on Wed Oct 23 1996 - 21:43:13 NZDT