"bs_osf_complete: metadata write_failed"

From: Javier Aida <jaida_at_GMD.COM.PE>
Date: Wed, 23 Oct 1996 12:26:40 -0500

Hi,

We have an AlphaServer 2100 4/275 with a KZPSC RAID controller, running
DU 3.2F (Rev. 69.73).
A RAID 5 array of five RZ28 disks is attached externally through the
KZPSC controller. This array holds two AdvFS filesets and the Oracle
database, which is accessed through a raw device.

This installation ran properly under OSF/1 V3.0B for about two years.
Four or five months ago it began crashing randomly, with the crashes
related to AdvFS. Our first decision was to migrate to DU 3.2F.
Two weeks ago, 3.2F also crashed with AdvFS-related problems. An
excerpt from "messages" is below:


Oct 12 00:10:26 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3104
Oct 12 00:10:27 alpha21 vmunix: vd 1 blk 745840 blkCnt 128
Oct 12 00:10:27 alpha21 vmunix: write error = 5
Oct 12 00:10:28 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3552
Oct 12 00:10:28 alpha21 vmunix: vd 1 blk 1131616 blkCnt 128
Oct 12 00:10:28 alpha21 vmunix: read error = 5
Oct 12 00:10:28 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001 tag 0x00003dd5.8001u page 2967
Oct 12 00:10:29 alpha21 vmunix: vd 1 blk 2536128 blkCnt 128
Oct 12 00:10:29 alpha21 vmunix: write error = 5
Oct 12 00:10:29 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.fffffffe.0000 tag 0xfffffff7.0000u page 337
Oct 12 00:10:29 alpha21 vmunix: vd 1 blk 6496 blkCnt 80
Oct 12 00:10:29 alpha21 vmunix: write error = 5
Oct 12 00:10:29 alpha21 vmunix:
Oct 12 00:10:29 alpha21 vmunix: bs_osf_complete: metadata write failed
Oct 12 00:10:30 alpha21 vmunix: AdvFS Domain Panic; Domain local_domain Id 0x2df0fa52.000d0bc0
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001 tag 0x00003dd5.8001u page 2983
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 2536384 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001 tag 0x00003dd5.8001u page 2975
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 2536256 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001 tag 0x00003dd5.8001u page 2959
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 2536000 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3112
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 745968 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3544
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 1131488 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: read error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001 tag 0x00003dd5.8001u page 2992
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 681424 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: read error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3120
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 746096 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3128
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 746224 blkCnt 128
Oct 12 00:10:30 alpha21 vmunix: write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3136
Oct 12 00:10:30 alpha21 vmunix: vd 1 blk 746352 blkCnt 128
Oct 12 00:10:31 alpha21 vmunix: write error = 5
Oct 12 00:10:31 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.fffffffe.0000 tag 0xfffffff7.0000u page 129
Oct 12 00:10:31 alpha21 vmunix: vd 1 blk 2432 blkCnt 16
Oct 12 00:10:31 alpha21 vmunix: write error = 5
Oct 12 00:10:31 alpha21 vmunix:
Oct 12 00:10:31 alpha21 vmunix: bs_osf_complete: metadata write failed
Oct 12 00:10:31 alpha21 vmunix: AdvFS Domain Panic; Domain home_domain Id 0x2df0fa3f.0003d000
Oct 12 00:13:07 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000001.8001u page 426
Oct 12 00:13:07 alpha21 vmunix: vd 1 blk 1595936 blkCnt 96
Oct 12 00:13:07 alpha21 vmunix: read error = 5
Oct 12 00:13:07 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001 tag 0x00000006.8001u page 0
Oct 12 00:13:08 alpha21 vmunix: vd 1 blk 8720 blkCnt 16
Oct 12 00:13:08 alpha21 vmunix: read error = 5
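
For anyone triaging an excerpt like the one above, here is a minimal
sketch of pulling each three-line AdvFS error record into a structure so
the failures can be grouped by domain, operation, and block. The field
names and the function name parse_advfs_errors are my own choices, not
anything defined by AdvFS, and treating the first two components of
setId as the domain ID is an assumption based on the "Domain Panic" lines
printing the same value.

```python
import errno
import re

# First two records copied from the excerpt above. They are shown wrapped
# across lines here; the pattern below tolerates that.
SAMPLE = """\
Oct 12 00:10:26 alpha21 vmunix: advfs I/O error: setId
0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3104
Oct 12 00:10:27 alpha21 vmunix: vd 1 blk 745840 blkCnt 128
Oct 12 00:10:27 alpha21 vmunix: write error = 5
Oct 12 00:10:28 alpha21 vmunix: advfs I/O error: setId
0x2df0fa3f.0003d000.1.8001 tag 0x00000824.800eu page 3552
Oct 12 00:10:28 alpha21 vmunix: vd 1 blk 1131616 blkCnt 128
Oct 12 00:10:28 alpha21 vmunix: read error = 5
"""

# One record = setId/tag/page line, vd/blk/blkCnt line, read|write error line.
RECORD = re.compile(
    r"setId\s+(?P<domain>0x[0-9a-f]+\.[0-9a-f]+)\.\S+\s+"
    r"tag\s+(?P<tag>\S+)\s+page\s+(?P<page>\d+)"
    r".*?vd\s+(?P<vd>\d+)\s+blk\s+(?P<blk>\d+)\s+blkCnt\s+(?P<blkcnt>\d+)"
    r".*?(?P<op>read|write) error = (?P<err>\d+)",
    re.DOTALL,
)


def parse_advfs_errors(text):
    """Return one dict per AdvFS I/O error record found in text."""
    records = []
    for match in RECORD.finditer(text):
        rec = match.groupdict()
        for key in ("page", "vd", "blk", "blkcnt", "err"):
            rec[key] = int(rec[key])
        # Symbolic name of the error number on the machine running this
        # script (assumed to match the numbering on the logging host).
        rec["errno_name"] = errno.errorcode.get(rec["err"], "unknown")
        records.append(rec)
    return records


for rec in parse_advfs_errors(SAMPLE):
    print(rec["domain"], rec["op"], rec["blk"], rec["blkcnt"], rec["errno_name"])
```

In the full excerpt, consecutive records for the same tag step by 8 pages
and 128 blocks (for example page 3104 at blk 745840, then page 3112 at
blk 745968), which works out to 16 blocks per page.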


Additionally, "uerf" reports the following problems at the time of the
crash:

                                                  uerf version 4.2-011 (122)


********************************* ENTRY 1. *********************************

----- EVENT INFORMATION -----

EVENT CLASS OPERATIONAL EVENT
OS EVENT TYPE 300. SYSTEM STARTUP
SEQUENCE NUMBER 0.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Wed Oct 23 14:37:07 1996
OCCURRED ON SYSTEM alpha21
SYSTEM ID x00060009 CPU TYPE: DEC 2100
SYSTYPE x00000000
MESSAGE PCXAL keyboard, language English
                                         _(American)
                                         
                                        Alpha boot: available memory from
                                         _0x11dc000 to 0x1fffe000
                                        Digital UNIX V3.2F (Rev. 69.73); Thu
                                         _Oct 10 19:18:19 GMT-0500 1996
                                        physical memory = 512.00 megabytes.
                                        available memory = 494.23 megabytes.
                                        using 1958 buffers containing 15.29
                                         _megabytes of memory
                                        Master cpu at slot 0.
                                        Firmware revision: 4.5
                                        PALcode: OSF version 1.45
                                        ibus0 at nexus
                                        AlphaServer 2100 4/275
                                        cpu 0 EV-45 4mb b-cache
                                        cpu 1 EV-45 4mb b-cache
                                        gpc0 at ibus0
                                        pci0 at ibus0 slot 0
                                        tu0: DECchip 21040-AA: Revision: 2.3
                                        tu0 at pci0 slot 0
                                        tu0: DEC TULIP Ethernet Interface,
                                         _hardware address: 08-00-2B-E2-6A-42
                                        tu0: console mode: selecting UTP
                                         _(10BaseT) port: no link
                                        psiop0 at pci0 slot 1
                                        Loading SIOP: script 1001f00, reg
                                         _81222000, data 100de20
                                        scsi0 at psiop0 slot 0
                                        rz0 at scsi0 bus 0 target 0 lun 0 (DEC
                                         _ RZ28 (C) DEC 442D)
                                        rz3 at scsi0 bus 0 target 3 lun 0 (DEC
                                         _ RZ28 (C) DEC D41C)
                                        rz6 at scsi0 bus 0 target 6 lun 0 (DEC
                                         _ RRD43 (C) DEC 1084)
                                        tz5 at scsi0 bus 0 target 5 lun 0 (DEC
                                         _ TLZ6 (C)DEC 0491)
                                        eisa0 at pci0
                                        ace0 at eisa0
                                        ace1 at eisa0
                                        lp0 at eisa0
                                        fdi0 at eisa0
                                        fd0 at fdi0 unit 0
                                        dns0 at eisa0
                                        dns0: Digital WAN Device Driver
                                         _Interface
                                        dns1: Digital WAN Device Driver
                                         _Interface
                                        dns1 at eisa0
                                        dns2: Digital WAN Device Driver
                                         _Interface
                                        dns3: Digital WAN Device Driver
                                         _Interface
                                        vga0 at eisa0
                                         1024x768 (QVision )
                                        fta0 DEC CRE DEFEA FDDI Module,
                                         _Hardware Revision 2
                                        fta0 at eisa0
                                        fta0: DMA Available.
                                        fta0: DEC CRE DEFEA (PDQ) FDDI
                                         _Interface, Hardware address:
                                         _08-00-2B-B7-27-FE
                                        fta0: Firmware rev: 2.46
                                        Initializing xcr0. Please wait.
                                        Initializing xcr0. Please wait.
                                        Initializing xcr0. Please wait.
                                        Initializing xcr0. Please wait.
                                        Initializing xcr0. Please wait.
                                        xcr0 at eisa0
                                        re0 at xcr0 unit 0 (unit status =
                                         _ONLINE, raid level = 5)
                                        pza0 at pci0 slot 7
                                        pza0 firmware version: DEC P01 A10
                                         _
                                        scsi1 at pza0 slot 0
                                        pza1 at pci0 slot 8
                                        pza1 firmware version: DEC P01 A10
                                         _
                                        scsi2 at pza1 slot 0
                                        lvm0: configured.
                                        lvm1: configured.
                                        dli: configured
                                        SuperLAT. Copyright 1993 Meridian
                                         _Technology Corp. All rights
                                         _reserved.
                                        x25_access: configured
                                        wandd_base: configured
                                        wandd_lapb: configured
                                        wan_utilities: configured
                                        ctf_base: configured
                                        Node ID is 08-00-2b-b7-27-fe (from
                                         _device fta0)
                                        dna_netman: configured
                                        dna_dli: configured

********************************* ENTRY 2. *********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 198. ASTRO CONTROLLER
SEQUENCE NUMBER 3.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Wed Oct 23 14:25:53 1996
OCCURRED ON SYSTEM alpha21
SYSTEM ID x00060009 CPU TYPE: DEC 2100
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000

----- UNIT INFORMATION -----

CLASS x0000 DISK
SUBSYSTEM x0000 DISK
BUS # x0000

----- CAM STRING -----

ROUTINE NAME xcr_e_restart

----- CAM STRING -----

                                        Can't restart Controller

----- CAM STRING -----

ERROR TYPE Hard Error Detected

********************************* ENTRY 3. *********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 198. ASTRO CONTROLLER
SEQUENCE NUMBER 2.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Wed Oct 23 14:25:47 1996
OCCURRED ON SYSTEM alpha21
SYSTEM ID x00060009 CPU TYPE: DEC 2100
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000

----- UNIT INFORMATION -----

CLASS x0000 DISK
SUBSYSTEM x0000 DISK
BUS # x0000

----- CAM STRING -----

ROUTINE NAME xcr_cmd_timeout

----- CAM STRING -----

                                        Controller has stopped responding

----- CAM STRING -----

ERROR TYPE Hard Error Detected

----- CAM STRING -----

                                        Controller Softc at time of error

----- ENT_XCR_SOFTC -----

*SC_BUS_NAME xFFFFFC00006A20E0
SC_CNTRL_NUM x0000000000000000
SC_CNTRL_TYPE x006A2AC000000000
*SC_CTRL xFFFFFC00006A2AC0
SC_IOHANDLE x000003A000008000
SC_FLAGS x00000002
SC_REG_OFF x00000C90
SC_MAX_ACT x0000003C
SC_SPEC_ACT x00000004
SC_CMDS_ACT x00000003
*SC_ACT_FLINK xFFFFFC001FE556B8
*SC_ACT_BLINK xFFFFFC001FE55A50
SC_CMDS_PENDING x00000000
*SC_PEND_FLINK xFFFFFC001FE55050
*SC_PEND_BLINK xFFFFFC001FE55050
*SC_FREE_FLINK xFFFFFC001FE559B0
*SC_FREE_BLINK xFFFFFC001FE55848
SC_FREE_CMD_SLOTS x0000003D

********************************* ENTRY 4. *********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 198. ASTRO CONTROLLER
SEQUENCE NUMBER 1.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Wed Oct 23 14:25:26 1996
OCCURRED ON SYSTEM alpha21
SYSTEM ID x00060009 CPU TYPE: DEC 2100
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000

----- UNIT INFORMATION -----

CLASS x0000 DISK
SUBSYSTEM x0000 DISK
BUS # x0000

----- CAM STRING -----

ROUTINE NAME xcrintr

----- CAM STRING -----

                                        No interrupt bit set

----- CAM STRING -----

ERROR TYPE Hard Error Detected

----- CAM STRING -----

                                        Controller Softc at time of error

----- ENT_XCR_SOFTC -----

*SC_BUS_NAME xFFFFFC00006A20E0
SC_CNTRL_NUM x0000000000000000
SC_CNTRL_TYPE x006A2AC000000000
*SC_CTRL xFFFFFC00006A2AC0
SC_IOHANDLE x000003A000008000
SC_FLAGS x00000000
SC_REG_OFF x00000C90
SC_MAX_ACT x0000003C
SC_SPEC_ACT x00000004
SC_CMDS_ACT x00000001
*SC_ACT_FLINK xFFFFFC001FE556B8
*SC_ACT_BLINK xFFFFFC001FE556B8
SC_CMDS_PENDING x00000000
*SC_PEND_FLINK xFFFFFC001FE55050
*SC_PEND_BLINK xFFFFFC001FE55050
*SC_FREE_FLINK xFFFFFC001FE55938
*SC_FREE_BLINK xFFFFFC001FE55848
SC_FREE_CMD_SLOTS x0000003F
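
To make the three ASTRO CONTROLLER entries above easier to compare, here
is a rough sketch that puts them in time order and converts a few of the
softc hex fields to decimal. The values are copied from the entries; the
helper names decode_softc and seconds_between are hypothetical, and
reading the fields as command counters is only a guess from their names,
not something the uerf output states.

```python
from datetime import datetime

# (routine, message, OCCURRED/LOGGED ON), oldest first. uerf listed them
# newest first, as entries 4, 3 and 2.
EVENTS = [
    ("xcrintr", "No interrupt bit set", "Wed Oct 23 14:25:26 1996"),
    ("xcr_cmd_timeout", "Controller has stopped responding",
     "Wed Oct 23 14:25:47 1996"),
    ("xcr_e_restart", "Can't restart Controller", "Wed Oct 23 14:25:53 1996"),
]

# Selected softc fields from the two entries that include a softc dump.
SOFTC = {
    "xcrintr": {
        "SC_MAX_ACT": "x0000003C",
        "SC_CMDS_ACT": "x00000001",
        "SC_CMDS_PENDING": "x00000000",
        "SC_FREE_CMD_SLOTS": "x0000003F",
    },
    "xcr_cmd_timeout": {
        "SC_MAX_ACT": "x0000003C",
        "SC_CMDS_ACT": "x00000003",
        "SC_CMDS_PENDING": "x00000000",
        "SC_FREE_CMD_SLOTS": "x0000003D",
    },
}


def decode_softc(fields):
    """Convert uerf-style 'xNNNN' hex strings to integers."""
    return {name: int(value.lstrip("x"), 16) for name, value in fields.items()}


def seconds_between(events):
    """Return the gaps, in seconds, between consecutive events."""
    times = [datetime.strptime(ts, "%a %b %d %H:%M:%S %Y") for _, _, ts in events]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


print(seconds_between(EVENTS))
for routine, fields in SOFTC.items():
    print(routine, decode_softc(fields))
```

In both dumps SC_CMDS_ACT plus SC_FREE_CMD_SLOTS adds up to 64 (1 + 63
and 3 + 61). What that means for the driver depends on internals not
shown here.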

I know it looks like an obvious hardware problem, but everything has
been replaced: controller, cables, connectors, etc.

If anyone knows of a similar problem and how to solve it, please
let me know. Of course, I'll summarize.

Regards
JAN
Received on Wed Oct 23 1996 - 21:43:13 NZDT
