Hardware error. I need advice.

From: Peter Chapin <pchapin_at_twilight.vtc.vsc.edu>
Date: Tue, 14 Oct 1997 12:14:03 -0400 (EDT)

Recently my system (DU4.0B unpatched) crashed with some sort of hardware
error. I am not sure exactly how to interpret the log files and error
messages and am seeking some advice. It looks to me as if there might be a
problem with either the SCSI adapter or the (single) hard disk. Here are
the particulars:

The system apparently crashed and tried to restart itself. However, it was
unable to boot and stopped part way through the boot-up sequence. On the
console was the message:

        CAM logger, CAM error packet, bus 0, target 0, lun 0

This message was repeated three times. Then there was

        Reached maximum abort count. Schedule bus reset.

However, the system did not appear to go any further in the boot process.

I'm not certain about the exact format and placement of the above
messages. They were read to me over the phone while I was at home.
However, that was the gist of them. I told my backup administrator to cycle
the power on the system. The system came up normally (apparently) and is
now running fine.

I then checked the syslog (Kernel facility). I found the following. Note
that 17:53 was when the successful boot occurred.

Oct 13 17:53:42 twilight vmunix: Alpha PC machine check type 0x660.
Oct 13 17:53:42 twilight vmunix: Machine check abort
Oct 13 17:53:42 twilight vmunix: retry = 0xffffffff
Oct 13 17:53:42 twilight vmunix: mchk_code = 0x207
Oct 13 17:53:42 twilight vmunix: paltemp[1] = 0x0
Oct 13 17:53:42 twilight vmunix: paltemp[2] = 0x4
Oct 13 17:53:42 twilight vmunix: paltemp[3] = 0x0
Oct 13 17:53:42 twilight vmunix: paltemp[4] = 0x3fca000
Oct 13 17:53:42 twilight vmunix: paltemp[5] = 0x0
Oct 13 17:53:42 twilight vmunix: paltemp[6] = 0x2a7740
Oct 13 17:53:42 twilight vmunix: paltemp[7] = 0x4200
Oct 13 17:53:43 twilight vmunix: paltemp[8] = 0x400
Oct 13 17:53:43 twilight vmunix: paltemp[9] = 0x0
Oct 13 17:53:43 twilight vmunix: paltemp[10] = 0x3c9ce0
Oct 13 17:53:43 twilight vmunix: paltemp[11] = 0x0
Oct 13 17:53:43 twilight vmunix: paltemp[12] = 0x3ca080
Oct 13 17:53:43 twilight vmunix: paltemp[13] = 0x3ca0b0
Oct 13 17:53:43 twilight vmunix: paltemp[14] = 0x3ca110
Oct 13 17:53:43 twilight vmunix: paltemp[15] = 0x3c9e80
Oct 13 17:53:43 twilight vmunix: paltemp[16] = 0x3c9b50
Oct 13 17:53:43 twilight vmunix: paltemp[17] = 0x84048000
Oct 13 17:53:43 twilight vmunix: paltemp[18] = 0x0
Oct 13 17:53:43 twilight vmunix: paltemp[19] = 0x8404b9d8
Oct 13 17:53:43 twilight vmunix: paltemp[20] = 0x4fa740
Oct 13 17:53:43 twilight vmunix: paltemp[21] = 0x4000a020
Oct 13 17:53:43 twilight vmunix: paltemp[22] = 0x727a7a7a
Oct 13 17:53:43 twilight vmunix: paltemp[23] = 0xc0183508
Oct 13 17:53:44 twilight vmunix: paltemp[24] = 0x0
Oct 13 17:53:44 twilight vmunix: paltemp[25] = 0x10000
Oct 13 17:53:44 twilight vmunix: paltemp[26] = 0x0
Oct 13 17:53:44 twilight vmunix: paltemp[27] = 0x0
Oct 13 17:53:44 twilight vmunix: paltemp[28] = 0x56c000
Oct 13 17:53:44 twilight vmunix: paltemp[29] = 0x0
Oct 13 17:53:44 twilight vmunix: paltemp[30] = 0x1
Oct 13 17:53:44 twilight vmunix: paltemp[31] = 0x3f21a38
Oct 13 17:53:44 twilight vmunix: exc_addr = 0x285a70
Oct 13 17:53:44 twilight vmunix: exc_sum = 0x0
Oct 13 17:53:44 twilight vmunix: msk = 0x0
Oct 13 17:53:44 twilight vmunix: pal_base = 0x14000
Oct 13 17:53:44 twilight vmunix: hirr = 0x1402
Oct 13 17:53:44 twilight vmunix: hier = 0x14f0
Oct 13 17:53:44 twilight vmunix: mm_csr = 0x3640
Oct 13 17:53:44 twilight vmunix: va = 0x6170
Oct 13 17:53:45 twilight vmunix: biu_addr = 0x60e0
Oct 13 17:53:45 twilight vmunix: biu_stat = 0x50
Oct 13 17:53:45 twilight vmunix: dc_addr = 0xffffffff
Oct 13 17:53:45 twilight vmunix: fill_adr = 0x6100
Oct 13 17:53:45 twilight vmunix: dc_stat = 0x3
Oct 13 17:53:45 twilight vmunix: fill_syndrome = 0x0
Oct 13 17:53:45 twilight vmunix: bc_tag = 0x16502e50
Oct 13 17:53:45 twilight vmunix: coma_gcr = 0x7fb200a4
Oct 13 17:53:45 twilight vmunix: coma_edsr = 0x7fb2a140
Oct 13 17:53:45 twilight vmunix: coma_ter = 0x6fb1fff8
Oct 13 17:53:45 twilight vmunix: coma_elar = 0x6fb10000
Oct 13 17:53:45 twilight vmunix: coma_ehar = 0x6fb10800
Oct 13 17:53:45 twilight vmunix: coma_ldlr = 0x6fb18bf3
Oct 13 17:53:45 twilight vmunix: coma_ldhr = 0x6fb10003
Oct 13 17:53:45 twilight vmunix: coma_base0 = 0x6fb10000
Oct 13 17:53:45 twilight vmunix: coma_base1 = 0x6fb10080
Oct 13 17:53:46 twilight vmunix: coma_base2 = 0x47ff0000
Oct 13 17:53:46 twilight vmunix: coma_cnfg0 = 0x47ff004b
Oct 13 17:53:46 twilight vmunix: coma_cnfg1 = 0x47ff004b
Oct 13 17:53:46 twilight vmunix: coma_cnfg2 = 0x47ff0000
Oct 13 17:53:46 twilight vmunix: epic_dcsr = 0x8008201d
Oct 13 17:53:46 twilight vmunix: epic_pear = 0x816000
Oct 13 17:53:46 twilight vmunix: epic_sear = 0xb1d7f0
Oct 13 17:53:46 twilight vmunix: epic_tbr1 = 0x338000
Oct 13 17:53:46 twilight vmunix: epic_tbr2 = 0x0
Oct 13 17:53:46 twilight vmunix: epic_pbr1 = 0x8c0000
Oct 13 17:53:46 twilight vmunix: epic_pbr2 = 0x40080000
Oct 13 17:53:46 twilight vmunix: epic_pmr1 = 0x700000
Oct 13 17:53:46 twilight vmunix: epic_pmr2 = 0x3ff00000
Oct 13 17:53:46 twilight vmunix: epic_harx1 = 0x80000000
Oct 13 17:53:46 twilight vmunix: epic_harx2 = 0x0
Oct 13 17:53:46 twilight vmunix: epic_pmlt = 0xff
Oct 13 17:53:46 twilight vmunix: epic_tag0 = 0x803000
Oct 13 17:53:46 twilight vmunix: epic_tag1 = 0x801000
Oct 13 17:53:47 twilight vmunix: epic_tag2 = 0x807000
Oct 13 17:53:47 twilight vmunix: epic_tag3 = 0x805000
Oct 13 17:53:47 twilight vmunix: epic_tag4 = 0x813000
Oct 13 17:53:47 twilight vmunix: epic_tag5 = 0x815000
Oct 13 17:53:47 twilight vmunix: epic_tag6 = 0x817000
Oct 13 17:53:47 twilight vmunix: epic_tag7 = 0x802000
Oct 13 17:53:47 twilight vmunix: epic_data0 = 0x51c
Oct 13 17:53:47 twilight vmunix: epic_data1 = 0x51a
Oct 13 17:53:47 twilight vmunix: epic_data2 = 0x520
Oct 13 17:53:47 twilight vmunix: epic_data3 = 0x51e
Oct 13 17:53:47 twilight vmunix: epic_data4 = 0x19d2
Oct 13 17:53:47 twilight vmunix: epic_data5 = 0x31ba
Oct 13 17:53:47 twilight vmunix: epic_data6 = 0x2c74
Oct 13 17:53:47 twilight vmunix: epic_data7 = 0x51c
Oct 13 17:53:47 twilight vmunix: panic (cpu 0): Machine check - Hardware error
Oct 13 17:53:47 twilight vmunix: SIOP 0:device string for dump = SCSI 0 6 0 0 0 0 0.
Oct 13 17:53:47 twilight vmunix: DUMP.prom: dev SCSI 0 6 0 0 0 0 0, block 262144
Oct 13 17:53:47 twilight vmunix: device string for dump = SCSI 0 6 0 0 0 0 0.
Oct 13 17:53:47 twilight vmunix: DUMP.prom: dev SCSI 0 6 0 0 0 0 0, block 262144
Oct 13 17:53:48 twilight vmunix: Alpha boot: available memory from 0x66c000 to 0x3ffe000
Oct 13 17:53:48 twilight vmunix: Digital UNIX V4.0B (Rev. 564); Thu Sep 18 23:10:09 EDT 1997
Oct 13 17:53:48 twilight vmunix: physical memory = 64.00 megabytes.
Oct 13 17:53:48 twilight vmunix: available memory = 57.59 megabytes.
Oct 13 17:53:48 twilight vmunix: using 238 buffers containing 1.85 megabytes of memory
Oct 13 17:53:48 twilight vmunix: AlphaStation 400 4/233 system
Oct 13 17:53:48 twilight vmunix: DECchip 21071
Oct 13 17:53:48 twilight vmunix: 82378IB (SIO) PCI/ISA Bridge
Oct 13 17:53:48 twilight vmunix: Firmware revision: 6.4
Oct 13 17:53:48 twilight vmunix: PALcode: OSF version 1.46
Oct 13 17:53:48 twilight vmunix: pci0 at nexus
Oct 13 17:53:48 twilight vmunix: psiop0 at pci0 slot 6
Oct 13 17:53:49 twilight vmunix: Loading SIOP: script 800a00, reg 82810000, data 405229f0
Oct 13 17:53:49 twilight vmunix: scsi0 at psiop0 slot 0
Oct 13 17:53:49 twilight vmunix: rz0 at scsi0 target 0 lun 0 (LID=0) (DEC RZ28M (C) DEC 0568)
Oct 13 17:53:49 twilight vmunix: rz4 at scsi0 target 4 lun 0 (LID=1) (DEC RRD45 (C) DEC 0436)
Oct 13 17:53:49 twilight vmunix: tz5 at scsi0 target 5 lun 0 (LID=2) (DEC TZK11 (C) DEC 00A2)
Oct 13 17:53:49 twilight vmunix: isa0 at pci0
Oct 13 17:53:49 twilight vmunix: gpc0 at isa0
Oct 13 17:53:49 twilight vmunix: ace0 at isa0
Oct 13 17:53:50 twilight vmunix: ace1 at isa0
Oct 13 17:53:50 twilight vmunix: lp0 at isa0
Oct 13 17:53:50 twilight vmunix: fdi0 at isa0
Oct 13 17:53:50 twilight vmunix: fd0 at fdi0 unit 0
Oct 13 17:53:50 twilight vmunix: trio0 at pci0 slot 11
Oct 13 17:53:50 twilight vmunix: trio0: S3 Trio64 (SVGA) - Plug N' Play - 1.0 Mb
Oct 13 17:53:50 twilight vmunix: tu0: DECchip 21040-AA: Revision: 2.3
Oct 13 17:53:50 twilight vmunix: tu0 at pci0 slot 13
Oct 13 17:53:50 twilight vmunix: tu0: DEC TULIP Ethernet Interface, hardware address: 08-00-2B-E6-83-64
Oct 13 17:53:50 twilight vmunix: tu0: console mode: selecting 10Base2 (BNC) port
Oct 13 17:53:50 twilight vmunix: lvm0: configured.
Oct 13 17:53:50 twilight vmunix: lvm1: configured.
Oct 13 17:53:50 twilight vmunix: kernel console: trio0
Oct 13 17:53:50 twilight vmunix: dli: configured
Oct 13 17:53:50 twilight vmunix: vm_swap_init: warning /sbin/swapdefault swap device not found
Oct 13 17:53:50 twilight vmunix: vm_swap_init: swap is set to lazy (over commitment) mode
Oct 13 17:54:07 twilight vmunix: SuperLAT. Copyright 1994 Meridian Technology Corp. All rights reserved.
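
For anyone who wants to compare the register dump above against another
machine check, here is a minimal sketch in Python that pulls the
"name = 0xVALUE" pairs out of the syslog lines. The line layout is assumed
from the excerpt above rather than from any documented format, and the
names REGISTER_LINE and parse_machine_check are hypothetical.

```python
import re

# Matches the "name = 0xVALUE" pairs that follow the "vmunix:" tag in syslog.
# The layout is assumed from the excerpt above, not from a formal spec.
REGISTER_LINE = re.compile(
    r"vmunix:\s+(\w+(?:\[\d+\])?)\s*=\s*(0x[0-9a-fA-F]+)"
)


def parse_machine_check(lines):
    """Return a dict mapping register names to integer values."""
    registers = {}
    for line in lines:
        match = REGISTER_LINE.search(line)
        if match:
            name, value = match.groups()
            registers[name] = int(value, 16)
    return registers


if __name__ == "__main__":
    sample = [
        "Oct 13 17:53:42 twilight vmunix: Machine check abort",
        "Oct 13 17:53:42 twilight vmunix: mchk_code = 0x207",
        "Oct 13 17:53:42 twilight vmunix: paltemp[4] = 0x3fca000",
        "Oct 13 17:53:44 twilight vmunix: exc_addr = 0x285a70",
    ]
    for name, value in parse_machine_check(sample).items():
        print(f"{name:12s} {value:#x}")
```

Lines without a hexadecimal assignment, such as the "Machine check abort"
banner or the boot messages, are skipped, so the whole kernel log can be
fed in unfiltered.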

I then used uerf to check binary.errlog. I found the following. I assume
the system crashed and restarted at 15:27 as indicated below.

********************************* ENTRY 9. *********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 1.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Oct 13 15:27:11 1997
OCCURRED ON SYSTEM twilight
SYSTEM ID x0006000D CPU TYPE: DEC 7000
SYSTYPE x00000000

----- UNIT INFORMATION -----

UNIT CLASS CPU

********************************* ENTRY 10. *********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 302. PANIC
SEQUENCE NUMBER 2.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Oct 13 15:27:13 1997
OCCURRED ON SYSTEM twilight
SYSTEM ID x0006000D CPU TYPE: DEC 7000
SYSTYPE x00000000
MESSAGE panic (cpu 0): Machine check -
                                         _Hardware error

********************************* ENTRY 11. *********************************

----- EVENT INFORMATION -----

EVENT CLASS OPERATIONAL EVENT
OS EVENT TYPE 300. SYSTEM STARTUP
SEQUENCE NUMBER 0.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Oct 13 17:53:41 1997
OCCURRED ON SYSTEM twilight
SYSTEM ID x0006000D CPU TYPE: DEC 7000
SYSTYPE x00000000
MESSAGE Alpha boot: available memory from
                                         _0x66c000 to 0x3ffe000
                                        Digital UNIX V4.0B (Rev. 564); Thu
                                         _Sep 18 23:10:09 EDT 1997
                                        physical memory = 64.00 megabytes.
                                        available memory = 57.59 megabytes.
                                        using 238 buffers containing 1.85
                                         _megabytes of memory
                                        AlphaStation 400 4/233 system
                                        DECchip 21071
                                        82378IB (SIO) PCI/ISA Bridge
                                        Firmware revision: 6.4
                                        PALcode: OSF version 1.46
                                        pci0 at nexus
                                        psiop0 at pci0 slot 6
                                        Loading SIOP: script 800a00, reg
                                         _82810000, data 405229f0
                                        scsi0 at psiop0 slot 0
                                        rz0 at scsi0 target 0 lun 0 (LID=0)
                                         _(DEC RZ28M (C) DEC 0568)
                                        rz4 at scsi0 target 4 lun 0 (LID=1)
                                         _(DEC RRD45 (C) DEC 0436)
                                        tz5 at scsi0 target 5 lun 0 (LID=2)
                                         _(DEC TZK11 (C) DEC 00A2)
                                        isa0 at pci0
                                        gpc0 at isa0
                                        ace0 at isa0
                                        ace1 at isa0
                                        lp0 at isa0
                                        fdi0 at isa0
                                        fd0 at fdi0 unit 0
                                        trio0 at pci0 slot 11
                                        trio0: S3 Trio64 (SVGA) - Plug N' Play
                                         _- 1.0 Mb
                                        tu0: DECchip 21040-AA: Revision: 2.3
                                        tu0 at pci0 slot 13
                                        tu0: DEC TULIP Ethernet Interface,
                                         _hardware address: 08-00-2B-E6-83-64
                                        tu0: console mode: selecting 10Base2
                                         _(BNC) port
                                        lvm0: configured.
                                        lvm1: configured.
                                        kernel console: trio0
                                        dli: configured
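
The timestamps in the uerf entries above give the length of the outage. A
minimal sketch in Python, assuming the timestamp layout shown on the
OCCURRED/LOGGED ON lines and an English (C) locale for the day and month
names; the helper names parse_uerf_time and gap_between are hypothetical.

```python
from datetime import datetime

# Timestamp layout as printed on the "OCCURRED/LOGGED ON" lines above.
UERF_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_uerf_time(text):
    """Convert a uerf timestamp string into a datetime."""
    return datetime.strptime(text.strip(), UERF_TIME_FORMAT)


def gap_between(first, second):
    """Return the timedelta from the first timestamp to the second."""
    return parse_uerf_time(second) - parse_uerf_time(first)


if __name__ == "__main__":
    panic = "Mon Oct 13 15:27:13 1997"    # ENTRY 10, PANIC
    startup = "Mon Oct 13 17:53:41 1997"  # ENTRY 11, SYSTEM STARTUP
    print(gap_between(panic, startup))
```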


I have a vmcore.0 file in /var/adm/crash. Can anyone give me some advice
about what all this means exactly? Is it my disk or my SCSI adapter that
is on the blink? How serious is this? The system seems to be running fine
right now. Should I worry?

*****************************************************************************
Peter http://twilight.vtc.vsc.edu/~pchapin
pchapin_at_twilight.vtc.vsc.edu Paganism: Ancient beliefs in a modern world
Received on Wed Oct 15 1997 - 10:24:56 NZDT
