Alphastation 500 5/266 CPU exceptions.

From: Tony McElhill <tony.mcelhill_at_uk.airsysatm.thomson-csf.com>
Date: Mon, 13 Nov 2000 14:47:20 +0000

Hi,

I have a test "rig" with various Alphastations, and have recently
upgraded some of them from 4.0D to 4.0F. Whilst doing this, one
particular box gave me problems and would not boot off the (firmware)
CDROM. I tried creating a boot floppy, but that didn't work either: the
message was "Can not read from device dva0" (/dka? for CD) or similar.
I went ahead and did an installupdate to 4.0F and pk3, but I'm not too
optimistic about the future of this box!

At the console level on power-up, you see:

Processor Detected - BCache single bit ECC error.
*** Unexpected interrupt through vector 0000067
IPRs:
EXC_ADDR: 000....12BA3C EXC_SUM: .........
ALCOR Error CSRs (CPU 0)
CIA_ERR:0000....0 ERR_STAT:....
warning - HWRPB is invalid.
I/O CSRs:
MEMORY BASE ADDRESS CSRs
MBA: 0008011
.....................
Processor Detected - Memory single bit ECC error.
IPRs:
EXC_ADDR: 000....12BA3C EXC_SUM: .........
etc. etc.

Processor correctable error through vector 00000063
EI_STAT: FFFFFFF484FFFFFF EI_ADDR: FFFFFF00000C5EAF
FILL_SYN: 00..........3100 ISR: 0000000100..0MCES4
Error on fill data from Main mem
data bit 14 530 bank 0
bad page in console mem cluster [0]

This latter error continues with some variation during the boot, and
surprisingly the system does go to multi-user and has been usable,
though today one of the developers told me they weren't able to use this
box at all.
I assume that this is what it says it is, and that it's a CPU
and/or motherboard problem, but as it's probably going to be a fairly
expensive fix, I thought I'd run it past you lot anyway. I've tried
removing all the memory modules and trying them in different slots, but
get the same error.
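
For anyone wanting to read the FILL_SYN value by hand, here is a minimal Python sketch. It assumes the 16-bit value carries one 8-bit syndrome per quadword, with the low byte for quadword 0 and the high byte for quadword 1. That layout is an assumption and is not stated anywhere in this post, although it agrees with the "Quadword 1" wording in the uerf messages further down. The function name is made up for illustration.

```python
def split_fill_syndrome(fill_syn: int) -> dict:
    """Split a 16-bit fill syndrome into two 8-bit halves.

    Assumption (not from the post): bits 7:0 are the syndrome for
    quadword 0 and bits 15:8 are the syndrome for quadword 1.
    """
    low = fill_syn & 0xFF
    high = (fill_syn >> 8) & 0xFF
    return {
        "quadword0_syndrome": low,
        "quadword1_syndrome": high,
        # A non-zero syndrome marks the quadword that saw the error.
        "quadwords_in_error": [i for i, s in enumerate((low, high)) if s != 0],
    }


# FILL_SYN as shown on the console, with the elided digits taken as zero
# (the uerf log prints it in full as 0000000000003100).
result = split_fill_syndrome(0x3100)
print(result)
```

Under that assumption only the upper half is non-zero, so every reported error sits in the same quadword regardless of the address.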

Any suggestions?

BTW - it's a PB540-A9 system at firmware 6.8-2.


TIF,

Tony.

uerf messages follow -

********************************* ENTRY 103. *********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 7.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Nov 13 11:52:30 2000
OCCURRED ON SYSTEM cna1310
SYSTEM ID x0005000F
SYSTYPE x00000000

----- UNIT INFORMATION -----

UNIT CLASS CPU

********************************* ENTRY 104. *********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 6.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Nov 13 11:52:30 2000
OCCURRED ON SYSTEM cna1310
SYSTEM ID x0005000F
SYSTYPE x00000000

----- UNIT INFORMATION -----

UNIT CLASS CPU

********************************* ENTRY 105. *********************************

----- EVENT INFORMATION -----

EVENT CLASS OPERATIONAL EVENT
OS EVENT TYPE 300. SYSTEM STARTUP
SEQUENCE NUMBER 5.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Nov 13 11:48:02 2000
OCCURRED ON SYSTEM cna1310
SYSTEM ID x0005000F
SYSTYPE x00000000
MESSAGE Alpha boot: available memory from
                                         _0x9fc000 to 0x7f16000
                                        Digital UNIX V4.0F (Rev. 1229); Wed
                                         _Oct 25 10:22:18 BST 2000
                                        physical memory = 128.00 megabytes.
                                        available memory = 117.57 megabytes.
                                        using 483 buffers containing 3.77
                                         _megabytes of memory
                                        Firmware revision: 6.8
                                        PALcode: Digital UNIX version 1.22
                                        Digital AlphaStation 500/266
                                        pci0 at nexus
                                        tu0: DECchip 21040: Revision: 2.4
                                        tu0 at pci0 slot 6
                                        tu0: DEC TULIP (10Mbps) Ethernet
                                         _Interface, hardware address:
                                         _08-00-2B-E7-D2-50
                                        tu0: console mode: selecting 10BaseT
                                         _(UTP) port: half duplex
                                        Machine Check error corrected by
                                         _processor
                                        Physical address of error
                                         _ffffff0000549aaf Corrected ECC Error
                                         _in Memory during D-Cache fill
                                         Fill Syndrome = 0000000000003100
                                        Single Bit error in Quadword 1 at
                                         _bit<14> in a Data bit
                                         EI Address = ffffff0000549aaf
                                         EI Status = fffffff0c4ffffff
                                         Interrupt Status Reg =
                                         _0000000100000000
                                         ECC Syndrome = 0000000000000000
                                         Memory Port 0 Status Reg =
                                         _0000000000000000
                                         Memory Port 1 Status Reg =
                                         _0000000000000000
                                         CIA Error Status = 0000000000000000
                                         CIA Error Reg = 0000000000000000
                                        WARNING: too many Processor corrected
                                         _errors detected on cpu 0. Reporting
                                         _suspended.
                                        tga0 at pci0 slot 7
                                        tga0: depth 8, map size 2MB, 1280x1024
                                        tga0: ZLXp2-E, Revision: 34
                                        isp0 at pci0 slot 9
                                        isp0: QLOGIC ISP1020A
                                        isp0: Firmware revision 5.54 (loaded
                                         _by console)
                                        scsi0 at isp0 slot 0
                                        rz0 at scsi0 target 0 lun 0 (LID=0)
                                         _(DEC RZ28D (C) DEC 0008)
                                         _(Wide16)
                                        rz1 at scsi0 target 1 lun 0 (LID=1)
                                         _(DEC RZ28D (C) DEC 0008)
                                         _(Wide16)
                                        rz4 at scsi0 target 4 lun 0 (LID=2)
                                         _(DEC RRD45 (C) DEC 1645)
                                        eisa0 at pci0
                                        ace0 at eisa0
                                        ace1 at eisa0
                                        lp0 at eisa0
                                        fdi0 at eisa0
                                        gpc0 at eisa0
                                        lvm0: configured.
                                        lvm1: configured.
                                        kernel console: ace0
                                        Machine Check error corrected by
                                         _processor
                                        Physical address of error
                                         _ffffff00019d9caf Corrected ECC Error
                                         _in B-Cache during D-Cache fill
                                         Fill Syndrome = 0000000000003100
                                        Single Bit error in Quadword 1 at
                                         _bit<14> in a Data bit
                                         EI Address = ffffff00019d9caf
                                         EI Status = fffffff084ffffff
                                         Interrupt Status Reg =
                                         _0000000100000000
                                         ECC Syndrome = 0000000000000000
                                         Memory Port 0 Status Reg =
                                         _0000000000000000
                                         Memory Port 1 Status Reg =
                                         _0000000000000000
                                         CIA Error Status = 0000000000000000
                                         CIA Error Reg = 0000000000000000
                                        Machine Check error corrected by
                                         _processor
                                        Physical address of error
                                         _ffffff0000439aaf Corrected ECC Error
                                         _in B-Cache during D-Cache fill
                                         Fill Syndrome = 0000000000003100
                                        Single Bit error in Quadword 1 at
                                         _bit<14> in a Data bit
                                         EI Address = ffffff0000439aaf
                                         EI Status = fffffff484ffffff
                                         Interrupt Status Reg =
                                         _0000000100400000
                                         ECC Syndrome = 0000000000000000
                                         Memory Port 0 Status Reg =
                                         _0000000000000000
                                         Memory Port 1 Status Reg =
                                         _0000000000000000
                                         CIA Error Status = 0000000000000000
                                         CIA Error Reg = 0000000000000000
                                        Machine Check error corrected by
                                         _processor
                                        Physical address of error
                                         _ffffff00019d9aaf Corrected ECC Error
                                         _in B-Cache during D-Cache fill
                                         Fill Syndrome = 0000000000003100
                                        Single Bit error in Quadword 1 at
                                         _bit<14> in a Data bit
                                         EI Address = ffffff00019d9aaf
                                         EI Status = fffffff084ffffff
                                         Interrupt Status Reg =
                                         _0000000100000000
                                         ECC Syndrome = 0000000000000000
                                         Memory Port 0 Status Reg =
                                         _0000000000000000
                                         Memory Port 1 Status Reg =
                                         _0000000000000000
                                         CIA Error Status = 0000000000000000
                                         CIA Error Reg = 0000000000000000
                                        Machine Check error corrected by
                                         _processor
                                        Physical address of error
                                         _ffffff0000439aaf Corrected ECC Error
                                         _in B-Cache during D-Cache fill
                                         Fill Syndrome = 0000000000003100
                                        Single Bit error in Quadword 1 at
                                         _bit<14> in a Data bit
                                         EI Address = ffffff0000439aaf
                                         EI Status = fffffff484ffffff
                                         Interrupt Status Reg =
                                         _0000000100400000
                                         ECC Syndrome = 0000000000000000
                                         Memory Port 0 Status Reg =
                                         _0000000000000000
                                         Memory Port 1 Status Reg =
                                         _0000000000000000
                                         CIA Error Status = 0000000000000000
                                         CIA Error Reg = 0000000000000000
                                        WARNING: too many Processor corrected
                                         _errors detected on cpu 0. Reporting
                                         _suspended.
                                        dli: configured

********************************* ENTRY 106. *********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 4.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Nov 13 11:48:02 2000
OCCURRED ON SYSTEM cna1310
SYSTEM ID x0005000F
SYSTYPE x00000000

----- UNIT INFORMATION -----

UNIT CLASS CPU
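
To see at a glance whether the corrected errors share one syndrome or cluster at one address, the uerf text can be summarised with a short script. The following is a rough Python sketch, not a uerf feature: the function name and the embedded sample (two reports copied from the log above) are illustrative only. It collapses all whitespace first, so it should cope with both the uerf "_" continuation markers and mail line wrapping.

```python
import re
from collections import Counter

# Two corrected-error reports copied from the uerf output above.
SAMPLE = """\
Physical address of error
 _ffffff0000549aaf Corrected ECC
Error
 _in Memory during D-Cache fill
 Fill Syndrome =
0000000000003100
Physical address of error
 _ffffff00019d9caf Corrected ECC
Error
 _in B-Cache during D-Cache fill
 Fill Syndrome =
0000000000003100
"""


def summarize(text):
    """Count error sources and fill syndromes, and list the error addresses."""
    # Collapse each run of whitespace (plus a following "_" continuation
    # marker, if any) into a single space so wrapped fields become one line.
    flat = re.sub(r"\s+_?", " ", text)
    sources = Counter(re.findall(r"Corrected ECC Error in (\S+) during", flat))
    syndromes = Counter(re.findall(r"Fill Syndrome = ([0-9a-fA-F]+)", flat))
    addresses = re.findall(r"Physical address of error ([0-9a-fA-F]+)", flat)
    return sources, syndromes, addresses


sources, syndromes, addresses = summarize(SAMPLE)
print("sources:  ", dict(sources))
print("syndromes:", dict(syndromes))
print("addresses:", addresses)
```

Run over the whole of entry 105, this would show the same fill syndrome at several different addresses, reported against both Memory and B-Cache.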

--
  ---------------------oooOOOooo---------------------
  Tony McElhill
  Development Support Engineer
  Airsys ATM
  Oakcroft Road, Chessington, Surrey KT9 1QZ England.
  Tel: 020-8391-6438
  Fax: 020-8391-6137
  e-mail: tony.mcelhill_at_uk.airsysatm.thomson-csf.com
  ---------------------oooOOOooo---------------------
Received on Mon Nov 13 2000 - 15:07:05 NZDT
