Hi,
I have a test "rig" of various AlphaStations, and have recently
upgraded some of them from 4.0D to 4.0F. During the upgrade, one
particular box gave me problems and would not boot off the
(firmware) CDROM. I tried creating a boot floppy, but that didn't work
either: the message was "Can not read from device dva0" (/dka? for CD) or
similar. I went ahead and did an installupdate to 4.0F and pk3, but I'm
not too optimistic about the future of this box!
At the console level on power-up, you see:
Processor Detected - BCache single bit ECC error.
*** Unexpected interrupt through vector 0000067
IPRs:
EXC_ADDR: 000....12BA3C EXC_SUM: .........
ALCOR Error CSRs (CPU 0)
CIA_ERR:0000....0 ERR_STAT:....
warning - HWRPB is invalid.
I/O CSRs:
MEMORY BASE ADDRESS CSRs
MBA: 0008011
.....................
Processor Detected - Memory single bit ECC error.
IPRs:
EXC_ADDR: 000....12BA3C EXC_SUM: .........
etc. etc.
Processor correctable error through vector 00000063
EI_STAT: FFFFFFF484FFFFFF EI_ADDR: FFFFFF00000C5EAF
FILL_SYN: 00..........3100 ISR: 0000000100..0MCES4
Error on fill data from Main mem
data bit 14 530 bank 0
bad page in console mem cluster [0]
This latter error continues with some variation during the boot.
Surprisingly, the system does go to multi-user and has been usable,
though today one of the developers told me they weren't able to use this
box at all.
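For anyone reading the syndrome value, here is a minimal Python sketch of how I read it. It assumes that the low byte of FILL_SYN holds the syndrome for quadword 0 and the next byte the syndrome for quadword 1. That layout is my assumption, not something the console prints, though it agrees with the "Quadword 1" wording in the uerf output further down. The function names are made up for illustration.

```python
def split_fill_syndrome(fill_syn):
    """Split a FILL_SYN value into one syndrome byte per quadword.

    Assumption: bits <7:0> cover quadword 0 and bits <15:8> cover quadword 1.
    """
    return {0: fill_syn & 0xFF, 1: (fill_syn >> 8) & 0xFF}


def failing_quadwords(fill_syn):
    """Return the quadwords whose syndrome byte is non-zero."""
    return [qw for qw, syn in split_fill_syndrome(fill_syn).items() if syn != 0]


# Value taken from the "Fill Syndrome" lines in the uerf output.
syndromes = split_fill_syndrome(0x3100)
print({qw: hex(syn) for qw, syn in syndromes.items()})
print(failing_quadwords(0x3100))
```

With the logged value 0x3100 only the quadword 1 byte is non-zero. Mapping that byte to a specific data bit needs the processor's syndrome table, which the sketch does not attempt.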
I assume that this is what it says it is, and that it's a CPU
and/or motherboard problem, but as it's probably going to be a fairly
expensive fix, I thought I'd run it past you lot anyway. I've tried
removing all the memory modules and trying them in different slots, but
I get the same error.
Any suggestions?
BTW - it's a PB540-A9 system at firmware 6.8-2.
TIA,
Tony.
uerf messages follow -
********************************* ENTRY 103. *********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 7.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Nov 13 11:52:30 2000
OCCURRED ON SYSTEM cna1310
SYSTEM ID x0005000F
SYSTYPE x00000000
----- UNIT INFORMATION -----
UNIT CLASS CPU
********************************* ENTRY 104. *********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 6.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Nov 13 11:52:30 2000
OCCURRED ON SYSTEM cna1310
SYSTEM ID x0005000F
SYSTYPE x00000000
----- UNIT INFORMATION -----
UNIT CLASS CPU
********************************* ENTRY 105. *********************************
----- EVENT INFORMATION -----
EVENT CLASS OPERATIONAL EVENT
OS EVENT TYPE 300. SYSTEM STARTUP
SEQUENCE NUMBER 5.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Nov 13 11:48:02 2000
OCCURRED ON SYSTEM cna1310
SYSTEM ID x0005000F
SYSTYPE x00000000
MESSAGE Alpha boot: available memory from
_0x9fc000 to 0x7f16000
Digital UNIX V4.0F (Rev. 1229); Wed
_Oct 25 10:22:18 BST 2000
physical memory = 128.00 megabytes.
available memory = 117.57 megabytes.
using 483 buffers containing 3.77
_megabytes of memory
Firmware revision: 6.8
PALcode: Digital UNIX version 1.22
Digital AlphaStation 500/266
pci0 at nexus
tu0: DECchip 21040: Revision: 2.4
tu0 at pci0 slot 6
tu0: DEC TULIP (10Mbps) Ethernet
_Interface, hardware address:
_08-00-2B-E7-D2-50
tu0: console mode: selecting 10BaseT
_(UTP) port: half duplex
Machine Check error corrected by
_processor
Physical address of error
_ffffff0000549aaf Corrected ECC Error
_in Memory during D-Cache fill
Fill Syndrome = 0000000000003100
Single Bit error in Quadword 1 at
_bit<14> in a Data bit
EI Address = ffffff0000549aaf
EI Status = fffffff0c4ffffff
Interrupt Status Reg =
_0000000100000000
ECC Syndrome = 0000000000000000
Memory Port 0 Status Reg =
_0000000000000000
Memory Port 1 Status Reg =
_0000000000000000
CIA Error Status = 0000000000000000
CIA Error Reg = 0000000000000000
WARNING: too many Processor corrected
_errors detected on cpu 0. Reporting
_suspended.
tga0 at pci0 slot 7
tga0: depth 8, map size 2MB, 1280x1024
tga0: ZLXp2-E, Revision: 34
isp0 at pci0 slot 9
isp0: QLOGIC ISP1020A
isp0: Firmware revision 5.54 (loaded
_by console)
scsi0 at isp0 slot 0
rz0 at scsi0 target 0 lun 0 (LID=0)
_(DEC RZ28D (C) DEC 0008)
_(Wide16)
rz1 at scsi0 target 1 lun 0 (LID=1)
_(DEC RZ28D (C) DEC 0008)
_(Wide16)
rz4 at scsi0 target 4 lun 0 (LID=2)
_(DEC RRD45 (C) DEC 1645)
eisa0 at pci0
ace0 at eisa0
ace1 at eisa0
lp0 at eisa0
fdi0 at eisa0
gpc0 at eisa0
lvm0: configured.
lvm1: configured.
kernel console: ace0
Machine Check error corrected by
_processor
Physical address of error
_ffffff00019d9caf Corrected ECC Error
_in B-Cache during D-Cache fill
Fill Syndrome = 0000000000003100
Single Bit error in Quadword 1 at
_bit<14> in a Data bit
EI Address = ffffff00019d9caf
EI Status = fffffff084ffffff
Interrupt Status Reg =
_0000000100000000
ECC Syndrome = 0000000000000000
Memory Port 0 Status Reg =
_0000000000000000
Memory Port 1 Status Reg =
_0000000000000000
CIA Error Status = 0000000000000000
CIA Error Reg = 0000000000000000
Machine Check error corrected by
_processor
Physical address of error
_ffffff0000439aaf Corrected ECC Error
_in B-Cache during D-Cache fill
Fill Syndrome = 0000000000003100
Single Bit error in Quadword 1 at
_bit<14> in a Data bit
EI Address = ffffff0000439aaf
EI Status = fffffff484ffffff
Interrupt Status Reg =
_0000000100400000
ECC Syndrome = 0000000000000000
Memory Port 0 Status Reg =
_0000000000000000
Memory Port 1 Status Reg =
_0000000000000000
CIA Error Status = 0000000000000000
CIA Error Reg = 0000000000000000
Machine Check error corrected by
_processor
Physical address of error
_ffffff00019d9aaf Corrected ECC Error
_in B-Cache during D-Cache fill
Fill Syndrome = 0000000000003100
Single Bit error in Quadword 1 at
_bit<14> in a Data bit
EI Address = ffffff00019d9aaf
EI Status = fffffff084ffffff
Interrupt Status Reg =
_0000000100000000
ECC Syndrome = 0000000000000000
Memory Port 0 Status Reg =
_0000000000000000
Memory Port 1 Status Reg =
_0000000000000000
CIA Error Status = 0000000000000000
CIA Error Reg = 0000000000000000
Machine Check error corrected by
_processor
Physical address of error
_ffffff0000439aaf Corrected ECC Error
_in B-Cache during D-Cache fill
Fill Syndrome = 0000000000003100
Single Bit error in Quadword 1 at
_bit<14> in a Data bit
EI Address = ffffff0000439aaf
EI Status = fffffff484ffffff
Interrupt Status Reg =
_0000000100400000
ECC Syndrome = 0000000000000000
Memory Port 0 Status Reg =
_0000000000000000
Memory Port 1 Status Reg =
_0000000000000000
CIA Error Status = 0000000000000000
CIA Error Reg = 0000000000000000
WARNING: too many Processor corrected
_errors detected on cpu 0. Reporting
_suspended.
dli: configured
********************************* ENTRY 106. *********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 4.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Nov 13 11:48:02 2000
OCCURRED ON SYSTEM cna1310
SYSTEM ID x0005000F
SYSTYPE x00000000
----- UNIT INFORMATION -----
UNIT CLASS CPU
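
To summarise the corrected-error records without reading each one, a rough Python sketch like the following can tally them. It is only an illustration: the SAMPLE text is two records copied from the listing above in their mail-wrapped form, the function names are hypothetical, and the regular expressions assume the exact wording shown in this particular uerf output.

```python
import re
from collections import Counter

# Two records copied from the uerf listing, still line-wrapped.
SAMPLE = """\
Physical address of error
_ffffff0000549aaf Corrected ECC
Error
_in Memory during D-Cache fill
Single Bit error in Quadword 1
at
_bit<14> in a Data bit
Physical address of error
_ffffff00019d9caf Corrected ECC
Error
_in B-Cache during D-Cache fill
Single Bit error in Quadword 1
at
_bit<14> in a Data bit
"""


def normalise(text):
    """Undo wrapping: drop leading '_' continuation marks, collapse whitespace."""
    text = re.sub(r"^_", "", text, flags=re.MULTILINE)
    return re.sub(r"\s+", " ", text).strip()


def tally(text):
    """Count error sources and failing bits, and list the reported addresses."""
    flat = normalise(text)
    sources = Counter(re.findall(r"Corrected ECC Error in (Memory|B-Cache)", flat))
    bits = Counter(re.findall(r"Quadword (\d+) at bit<(\d+)>", flat))
    addresses = re.findall(r"Physical address of error ([0-9a-f]{16})", flat)
    return sources, bits, addresses


sources, bits, addresses = tally(SAMPLE)
print(dict(sources))
print(dict(bits))
print(addresses)
```

Every record in the listing reports Quadword 1, bit<14>, while the reported addresses differ, so a tally like this makes the repeating bit position easy to spot.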
--
---------------------oooOOOooo---------------------
Tony McElhill
Development Support Engineer
Airsys ATM
Oakcroft Road, Chessington, Surrey KT9 1QZ England.
Tel: 020-8391-6438
Fax: 020-8391-6137
e-mail: tony.mcelhill_at_uk.airsysatm.thomson-csf.com
---------------------oooOOOooo---------------------
Received on Mon Nov 13 2000 - 15:07:05 NZDT