Greetings-
I have seen several recent postings dealing with the problem of
CPU exception logging due to ECC errors. I have three
AlphaStation 500/266's which have logged a large number of these
sorts of errors.
The following is an example of such an error message:
> dia -R -icpu -o full
DECevent V2.1
******************************** ENTRY 1 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 4.
Timestamp of occurrence 27-MAR-1997 11:48:43
Host name boninsegna
System type register x0000000F Alcor
Number of CPUs (mpnum) x00000001
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 100. CPU Machine Check Errors
CPU Minor class 3. Bcache error (630 entry)
Flags: x80000000 Retryable Error
Mchk Error Code x0000000000000086
EV5 Detected Corr ECC Error
EI ADDR xFFFFFF0004BC0CEF
FILL SYNDROME x0000000000000068
EI STATUS xFFFFFFF0C4FFFFFF
Error occurred during D-ref fill
ISR x0000000100000000
Correctable ECC errors (IPL31)
AST requests 3 - 0 x0000000000000000
CIA Syndrome x0000000000000000
ECC Syndrome x0000000000000000
MEM ERR0 x0000000000000000
Memory Port Address x0000000000000000
MEM ERR1 x0000000000000000
Bits <33:32> of Memory Po x0000000000000000
Bit <39> of Memory Port x0000000000000000
Memory Command x0000000000000000
Mask When Err Occurred x0000000000000000
Mem Seq State Idle
EV5 Resp. for DMA: No Response
CIA ERR x0000000000000000
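With a large number of these entries, it may help to check whether they
cluster on one syndrome or one address range. The following is only a
minimal Python sketch under stated assumptions: it assumes the dia output
has been saved as text, and that each field appears on its own line as a
label followed by an x-prefixed hex value, as in the entry above. The
function name tally_ecc_fields is made up for illustration.

```python
import re
from collections import Counter

# Field labels are copied from the DECevent entry shown above. The line
# layout (label, whitespace, x-prefixed hex value) is an assumption based
# on that single entry, not on any DECevent format specification.
FIELD_RE = re.compile(r"^\s*(EI ADDR|FILL SYNDROME)\s+x([0-9A-Fa-f]+)\s*$")


def tally_ecc_fields(text):
    """Count how often each EI ADDR and FILL SYNDROME value appears."""
    counts = {"EI ADDR": Counter(), "FILL SYNDROME": Counter()}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            label, value = match.group(1), match.group(2).upper()
            counts[label][value] += 1
    return counts


def format_tally(counts):
    """Return printable lines, most frequent value first for each field."""
    lines = []
    for label, counter in counts.items():
        lines.append(label)
        for value, n in counter.most_common():
            lines.append("  x%s  %d" % (value, n))
    return lines
```

To use it, read the saved output into a string, pass it to
tally_ecc_fields, and print the lines returned by format_tally. If nearly
every entry shares one FILL SYNDROME value, that at least suggests a
single recurring fault rather than scattered ones.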
Taking the advice of a previous posting, I brought the machine
down to the console level (after upgrading the firmware to as500_v6_4.exe
from the v3_9 CD-ROM) and did the following:
>>> set d_group field
>>> memory
>>> showit
Then reams of test output went by. This machine has 512MB of memory
(all Dataram). First, before I show a sample error message, how many
passes is this supposed to do? Is this in infinite-loop mode, or
does it write and read a multiple of the main memory?
In between outputting the test summary, I was treated to a blur of
messages similar to (there were a few different EI_ADDR values)
the following (this is a hand-scribbled note):
Processor correctable error through vector 00000063
EI_STAT: FFFFFFF0C4FFFFFF EI_ADDR: FFFFFF0004Bxxxx
FILL_SYN: 0000000000000068 ISR: 0000000100000000 MCES 4
databit 59 J26 bank0
page# 2561 base 9e
HELP! Is this indicative of a bad B-cache (I am assuming that is
the tertiary 2MB cache), or is one of the DIMMs bad? I need to know
whether this is a DEC matter or a Dataram matter, so that I can resolve
this without finger-pointing back and forth. I imagine that all of
this logging interferes with the speed of the system and eats up
disk space.
As a further question, when I bring these machines up after they have
been powered off, show config says that the tested memory is only 33MB.
Is this common? Is it correct?
Thanks so much. Any insight on this matter would be greatly appreciated.
Sean
*************************************************************************
* Sean O'Connell *
* Computer Projects Manager *
* Duke University Institute of Statistics and Decision Sciences *
*************************************************************************
* Phone: (919) 684-5419 *
* Fax: (919) 684-8594 *
* Email: sto_at_stat.Duke.EDU *
* Mail: 220 Old Chemistry Building *
* P.O. Box 90251 *
* Durham NC 27708-0251 *
*************************************************************************
Received on Thu Mar 27 1997 - 23:20:40 NZST