Wow, this list is awesome. The mere posting of our memory problems
on this list frightened our 1000As into ceasing to produce their error
messages. We have had no problems at all since my original post!
Thanks to Robert McMillin (rlm_at_syseca-us.com), Kurt Carlson
(sxkac_at_java.sois.alaska.edu), and Bob Safran (bobs_at_slmd.com).
Robert's suggestion:
One question: how closely did you follow the RAM installation
instructions? I ask because the sticks we got from Kensington on our
1000As came in multiples of THREE, which I immediately thought was
weird until I found out that the system uses the third stick for error
correction (ECC). Perhaps these aren't in the right slot.
I still plan to check this out.
I'm embarrassed to note that even after Kurt pointed out that these errors
would show up as a cpu exception in uerf, I did not see it. They
really are there in uerf as cpu exceptions. (Yowee, is that a worthless
message. The entire content of the message is "CPU Exception".) And
special thanks to Kurt for the uerf filtering/summarization utilities.
(We're stuck with some 3.0 machines for a while, so decevent is
not yet for us.)
My original posting follows:
==============================================================================
Hi, admins.
We have 2 brand-spankin-new Alpha 1000As with 128 MB of DEC memory.
We added 128 MB of 3rd-party (Simple) memory to each system, for a total of 256 MB.
We received the error message "Machine Check error /
Corrected ECC Error in Memory during D-Cache fill" 3 times on each system.
(On 1 system, the 3 errors occurred within a few seconds of each other. On the
other system, 2 errors occurred 15 minutes apart, and a 3rd error on a
different day.) I don't know if the errors are coming from the DEC
memory or the 3rd party memory.
uerf shows no errors. (Because they're correctable????) The error messages
are showing up in the kern.log files under syslog.dated.
Sure seems strange we're seeing the same message on 2 different systems.
Any guess as to the severity of these errors? Is it time to call in
hardware support?
Any ideas greatly appreciated. Full text of the message is shown below.
Feb 6 09:42:32 xanth vmunix: Machine Check error corrected by processor
Feb 6 09:42:32 xanth vmunix: Physical address of error ffffff000cfcb20f
Corrected ECC Error in Memory during D-Cache fill
Feb 6 09:42:32 xanth vmunix: Fill Syndrome = 00000000000000a4
Feb 6 09:42:32 xanth vmunix: Single Bit error in Quadword 0 at bit<41> in a Data bit
Feb 6 09:42:32 xanth vmunix: EI Address = ffffff000cfcb20f
Feb 6 09:42:32 xanth vmunix: EI Status = fffffff0c1ffffff
Feb 6 09:42:32 xanth vmunix: Interrupt Status Reg = 0000000100000000
Feb 6 09:42:32 xanth vmunix: ECC Syndrome = 0000000000000000
Feb 6 09:42:32 xanth vmunix: Memory Port 0 Status Reg = 0000000000000000
Feb 6 09:42:32 xanth vmunix: Memory Port 1 Status Reg = 0000000000000000
Feb 6 09:42:32 xanth vmunix: CIA Error Status = 0000000000000000
Feb 6 09:42:32 xanth vmunix: CIA Error Reg = 0000000000000000
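Since the question is whether the errors come from the DEC memory or the 3rd-party memory, one rough first check is whether repeated reports hit the same physical address. The following is a minimal sketch, assuming the kern.log lines look exactly like the excerpt above (the "Physical address of error" line carries the host name and address). The script name `ecc_tally.py` and the function `tally_ecc_events` are hypothetical, and mapping an address to a particular memory bank depends on your configuration, which this does not attempt.

```python
import re
import sys
from collections import Counter

# Matches the syslog line that carries the failing physical address, e.g.
# "Feb 6 09:42:32 xanth vmunix: Physical address of error ffffff000cfcb20f"
# (format assumed from the excerpt above).
ADDR_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+(?P<host>\S+)\s+vmunix:\s+"
    r"Physical address of error\s+(?P<addr>[0-9a-fA-F]+)"
)


def tally_ecc_events(lines):
    """Count machine-check reports per (host, physical address)."""
    counts = Counter()
    for line in lines:
        m = ADDR_RE.match(line)
        if m:
            counts[(m.group("host"), m.group("addr").lower())] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical): python ecc_tally.py kern.log [more kern.log files...]
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            totals.update(tally_ecc_events(fh))
    # Most frequently reported addresses first.
    for (host, addr), n in totals.most_common():
        print(f"{host}\t{addr}\t{n}")
```

If the same address keeps showing up on a host, that points at a specific location rather than random single-bit noise, which is useful to mention when calling in hardware support.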
---------------------------------------------------------------------------
Barbara Baker PHONE: (303) 861-6284
The Children's Hospital FAX: (303) 837-2577
Denver, Colorado Email: baker.barb_at_tchden.org
---------------------------------------------------------------------------
Received on Wed Feb 26 1997 - 22:47:12 NZDT