Hello again fellow admin,
I just received a very detailed reply from Sung Kang <kangs_at_seas.ucla.edu>
(Thanks Sung!) which may help identify exactly which memory module is
defective.
I have forwarded the email in its entirety.
---------- Forwarded message ----------
Date: Thu, 29 Oct 1998 12:29:05 -0800
From: kangs_at_seas.ucla.edu
Reply-To: sung.kang.bk.94_at_aya.yale.edu
To: rb237_at_phy.cam.ac.uk
Subject: Re: "too many Processor corrected errors detected on cpu 0" What do it mean?
It looks like you've already got some answers... but here's my 2 cents
since I had the exact same problem and it took me 2 weeks of playing phone
tag with 3 different Digital Tech Support folks to resolve it.
> -- cut --
> ******************************** ENTRY 756 ********************************
>
> CPU Minor class 3. Bcache error (630 entry)
>
> 15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
> 0010: FFFFFF00 03DE260F 00000000 00000086 *.........&......*
> 0020: FFFFFFF0 C5FFFFFF 00000000 0000A400 *................*
Line 0020 indicates which type of memory is causing the errors.
From my notes on my extensive talks with Digital Tech Support:
C5FFFFFF indicates that it's a RAM problem, since it starts with a "C".
If that set started with an "8" then it would be an L2 cache problem.
I had the RAM problem too.
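The rule of thumb above can be written down as a small sketch. It encodes only what the notes say (a leading "C" means RAM, a leading "8" means L2 cache); the function name is made up for illustration, and any other leading digit is reported as unknown rather than guessed at.

```python
def classify_error_word(word: str) -> str:
    """Classify the second word on line 0020 by its leading hex digit.

    Assumption: only the two cases from the notes are known
    ("C" -> RAM, "8" -> L2 cache). Anything else is "unknown".
    """
    first = word.strip().upper()[:1]
    if first == "C":
        return "RAM"
    if first == "8":
        return "L2 cache"
    return "unknown"


# The value from the quoted error entry:
print(classify_error_word("C5FFFFFF"))
```

For the entry quoted above this reports RAM, which matches the reading given by Digital Tech Support.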
Now from 0010, you can actually figure out exactly which bank the bad
memory chip is in. And, unfortunately, this is where my notes are a bit
bad.... I think it says...
Write down all the numbers on line 0010. Shut down to the console and issue:
">>> show memory" (I'm not sure if this is the actual command).
You'll then get a list of addresses, which you can then match up with what
you wrote down from line 0010. Some set of numbers, 00-07, indicates
that the memory module is in the lower quad, slots J1, J3, and J5, and
numbers 08-15 indicate the upper quad, slots J2, J4, and J6.
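As a rough sketch of that last step only: the mapping from a number to a quad and its slots, exactly as the notes state it. How the number is derived from line 0010 and the console output is not recorded in the notes, so that part is deliberately left out, and the function name is hypothetical.

```python
def quad_for(number: int) -> tuple[str, list[str]]:
    """Map a number in 0..15 to a memory quad and its slots.

    Assumption: 00-07 -> lower quad (J1, J3, J5) and
    08-15 -> upper quad (J2, J4, J6), as given in the notes.
    """
    if 0 <= number <= 7:
        return "lower", ["J1", "J3", "J5"]
    if 8 <= number <= 15:
        return "upper", ["J2", "J4", "J6"]
    raise ValueError(f"number out of range 0-15: {number}")


quad, slots = quad_for(9)
print(quad, slots)
```

This narrows the fault to a quad of three slots, not to a single module; the notes do not say how to pick among the three.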
Well, I think I need to take better notes. :-)
So from your error log it definitely looks like a RAM chip failure.
Figuring out exactly which one is the trick now... and for that you should
talk with a knowledgeable Digital tech. I had to talk to 3 techs
before I finally got to one who was willing to talk me through reading the
error entries in the dia logs.
Also, you don't need to wait until you get the console warning message;
dia will log this error long before that message appears.
You just need to find something that uses up enough RAM to hit the bad
memory module. For me it was compiling Emacs 20.
Hope this helps.
- sung
-- end forwarded message --
Regards,
Rich
/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/ _ \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\
/_/ Richard A Bemrose /_\ Polymers and Colloids Group \_\
/_/ email: rb237_at_phy.cam.ac.uk /_\ Cavendish Laboratory \_\
/_/ Tel: +44 (0)1223 337 267 /_\ University of Cambridge \_\
/_/ Fax: +44 (0)1223 337 000 /_\ Madingley Road \_\
/_/ (space for rent) / \ Cambridge, CB3 0HE, UK \_\
/_/_/_/_/_/_/ http://www.poco.phy.cam.ac.uk/~rb237 \_\_\_\_\_\_\
"Life is everything and nothing all at once"
-- Billy Corgan, Smashing Pumpkins
Received on Thu Oct 29 1998 - 20:37:45 NZDT