SUMMARY: "too many Processor corrected errors detected on cpu 0" What does it mean?

From: Richard Bemrose <rb237_at_phy.cam.ac.uk>
Date: Thu, 29 Oct 1998 19:18:57 +0000 (GMT)

Hello fellow admins,

I must first thank all the following people for their quick replies
(5 replies under 30 minutes):
        "Raul Sossa S." <RSossa_at_datadec.co.cr>
        Serguei Patchkovskii <patchkov_at_ucalgary.ca>
        Rodrigo Poblete <rpoblete_at_gmd.com.pe>
        "Dr. Tom Blinn, 603-884-0646" <tpb_at_doctor.zk3.dec.com>
        Debra Alpert <alpert_at_fas.harvard.edu>

In my original post I asked what the following warning message means:
-- cut --
WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended.
-- cut --

The overwhelming consensus was that the warning is due to a defective or
incorrectly seated memory module. I have therefore reseated the memory
and the L2 cache modules. I will monitor the workstation over the next
couple of days to see whether this has resolved the problem. If not, I will
call out a DEC engineer to identify the defective module.

In addition, Tom Blinn offered a detailed explanation:
> It's a warning. Your system has ECC memory. When the processor detects
> an error in the memory (e.g., during a Bcache fill from memory), and
> finds an error, and it's a correctable error, it corrects the error. It
> reports the correction to the kernel. When there are MANY such
> corrections, the kernel disables logging additional error frames in the
> error log (since some memory can be a bit flaky and have repeated
> errors, and the log would fill up your /var partition if every error
> were logged).
>
> If you want to fix this, you need to get detailed analysis of the error
> frame data (e.g., by a trained service person) to identify the marginal
> memory component (there are no doubt multiple memory SIMMs in the
> system, most likely only one is bad) and then replace the memory. If
> you do this the problem MIGHT go away, if the new memory is more
> reliable than the old memory.
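
The throttling behaviour Tom describes can be illustrated with a small
sketch. This is only a toy model, not the actual Digital UNIX kernel code:
the class name, the `limit` threshold and the log format are all
assumptions made up for illustration. It shows corrected errors being
logged until a per-CPU limit is exceeded, after which a single warning is
written and further reports for that CPU are dropped.

```python
class CorrectedErrorReporter:
    """Toy model of throttled reporting of corrected (ECC) errors.

    Hypothetical illustration only; the real kernel's threshold and
    bookkeeping are not described in the thread.
    """

    def __init__(self, limit=3):
        self.limit = limit        # assumed threshold, not a real kernel value
        self.counts = {}          # cpu number -> corrected errors seen
        self.suspended = set()    # cpus whose reporting has been suspended
        self.log = []             # stands in for the binary error log

    def report(self, cpu, detail):
        """Record one corrected error; return True if it was logged."""
        self.counts[cpu] = self.counts.get(cpu, 0) + 1
        if cpu in self.suspended:
            # The error was still corrected; it is just no longer logged.
            return False
        if self.counts[cpu] > self.limit:
            # Emit the warning once, then stop logging for this cpu.
            self.suspended.add(cpu)
            self.log.append(
                "WARNING: too many Processor corrected errors detected "
                f"on cpu {cpu}. Reporting suspended."
            )
            return False
        self.log.append(f"cpu {cpu}: corrected error: {detail}")
        return True


reporter = CorrectedErrorReporter(limit=3)
for i in range(10):
    reporter.report(0, f"event {i}")
print("\n".join(reporter.log))
```

In this model the errors keep being counted after the warning, but the log
stops growing, which is the point of the suspension: a flaky module cannot
fill up /var.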

Thank you all once again.

Regards,
Rich

 /_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/ _ \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\
/_/ Richard A Bemrose /_\ Polymers and Colloids Group \_\
/_/ email: rb237_at_phy.cam.ac.uk /_\ Cavendish Laboratory \_\
/_/ Tel: +44 (0)1223 337 267 /_\ University of Cambridge \_\
/_/ Fax: +44 (0)1223 337 000 /_\ Madingley Road \_\
/_/ (space for rent) / \ Cambridge, CB3 0HE, UK \_\
 /_/_/_/_/_/_/ http://www.poco.phy.cam.ac.uk/~rb237 \_\_\_\_\_\_\
             "Life is everything and nothing all at once"
              -- Billy Corgan, Smashing Pumpkins
Received on Thu Oct 29 1998 - 19:19:51 NZDT
