Hi folks,
I administer a Compaq AlphaServer ES40 with four 666 MHz EV67 CPUs and
8 GB of RAM, with the following memory organisation:
0 2048Mb 0000000000000000 4-Way
1 2048Mb 0000000080000000 4-Way
2 2048Mb 0000000100000000 4-Way
3 2048Mb 0000000180000000 4-Way
After about one year of running, the following messages appeared in the
syslog a few weeks ago:
Jul 22 16:12:15 ragnaroek vmunix: WARNING: too many Processor corrected errors detected on cpu 2. Reporting suspended.
Jul 22 16:13:27 ragnaroek vmunix: WARNING: too many Processor corrected errors detected on cpu 1. Reporting suspended.
Jul 22 16:14:18 ragnaroek vmunix: WARNING: too many Processor corrected errors detected on cpu 3. Reporting suspended.
A few hours later the machine dropped to the console prompt.
During the memory test, the following messages appeared:
EV6 Correctable Memory Fill ECC Error on CPU 0
C_ADDR: 00000000A8FC5BC0
C_SYNDROME_1: 0000000000000057
C_SYNDROME_0: 0000000000000000
EV6 Correctable Dcache ECC Error on CPU 0
EV6 Correctable Memory Fill ECC Error on CPU 0
C_ADDR: 00000000A8FD2BC0
C_SYNDROME_1: 0000000000000057
C_SYNDROME_0: 0000000000000000
At first I thought it was a defective DRAM module located in bank 1,
because of the C_ADDR information. But after removing that bank the
error still occurs.
So my question: is it a memory or a CPU problem? And if it is a memory
problem, how can I determine the defective DRAM chip? I haven't found
any suitable documentation.
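For reference, the bank 1 guess can be reproduced with a small sketch.
It assumes that the columns of the memory table are array number, size
and base address, and that C_ADDR is a plain physical address. The
function name array_for_address is made up for illustration. The 4-way
interleave inside each array is not modelled, so this cannot point at a
single DIMM or chip.

```python
# Memory arrays as listed in the table above:
# (array number, size in MB, base address). Column meaning is an assumption.
ARRAYS = [
    (0, 2048, 0x0000000000000000),
    (1, 2048, 0x0000000080000000),
    (2, 2048, 0x0000000100000000),
    (3, 2048, 0x0000000180000000),
]


def array_for_address(addr, arrays=ARRAYS):
    """Return the number of the array whose [base, base + size) range
    contains addr, or None if no listed array covers it."""
    for number, size_mb, base in arrays:
        size_bytes = size_mb * 1024 * 1024
        if base <= addr < base + size_bytes:
            return number
    return None


# The two C_ADDR values reported during the memory test.
for c_addr in (0x00000000A8FC5BC0, 0x00000000A8FD2BC0):
    print(f"{c_addr:#018x} -> array {array_for_address(c_addr)}")
```

Both addresses fall between 0x80000000 and 0x100000000, i.e. into
array 1, which is why that bank looked suspicious.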
Many thanks & Bye,
Christian
--
v
..d8b.. Dipl.inform. Christian Becker
..:::d888b:::..
:::::d88888b::::: Institut fuer Angewandte Mathematik & Numerik, LS3
:::::d8888888b::::: Universitaet Dortmund
::::d888888888b:::: Vogelpothsweg 87, 44227 Dortmund, Germany
::{8888P"::"V8,:: Voicemail: +49 231 755 5934 FAX: +49 231 755 5933
:D8P":::::::VD: mailto:Christian.Becker_at_mathematik.uni-dortmund.de
dP ``````` Y
Received on Mon Aug 05 2002 - 14:39:23 NZST