In April, I installed Patch Kit #4 for Tru64 V4.0E on an AlphaStation
500/500. Two weeks later, I started to receive Machine Check Errors
constantly, although they are apparently correctable and don't seem
to affect the system in any adverse way.
Still, it is very strange, and I wonder whether it is related to one of
the patches that I installed. Can anybody offer some insight into
this problem and how to solve it, e.g. by removing a particular patch
or replacing hardware?
I should add that I have installed this same patch kit on at least
half a dozen different types of hardware without this problem
appearing.
In /var/adm/messages I get:
Apr 18 19:14:08 noah vmunix: WARNING: too many Processor corrected
errors detected on cpu 0. Reporting suspended.
Apr 18 19:28:21 noah vmunix: WARNING: too many Processor corrected
errors detected on cpu 0. Reporting suspended.
Apr 18 19:43:01 noah vmunix: WARNING: too many Processor corrected
errors detected on cpu 0. Reporting suspended.
Apr 18 19:51:39 noah last message repeated 2 times
ad infinitum:
Oct 24 09:27:53 noah vmunix: WARNING: too many Processor corrected
errors detected on cpu 0. Reporting suspended.
Oct 24 09:31:45 noah vmunix: WARNING: too many Processor corrected
errors detected on cpu 0. Reporting suspended.
Oct 24 09:36:47 noah vmunix: WARNING: too many Processor corrected
errors detected on cpu 0. Reporting suspended.
Oct 24 09:41:44 noah vmunix: WARNING: too many Processor corrected
errors detected on cpu 0. Reporting suspended.
Oct 24 09:52:23 noah last message repeated 2 times
Oct 24 10:06:49 noah last message repeated 3 times
Oct 24 10:16:44 noah last message repeated 2 times
Oct 24 10:26:44 noah last message repeated 2 times
Oct 24 10:36:58 noah last message repeated 2 times
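To see when the warnings started and how often they recur, the log lines above could be tallied per day. The following is only a minimal sketch: the function name count_warnings_per_day is hypothetical, and it assumes that syslog lines look like the excerpt above (possibly wrapped onto a second line) and that a "last message repeated N times" line refers to the warning immediately before it.

```python
import re
from collections import Counter

# First line of the (possibly wrapped) warning, e.g.
# "Apr 18 19:14:08 noah vmunix: WARNING: too many Processor corrected"
WARNING = re.compile(
    r"^(\w{3}\s+\d+)\s+\d\d:\d\d:\d\d\s+\S+\s+"
    r"vmunix: WARNING: too many Processor corrected"
)
# syslog's compressed form, e.g. "Apr 18 19:51:39 noah last message repeated 2 times"
REPEAT = re.compile(
    r"^(\w{3}\s+\d+)\s+\d\d:\d\d:\d\d\s+\S+\s+last message repeated (\d+) times"
)


def count_warnings_per_day(lines):
    """Return a Counter mapping 'Mon DD' to the number of warnings seen."""
    counts = Counter()
    last_was_warning = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        m = WARNING.match(line)
        if m:
            day = " ".join(m.group(1).split())  # normalise "Apr  8" to "Apr 8"
            counts[day] += 1
            last_was_warning = True
            continue
        r = REPEAT.match(line)
        if r:
            # Assumption: the repeat line refers to the preceding warning.
            if last_was_warning:
                day = " ".join(r.group(1).split())
                counts[day] += int(r.group(2))
            continue
        if line.startswith("errors detected on cpu"):
            continue  # second half of a wrapped warning line
        last_was_warning = False  # some unrelated message intervened
    return counts
```

It could be run as count_warnings_per_day(open("/var/adm/messages")) and the result compared against the date the patch kit was installed.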
The output of dia -R shows, for example:
DECevent V3.1
**** V3.1 ********************** ENTRY 1 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 52324.
Timestamp of occurrence 24-OCT-2000 10:47:05
Host name noah
System type register x0000000F AlphaStation 600 or 500
Number of CPUs (mpnum) x00000001
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 100. Machine Check Error - (major class)
4. - (minor class)
Flags: x80000000 Retryable Error
Mchk Error Code x0000000000000086
EV5 Detected Corr ECC Error
EI ADDR xFFFFFF000192C3FF
FILL SYNDROME x000000000000009B
EI STATUS xFFFFFFF081FFFFFF
Error occurred during D-ref fill
ISR x0000000100000000
Correctable ECC errors (IPL31)
AST requests 3 - 0 x0000000000000000
CIA Syndrome x0000000000000000
ECC Syndrome x0000000000000000
MEM ERR0 x0000000000000000
Memory Port Address x0000000000000000
MEM ERR1 x0000000000000000
Bits <33:32> of Memory Po x0000000000000000
Bit <39> of Memory Port x0000000000000000
Memory Command x0000000000000000
Mask When Err Occurred x0000000000000000
Mem Seq State Idle
Encoded Set Sel: Set 0 Selected
CIA ERR STAT x0000000000000000
Memory Cycle Source is PCI
IO Cmnd/Addr Queue Vld Bi x0000000000000000
CPU Cmnd/Addr Queue Vld B x0000000000000000
DM State: Idle
EV5 Resp. for DMA: No Response
CIA ERR x0000000000000000
**** V3.1 ********************** ENTRY 2 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 52323.
Timestamp of occurrence 24-OCT-2000 10:47:04
Host name noah
System type register x0000000F AlphaStation 600 or 500
Number of CPUs (mpnum) x00000001
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 100. Machine Check Error - (major class)
4. - (minor class)
Flags: x80000000 Retryable Error
Mchk Error Code x0000000000000086
EV5 Detected Corr ECC Error
EI ADDR xFFFFFF000192C3FF
FILL SYNDROME x000000000000009B
EI STATUS xFFFFFFF081FFFFFF
Error occurred during D-ref fill
ISR x0000000100000000
Correctable ECC errors (IPL31)
AST requests 3 - 0 x0000000000000000
CIA Syndrome x0000000000000000
ECC Syndrome x0000000000000000
MEM ERR0 x0000000000000000
Memory Port Address x0000000000000000
MEM ERR1 x0000000000000000
Bits <33:32> of Memory Po x0000000000000000
Bit <39> of Memory Port x0000000000000000
Memory Command x0000000000000000
Mask When Err Occurred x0000000000000000
Mem Seq State Idle
Encoded Set Sel: Set 0 Selected
CIA ERR STAT x0000000000000000
Memory Cycle Source is PCI
IO Cmnd/Addr Queue Vld Bi x0000000000000000
CPU Cmnd/Addr Queue Vld B x0000000000000000
DM State: Idle
EV5 Resp. for DMA: No Response
CIA ERR x0000000000000000
etc.
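Both entries above report the same EI ADDR and FILL SYNDROME. To check whether that holds across the whole event log, the dia output could be tallied by that pair. This is a rough sketch only: the function name tally_addr_syndrome is hypothetical, and it assumes the text layout shown above (an "ENTRY n" banner line starting each event, and hex values prefixed with "x"). Reading a recurring pair as a sign of one fixed failing location is an assumption, not a diagnosis.

```python
import re
from collections import Counter

# Banner that starts each event, e.g. "**** V3.1 ****... ENTRY 1 ****..."
ENTRY = re.compile(r"^\*{4}.*ENTRY\s+\d+")
# The two fields of interest, e.g. "EI ADDR xFFFFFF000192C3FF"
FIELD = re.compile(r"^\s*(EI ADDR|FILL SYNDROME)\s+x([0-9A-Fa-f]+)")


def tally_addr_syndrome(lines):
    """Count events per (EI ADDR, FILL SYNDROME) pair in dia text output."""
    counts = Counter()
    current = None  # fields collected for the entry being read

    def flush(fields):
        # Entries that carry neither field are skipped on purpose.
        if fields:
            key = (fields.get("EI ADDR"), fields.get("FILL SYNDROME"))
            counts[key] += 1

    for line in lines:
        if ENTRY.match(line):
            flush(current)
            current = {}
            continue
        m = FIELD.match(line)
        if m and current is not None:
            current[m.group(1)] = m.group(2).upper()
    flush(current)  # do not forget the final entry
    return counts
```

One way to use it is to save the dia -R output to a file, pass the open file to tally_addr_syndrome, and look at counts.most_common() to see whether a single pair dominates.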
TIA,
Peter
Peter Stern
Chemical Physics Department
Weizmann Institute of Science
76100 Rehovot, ISRAEL
email: Peter.Stern_at_weizmann.ac.il
phone: 972-8-9342096
fax: 972-8-9344123
Received on Tue Oct 24 2000 - 08:58:07 NZDT