All,
I have an AlphaServer 2100 with three 5/250 CPUs and three 128 MB memory
boards. The machine is not under service with Compaq, but I was hoping there
is a hardware guru out there. The server is experiencing CPU machine check
errors. I have included one entry from DECEvent that shows the error. I am
trying to decipher the exact hardware problem so that we can order the
correct part. The odd thing about this problem is that it does not cause the
server to panic and crash, so there are no crash dumps. The errors have been
logged by different CPUs, so I am wondering whether the real fault is in
main memory rather than in a CPU.
Any help is appreciated.
Kevin Partin
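
Since the errors have been logged by different CPUs, one way to see the pattern across the whole event log is to count how many entries each CPU reported. Below is a minimal Python sketch, assuming the DECEvent report has been saved as text and that every entry carries a "CPU logging event (mperr)" line like the one in the entry below. The function name tally_logging_cpus is hypothetical and is not part of DECEvent.

```python
import re
from collections import Counter

# Matches lines such as: "CPU logging event (mperr) x00000002"
MPERR_RE = re.compile(r"CPU logging event \(mperr\)\s+x([0-9A-Fa-f]+)")


def tally_logging_cpus(report_text):
    """Count entries per logging CPU (hypothetical helper, not a DECEvent API)."""
    return Counter(int(m.group(1), 16) for m in MPERR_RE.finditer(report_text))


# Two lines copied from the entry included in this message.
sample = """\
Number of CPUs (mpnum) x00000003
CPU logging event (mperr) x00000002
"""

print(tally_logging_cpus(sample))  # Counter({2: 1})
```

Run against the full report, a roughly even spread across CPUs 0 to 2 would be consistent with a shared component, while a single dominant CPU would point back at that CPU module. This is only a heuristic, not a diagnosis.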
******************************** ENTRY 2 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 5.
Timestamp of occurrence 03-MAY-1999 06:23:48
Host name ewsa8
System type register x00000009 AlphaServer 2x00
Number of CPUs (mpnum) x00000003
CPU logging event (mperr) x00000002
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 100. CPU Machine Check Errors
CPU Minor class 2. System Uncorrectable Error (660)
-- ENTRY FRAME FOLLOWS --
Frame ID x00000022 Machine Check Frame
CPU Number Logging Event 2.
- ALPHA EV5 COMMON REGS -
Flags: x00000002
Machine Check Error Code x00000202 CPU Detected Unrecoverable Error
PAL SHADOW REG 0 x0000000000000000
PAL SHADOW REG 1 x0000000000000000
PAL SHADOW REG 2 x0000000000000000
PAL SHADOW REG 3 x0000000000000000
PAL SHADOW REG 4 x0000000000000000
PAL SHADOW REG 5 x0000000000000000
PAL SHADOW REG 6 x0000000000000000
PAL SHADOW REG 7 x0000000000000000
PALTEMP0 x0000000000000000
PALTEMP1 x0000000000000000
PALTEMP2 xFFFFFC00004B3980
PALTEMP3 x0000000000004E00
PALTEMP4 x0000000000000000
PALTEMP5 x0000000000000001
PALTEMP6 x000000000000002A
PALTEMP7 xFFFFFC00004B32A0
PALTEMP8 x1F1E161514020100
PALTEMP9 xFFFFFC00004B36C0
PALTEMP10 xFFFFFC00002BA954
PALTEMP11 xFFFFFC00004B3520
PALTEMP12 xFFFFFC00004B38F0
PALTEMP13 x0000012000000120
PALTEMP14 x0000000000000001
PALTEMP15 x0000000000000000
PALTEMP16 x0000020306600201
PALTEMP17 x0000000000000000
PALTEMP18 x0000000000000000
PALTEMP19 xFFFFFFFF979DB9B8
PALTEMP20 x0000000000998000
PALTEMP21 xFFFFFC00004B3920
PALTEMP22 xFFFFFC0000665BC0
PALTEMP23 x00000000156C7A38
Exception Address Reg xFFFFFC00002BA954
Native-mode Instruction
Exception PC x3FFFFF00000AEA55
Exception Summary Reg x0000000000000000
Exception Mask Reg x0000000000000000
PAL Base Address Reg x0000000000014000
Base Addr for PALcode: x0000000000000005
Interrupt Summary Reg x0000000000000000
AST Requests 3-0: x0000000000000000
IBOX Ctrl and Status Reg x0000004160800000
Timeout Counter Bit Clear.
IBOX Timeout Counter Enabled.
Floating Point Instructions will Cause FEN Exceptions.
PAL Shadow Registers Enabled.
Correctable Error Interrupts Enabled.
ICACHE BIST (Self Test) Was Successful.
Icache Par Err Stat Reg x0000000000000000
Dcache Par Err Stat Reg x0000000000000000
Virtual Address Reg xFFFFFFFF97B6FD18
Memory Mgmt Flt Sts Reg x0000000000016191
If Error, Reference Which Caused Was Write
If Err, Reference Resulted in DTB Miss
Fault Inst RA Field: x0000000000000006
Fault Inst Opcode: x000000000000002C
Scache Address Reg xFFFFFF0011C2116F
Scache Status Reg x0000000000000000
Bcache Tag Address Reg xFFFFFF80154F6FFF
Last Bcache Access Resulted in a Miss.
Value of Parity Bit for Tag Control Status Bits Dirty, Shared & Valid is Set.
Value of Tag Control Dirty Bit is Set.
Value of Tag Control Shared Bit is Clear.
Value of Tag Control Valid Bit is Set.
Value of Parity Bit Covering Tag Store Address Bits is Set.
Tag Address<38:20> Is: x0000000000000154
Ext Interface Address Reg xFFFFFF000000000F
Fill Syndrome Reg x0000000000003FFF
Ext Interface Status Reg xFFFFFFF004FFFFFF
Error Occurred During D-ref Fill
LD LOCK xFFFFFF0000639A0F
- SYSTEM SPECIFIC REGS -
Configuration Reg (R0) x380000F238000002
LOW LONGWORD Slice Follows
RATTLER Gate Array: Revision #2
Bit 12 Clr: Cmd/Data NOACK are Errors
Bit 24 Clr: IDLEBC Assert in Last Cycle 4
Bit 25 Clr: IDLEBC Assert During Cycle 4
Bit 27 Set: ACK Set_Dirty & Set_Lock Cmds
CACHE Size Field: 4 MB Cache
HIGH LONGWORD Slice Follows
RATTLER Gate Array: Revision #2
Bit 36 Set: Rx IPL31 on CBus CERR Assert
Bit 37 Set: Rx HALT on CBus SYS_EVENT
Bit 38 Set: Rx HALT on IIRR CSR24 HALT Req
Bit 39 Set: Rx INTERPROC INT on Write to IIRR CSR24 INTERPROC INT Req
Bit 44 Clr: Cmd/Data NOACK are Errors
Bit 56 Clr: IDLEBC Assert in Last Cycle 4
Bit 57 Clr: IDLEBC Assert During Cycle 4
Bit 59 Set: ACK Set_Dirty & Set_Lock Cmds
CACHE Size Field: 4 MB Cache
Error Summary Reg (R1) x0000000000000000
EVB Control Register (R2) x0000006100000061
LOW LONGWORD Slice Follows
Bit 0 Set: Enable Addr-Cmd Parity Checking
Bit 5 Set: Enable Bcache ECC Corr QW0/QW2
Bit 6 Set: Enable ECC Check - QW0/QW2 Data
HIGH LONGWORD Slice Follows
Bit 32 Set: Enable Addr-Cmd Parity Check
Bit 37 Set: Enable Bcache ECC Corr QW1/QW3
Bit 38 Set: Enable ECC Check-QW1/QW3 Data
Victim Error Addr (R3) x010C0006010C0006
LOW LONGWORD Slice Follows
EVB<33:4> Victim Addr x00000000010C0006
HIGH LONGWORD Slice Follows
EVB<33:4> Victim Addr x00000000010C0006
Correctable Err Reg (R4) x0000000000000000
LOW LONGWORD Slice Follows
QW0 ECC Syndrome: No Syndrome Bits Set
QW2 ECC Syndrome: No Syndrome Bits Set
HIGH LONGWORD Slice Follows
QW1 ECC Syndrome: No Syndrome Bits Set
QW3 ECC Syndrome: No Syndrome Bits Set
Correctable Err Addr (R5) xB820000AB820000A
LOW LONGWORD Slice Follows
Bit 32 Set: EV-Bus Bit 39, IO Bit, Set
EVB<34:4> Corr Err Adr x000000005820000A
HIGH LONGWORD Slice Follows
Bit 63 Set: EV-Bus Bit 39, IO Bit, Set
EVB<34:4> Corr Err Adr x000000005820000A
Uncorrectable Error (R6) x8000000080000000
LOW LONGWORD Slice Follows
EVB<3:0> CMD: Command Field = x8
QW0 Uncorr ECC Syndrome x0000000000000000
QW2 Uncorr ECC Syndrome x0000000000000000
HIGH LONGWORD Slice Follows
EVB<3:0> CMD: Command Field = x8
QW1 Uncorr ECC Syndrome x0000000000000000
QW3 Uncorr ECC Syndrome x0000000000000000
Uncorrectable Err Adr(R7) xB820000EB820000E
LOW LONGWORD Slice Follows
Bit 32 Set: EV-Bus Bit 39, IO Bit, Set
EVB<34:4> Uncor Err Adr x000000005820000E
HIGH LONGWORD Slice Follows
Bit 63 Set: EV-Bus Bit 39, IO Bit, Set
EVB<34:4> Uncor Err Adr x000000005820000E
EVB Reserve Register (R8) x0000000000000000
Duplicate Tag Control(R9) x0000011100000111
LOW LONGWORD Slice Follows
Bit 0 Set: Duplicate Tag Enable
Bit 4 Set: Enable Tag Ctrl Parity Checking
Bit 8 Set: Enable Tag Parity Checking
HIGH LONGWORD Slice Follows
Bit 32 Set: Duplicate Tag Enable
Bit 36 Set: Enable Tag Ctl Parity Checking
Bit 40 Set: Enable Tag Parity Checking
Duplicate Tag Error (R10) x00000000000000A0
LOW LONGWORD Slice Follows
Dup Tag Store Err Adr x0000000000000005
Dup Tag Test Control(R11) x0000000000000000
LOW LONGWORD Slice Follows
Bit 3 Clr: Write Good Control Store Parity
Bit 31 Clr: Write Good Tag Store Parity
Duplicate Tag Address x0000000000000000
MUX'ed Tag/Addr Field x0000000000000000
Partial Tag Field x0000000000000000
Duplicate Tag Test (R12) x8000000E8000000E
LOW LONGWORD Slice Follows
Bit 1 Set: Duplicate Tag Shared Bit
Bit 2 Set: Duplicate Tag Valid Bit
Bit 3 Set: TAG Control Parity Bit
Bit 31 Set: Dup Tag RAM, TAG Parity Bit
Dup Tag RAM, TAG Data x0000000000000000
HIGH LONGWORD Slice Follows
Bit 33 Set: Duplicate Tag Shared Bit
Bit 34 Set: Duplicate Tag Valid Bit
Bit 35 Set: TAG Control Parity Bit
Bit 63 Set: Dup Tag RAM, TAG Parity Bit
Dup Tag RAM, TAG Data x0000000000000000
Dup Tag Reserve Reg (R13) x0000000000000000
I-Bus Control Stat (R14) x0000100000001000
LOW LONGWORD Slice Follows
Bit 12 Set: Enable I-Bus Parity Check
HIGH LONGWORD Slice Follows
Bit 44 Set: Enable I-Bus Parity Check
I-Bus Error Addr Reg(R15) xFF44030300001993
LOW LONGWORD Slice Follows
C-Bus<31:0> C/A Data x0000000000001993
HIGH LONGWORD Slice Follows
C-Bus<63:32> C/A Data x00000000FF440303
Arbitration Ctrl Reg(R16) x0000012000000120
LOW LONGWORD Slice Follows
Bit 5 Set: C-Bus2 DONATE Mode Enabled
Bit 8 Set: C-Bus2 PAWN Mode Enabled
HIGH LONGWORD Slice Follows
Bit 37 Set: C-Bus2 DONATE Mode Enabled
Bit 40 Set: C-Bus2 PAWN Mode Enabled
C-Bus2 Control Reg (R17) x0000130100001401
LOW LONGWORD Slice Follows
Bit 0 Set: C-Bus2 Parity Checking Enabled
Bit 12 Set: Enable C-Bus2 Error Interrupt
HIGH LONGWORD Slice Follows
Bit 32 Set: C-Bus2 Parity Checking Enabled
CPU Cmdr ID Field: C-Bus2 CPU #2 ID
Bit 44 Set: Enable C-Bus2 Error Interrupt
C-Bus2 Error Reg (R18) x0000000000000000
C-Bus2 Err Addr Low (R19) xFF4403030000199B
LOW LONGWORD Slice Follows
CBus CAD<31:0> Er Adr x000000000000199B
HIGH LONGWORD Slice Follows
CBus CAD<95:64> Er Adr x00000000FF440303
C-Bus2 Err Addr High(R20) x0F400303E0400073
LOW LONGWORD Slice Follows
CBus CAD<63:32> Er Adr x00000000E0400073
HIGH LONGWORD Slice Follows
CBus CAD<127:96> Er Adr x000000000F400303
C-Bus2 Reserve Reg (R21) x0000000000000000
Address Lock Reg (R22) x00639A0000639A00
LOW LONGWORD Slice Follows
EV<30:5> Lock Address x0000000000031CD0
HIGH LONGWORD Slice Follows
EV<30:5> Lock Address x0000000000031CD0
Proc Mailbox Reg (R23) x0000000000000000
Inter-Proc Int Req (R24) x0000000000000000
System Int Clear Reg(R25) x0000000000000000
Perf Monitor Ctl Reg(R26) x0000000000000000
Perf Monitor Reg 1 (R27) x0000000000000000
Perf Monitor Reg 2 (R28) x0000000000000000
Perf Monitor Reg 3 (R29) x0000000000000000
Perf Monitor Reg 4 (R30) x0000000000000000
Perf Monitor Reg 5 (R31) x0000000000000000
-- ENTRY FRAME FOLLOWS --
Frame ID x00000011 T2 System-Bus to PCI Bridge Frame
IO Control/Status Reg xFE00000323020580
Bit 7 Set: TLB Error Checking Enabled
Bit 8 Set: CBUS CXACK Check Enabled
Bit 10 Set: EV5 Exclusive Exchange Enabled
Bit 24 Set: NOACK, CUCERR, OutOfSync Enbld
Bit 25 Set: PCI Memory Space Enabled
Bit 29 Set: CBUS Parity Checking Enabled
Bit 32 Set: CBUS Back-to-Back Cycles Enbld
T2 Revision: Pass 2
State Machine Vis Select: CBUS Cyc Counter
Bit 57 Set: PCI NMI Interrupts Enabled
Bit 58 Set: PCI Dev Timeout Inter Enabled
Bit 59 Set: PCI SERR# Interrupts Enabled
Bit 60 Set: PCI PERR# Interrupts Enabled
Bit 61 Set: PCI Rd Data Prty Inter Enabled
Bit 62 Set: PCI Adr Parity Inter Enabled
Bit 63 Set: PCI Wrt Data Prty Inter Enbled
CERR1 CBUS Error Reg 1 x0000000000000000
CERR2 Failed C/A <63:00> xE3800010E3800010
CERR3 Failed C/A <127:64> x006010C3406010C3
PERR1 PCI Error Reg 1 x0000000000000000
PERR2 PCI Cmd & Err Addr x000000065665FE40
Failed Cmd & Addr Valid When Parity Error
Failed PCI Cmd: x6 Memory Read
PCI Error Address: x000000005665FE40
HAE0_1 High Adr Ext Reg 1 x0000000000000010
HAE0_1 <4:0> is Sparse Mem PCI_AD <31:27>
HAE0_2 High Adr Ext Reg 2 x0000000000000000
HBASE PC Hole Base Reg x000000000010603F
PC Hole End Addr: x000000000000003F
Bit 13 Set: PC Hole Enable 1
Bit 14 Set: PC Hole Enable 2
PC Hole Start Addr: x0000000000000020
WBASE1 Window Base Reg 1 x00000000400807FF
PCI Window End Adr: x00000000000007FF
Bit 19 Set: PCI Window Enable
PCI Window Start Adr: x0000000000000400
WMASK1 Window Mask Reg 1 x000000003FF00000
PCI Window Mask: x00000000000003FF
TBASE1 Translated Base R1 x0000000000000000
Translated Base Addr: x0000000000000000
WBASE2 Window Base Reg 2 x00000000000C03FF
PCI Window End Adr: x00000000000003FF
Bit 18 Set: Scatter-Gather Enable
Bit 19 Set: PCI Window Enable
PCI Window Start Adr: x0000000000000000
WMASK2 Window Mask Reg 2 x000000003FF00000
PCI Window Mask: x00000000000003FF
TBASE2 Translated Base R2 x0000000000800000
Translated Base Addr: x0000000000004000
TDR0 TLB Data Register 0 x0000000000000000
TDR0 Data is Invalid
TLB Entry 0 Tag Data x0000000000000000
TLB Entry 0 PFN Data x0000000000000000
TDR1 TLB Data Register 1 x0000000000000000
TDR1 Data is Invalid
TLB Entry 1 Tag Data x0000000000000000
TLB Entry 1 PFN Data x0000000000000000
TDR2 TLB Data Register 2 x0000000000000000
TDR2 Data is Invalid
TLB Entry 2 Tag Data x0000000000000000
TLB Entry 2 PFN Data x0000000000000000
TDR3 TLB Data Register 3 x0000000000000000
TDR3 Data is Invalid
TLB Entry 3 Tag Data x0000000000000000
TLB Entry 3 PFN Data x0000000000000000
TDR4 TLB Data Register 4 x0000000000000000
TDR4 Data is Invalid
TLB Entry 4 Tag Data x0000000000000000
TLB Entry 4 PFN Data x0000000000000000
TDR5 TLB Data Register 5 x0000000000000000
TDR5 Data is Invalid
TLB Entry 5 Tag Data x0000000000000000
TLB Entry 5 PFN Data x0000000000000000
TDR6 TLB Data Register 6 x0000000000000000
TDR6 Data is Invalid
TLB Entry 6 Tag Data x0000000000000000
TLB Entry 6 PFN Data x0000000000000000
TDR7 TLB Data Register 7 x0000000000000000
TDR7 Data is Invalid
TLB Entry 7 Tag Data x0000000000000000
TLB Entry 7 PFN Data x0000000000000000
-- ENTRY FRAME FOLLOWS --
Frame ID x00000008 Memory Frame
Memory Module ID x00000000
Error Register 1 x0000000000040001
[Even] Error Summary
[Even] EDC Corr Error
Command Trap Register 1 xE20000080281AA80
Command Trap Register 2 x0060008FF0480946
Configuration Register x8005505080055050
EDC Status Register 1 x03760B3F07520E2A
[Even] Read CBITS <11:0> x0000000000000E2A
[Even] Write CBITS <11:0> x0000000000000752
[Odd] Read CBITS <11:0> x0000000000000B3F
[Odd] Write CBITS <11:0> x0000000000000376
EDC Status Register 2 x00000E890000041F
[Even] Syndrome <11:0> x000000000000041F
[Odd] Syndrome <11:0> x0000000000000E89
EDC Control Register x2000000020000000
[Even] Substitute Read Cbits Used
[Even] Substitute Write Cbits Used
[Even] Disable Inbound Parity Check
[Even] Enable EDC swap Mode
[Even] Complement Read Data Parity
[Even] Disable EDC Correction
[Even] Disable EDC Reporting
[Odd] Substitute Read Cbits Used
[Odd] Substitute Write Cbits Used
[Odd] Disable Inbound Parity Check
[Odd] Enable EDC swap Mode
[Odd] Complement Read Data Parity
[Odd] Disable EDC Correction
[Odd] Disable EDC Reporting
[Even] Subs. Read CBITS < x0000000000000000
[Even] Subs. Write CBITS x0000000000000000
[Odd] Subs. Read CBITS <1 x0000000000000000
[Odd] Subs. Write CBITS < x0000000000000000
Stream Buffer Control Reg x0000080000000800
Refresh Control Register x000001D8000001D8
[Even] Refresh Enable
[Odd] Refresh Enable
[Even] Syndrome Mask <11: x00000000000000D8
[Odd] Syndrome Mask <11:0 x00000000000000D8
Filter Control Register x0000000000000000
[Even] Syndrome Mask <11: x0000000000000000
[Even] Bank Select x0000000000000000
[Odd] Syndrome Mask <11:0 x0000000000000000
[Odd] Bank Select x0000000000000000
-- ENTRY FRAME FOLLOWS --
Frame ID x00000008 Memory Frame
Memory Module ID x00000001
Error Register 1 x0000000000000000
Command Trap Register 1 xE2400008E2400008
Command Trap Register 2 x0060008F4060008F
Configuration Register x8015505180155051
EDC Status Register 1 x0D6101F5085D080B
[Even] Read CBITS <11:0> x000000000000080B
[Even] Write CBITS <11:0> x000000000000085D
[Odd] Read CBITS <11:0> x00000000000001F5
[Odd] Write CBITS <11:0> x0000000000000D61
EDC Status Register 2 x0000001700000671
[Even] Syndrome <11:0> x0000000000000671
[Odd] Syndrome <11:0> x0000000000000017
EDC Control Register x2000000020000000
[Even] Substitute Read Cbits Used
[Even] Substitute Write Cbits Used
[Even] Disable Inbound Parity Check
[Even] Enable EDC swap Mode
[Even] Complement Read Data Parity
[Even] Disable EDC Correction
[Even] Disable EDC Reporting
[Odd] Substitute Read Cbits Used
[Odd] Substitute Write Cbits Used
[Odd] Disable Inbound Parity Check
[Odd] Enable EDC swap Mode
[Odd] Complement Read Data Parity
[Odd] Disable EDC Correction
[Odd] Disable EDC Reporting
[Even] Subs. Read CBITS < x0000000000000000
[Even] Subs. Write CBITS x0000000000000000
[Odd] Subs. Read CBITS <1 x0000000000000000
[Odd] Subs. Write CBITS < x0000000000000000
Stream Buffer Control Reg x0000080000000800
Refresh Control Register x000001D8000001D8
[Even] Refresh Enable
[Odd] Refresh Enable
[Even] Syndrome Mask <11: x00000000000000D8
[Odd] Syndrome Mask <11:0 x00000000000000D8
Filter Control Register x0000000000000000
[Even] Syndrome Mask <11: x0000000000000000
[Even] Bank Select x0000000000000000
[Odd] Syndrome Mask <11:0 x0000000000000000
[Odd] Bank Select x0000000000000000
-- ENTRY FRAME FOLLOWS --
Frame ID x00000008 Memory Frame
Memory Module ID x00000002
Error Register 1 x0000000000000000
Command Trap Register 1 xE2800008E2800008
Command Trap Register 2 x0060008F4060008F
Configuration Register x8401505284015052
EDC Status Register 1 x005F0145066E0F70
[Even] Read CBITS <11:0> x0000000000000F70
[Even] Write CBITS <11:0> x000000000000066E
[Odd] Read CBITS <11:0> x0000000000000145
[Odd] Write CBITS <11:0> x000000000000005F
EDC Status Register 2 x000000170000000D
[Even] Syndrome <11:0> x000000000000000D
[Odd] Syndrome <11:0> x0000000000000017
EDC Control Register x2000000020000000
[Even] Substitute Read Cbits Used
[Even] Substitute Write Cbits Used
[Even] Disable Inbound Parity Check
[Even] Enable EDC swap Mode
[Even] Complement Read Data Parity
[Even] Disable EDC Correction
[Even] Disable EDC Reporting
[Odd] Substitute Read Cbits Used
[Odd] Substitute Write Cbits Used
[Odd] Disable Inbound Parity Check
[Odd] Enable EDC swap Mode
[Odd] Complement Read Data Parity
[Odd] Disable EDC Correction
[Odd] Disable EDC Reporting
[Even] Subs. Read CBITS < x0000000000000000
[Even] Subs. Write CBITS x0000000000000000
[Odd] Subs. Read CBITS <1 x0000000000000000
[Odd] Subs. Write CBITS < x0000000000000000
Stream Buffer Control Reg x0000080000000800
Refresh Control Register x000001D8000001D8
[Even] Refresh Enable
[Odd] Refresh Enable
[Even] Syndrome Mask <11: x00000000000000D8
[Odd] Syndrome Mask <11:0 x00000000000000D8
Filter Control Register x0000000000000000
[Even] Syndrome Mask <11: x0000000000000000
[Even] Bank Select x0000000000000000
[Odd] Syndrome Mask <11:0 x0000000000000000
[Odd] Bank Select x0000000000000000
-- ENTRY FRAME FOLLOWS --
Frame ID x00000008 Memory Frame
Memory Module ID x00000003
NULL Memory Frame. The registers in this frame contain zeros
-- ENTRY FRAME FOLLOWS --
Frame ID x00000000 End Frame
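
To follow up on the main-memory question, the memory frames above can be compared mechanically. In this entry only module 0 shows a non-zero Error Register 1 (x0000000000040001, which DECEvent decodes as "[Even] EDC Corr Error"). Below is a minimal Python sketch that pulls the register out of each memory frame, assuming the report is saved as text with the "Memory Module ID" and "Error Register 1" lines laid out as shown here. The helper name memory_error_registers is hypothetical.

```python
import re

MODULE_RE = re.compile(r"Memory Module ID\s+x([0-9A-Fa-f]+)")
ERRREG_RE = re.compile(r"Error Register 1\s+x([0-9A-Fa-f]+)")


def memory_error_registers(report_text):
    """Map memory module ID -> Error Register 1 value (hypothetical helper)."""
    result = {}
    module = None  # module ID of the memory frame currently being read
    for line in report_text.splitlines():
        m = MODULE_RE.search(line)
        if m:
            module = int(m.group(1), 16)
            continue
        m = ERRREG_RE.search(line)
        if m and module is not None:
            result[module] = int(m.group(1), 16)
            module = None  # wait for the next memory frame
    return result


# Lines abridged from the memory frames in this message.
# Module 3 is a NULL frame, so it has no Error Register 1 line.
sample = """\
Memory Module ID x00000000
Error Register 1 x0000000000040001
Memory Module ID x00000001
Error Register 1 x0000000000000000
Memory Module ID x00000002
Error Register 1 x0000000000000000
Memory Module ID x00000003
"""

for mod, value in sorted(memory_error_registers(sample).items()):
    state = "non-zero" if value else "clear"
    print(f"module {mod}: Error Register 1 = x{value:016X} ({state})")
```

A non-zero value on one module does not by itself prove that module is the failing part, but if the same module shows up across many entries it would be a reasonable first candidate to swap.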
------------------------------------------
Kevin S. Partin
The Boeing Company
13100 Space Center Blvd.
Mail Code: JHOU-2230
Houston, TX 77059
Phone: 281-244-4088
Pager: 713-549-0713
Facsimile: 281-244-4984
Email: mailto:kevin.s.partin_at_boeing.com
Received on Mon May 10 1999 - 17:02:54 NZST