We put this ES40 into production on Saturday night, and it has shut itself
down three times since then. Does this look like a software or a hardware problem?
WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended.
WARNING: too many Processor corrected errors detected on cpu 1. Reporting suspended.
WARNING: too many Processor corrected errors detected on cpu 2. Reporting suspended.
WARNING: too many Processor corrected errors detected on cpu 3. Reporting suspended.
Machine Check Processor Fatal Abort
Machine check code = 0x100000098
Ibox Status = 0000000000000000
Dcache Status = 000000000000001c
Cbox Address = 000000002112b580
Fill Syndrome 1 = 0000000000000000
Fill Syndrome 0 = 00000000000000d3
Cbox Status = 0000000000000003
EV6 captured status of Bcache mode = 000000000000000d
EV6 Exception Address = fffffc000066a298
EV6 Interrupt Enablement and Current Processor mode = 0000007ee0000000
EV6 Interrupt Summary Register = 0000000080000000
EV6 TBmiss or Fault status = 0000000000000290
EV6 PAL Base Address = 0000000000018000
EV6 Ibox control = fffffe0007304396
EV6 Ibox Process_context = 0000748000000004
O/S Summary flag = 0000000000000004
Cchip Base Address (phys) = 00000f01a0000000
Cchip Device Raw Interrupt Request = 0000000000000000
DRIR Register Decode:
Machine Check SYSTEM Fatal Abort
Machine check code = 0x100000202
Ibox Status = 0000000000000000
Dcache Status = 0000000000000000
Cbox Address = 0000000000000000
Fill Syndrome 1 = 0000000000000000
Fill Syndrome 0 = 0000000000000000
Cbox Status = 0000000000000000
EV6 captured status of Bcache mode = 0000000000000000
EV6 Exception Address = fffffc00008cd140
EV6 Interrupt Enablement and Current Processor mode = 00000062e0000000
EV6 Interrupt Summary Register = 0000000200000000
EV6 TBmiss or Fault status = 0000000000000000
EV6 PAL Base Address = 0000000000018000
EV6 Ibox control = fffffe000f304396
EV6 Ibox Process_context = 0000000000000000
O/S Summary flag = 0000000000000006
Cchip Base Address (phys) = 00000f01a0000000
Cchip Device Raw Interrupt Request = 2000000000000000
DRIR Register Decode:
Bit 61: Error from Pchip 1
PCI Device Interrupt Mask = 0000000000000000
Cchip Miscellaneous Register = 0000000800000030
Misc Register Decode:
Bit 4: Interval Timer Intr Pending to CPU 0
Bit 5: Interval Timer Intr Pending to CPU 1
Bit 35: CChip Rev (Bit<35>)
Cchip Revision: 08
ID of CPU performing read: 00
Pchip 0 Base Address (phys) = 00000f0180000000
Pchip 0 Error Register = 0000000000000000
Pchip Error Register Decode:
PCI Xaction Start Address = 0000000000000000
PCI Command: Interrupt Acknowledge
Pchip 1 Base Address (phys) = 00000f0380000000
Pchip 1 Error Register = d300bd54f6200801
Pchip Error Register Decode:
Bit 0: Lost Error
Bit 11: Correctable ECC Error
System Address = 00000000bd54f620
Command: DMA Read
ECC Syndrome: d3
panic (cpu 0): System Uncorrectable Machine Check
Machine Check SYSTEM Fatal Abort
Machine check code = 0x100000202
Ibox Status = 0000000000000000
Dcache Status = 0000000000000000
Cbox Address = 0000000000000000
Fill Syndrome 1 = 0000000000000000
Fill Syndrome 0 = 0000000000000000
Cbox Status = 0000000000000000
EV6 captured status of Bcache mode = 0000000000000000
EV6 Exception Address = fffffc00006ae004
EV6 Interrupt Enablement and Current Processor mode = 00000062e0000000
EV6 Interrupt Summary Register = 0000000200000000
EV6 TBmiss or Fault status = 0000000000000000
EV6 PAL Base Address = 0000000000018000
EV6 Ibox control = fffffe000f304396
EV6 Ibox Process_context = 0000000000000000
O/S Summary flag = 0000000000000006
Cchip Base Address (phys) = 00000f01a0000000
Cchip Device Raw Interrupt Request = 2000000000000000
DRIR Register Decode:
Bit 61: Error from Pchip 1
PCI Device Interrupt Mask = 0000000000000000
Cchip Miscellaneous Register = 0000000800000ff0
Misc Register Decode:
Bit 4: Interval Timer Intr Pending to CPU 0
Bit 5: Interval Timer Intr Pending to CPU 1
Bit 6: Interval Timer Intr Pending to CPU 2
Bit 7: Interval Timer Intr Pending to CPU 3
Bit 8: Interprocessor Intr Pending to CPU 0
Bit 9: Interprocessor Intr Pending to CPU 1
Bit 10: Interprocessor Intr Pending to CPU 2
Bit 11: Interprocessor Intr Pending to CPU 3
Bit 35: CChip Rev (Bit<35>)
Cchip Revision: 08
ID of CPU performing read: 00
Pchip 0 Base Address (phys) = 00000f0180000000
Pchip 0 Error Register = 0000000000000000
Pchip Error Register Decode:
PCI Xaction Start Address = 0000000000000000
PCI Command: Interrupt Acknowledge
Pchip 1 Base Address (phys) = 00000f0380000000
Pchip 1 Error Register = d300bd54fd200801
Pchip Error Register Decode:
Bit 0: Lost Error
Bit 11: Correctable ECC Error
System Address = 00000000bd54fd20
Command: DMA Read
ECC Syndrome: d3
DUMP: blocks available: 1983962
DUMP: blocks wanted: 930642 (partial compressed dump) [OKAY]
DUMP: Device Disk Blocks Available
DUMP: ------ ---------------------
DUMP: 0x1300013 122678 - 1983959 (of 1983960) [primary swap]
DUMP.prom: Open: dev 0x5100001, block 786432: SCSI 1 3 0 3 300 0 0
DUMP: Writing header... [1024 bytes at dev 0x1300013, block 1983960]
esMP: Writing data..Machine Check Proc
soErV F6 atCoalrr Aecbortt
lMea chDicneac chehe EckCC c Eodrre or= 0 x1on00 C00PU00 198
ta Ibox S
tEusV6 C or= re00c0t00ab00le00 M00em00or00y 0
l Dlca chECe C StEarturos r on = C00PU00 100
Fi
000000001Cc_
cD DCR:bo x A dd re ss 00 00=
00000000000000000740e8057
80
FiCll_S SYNynDRdrOomMEe _11 : =
00000000000000000000000000000000
Fill SCyn_SdrYNomDRe OM0 E_ 0: = 0
00000000000000000000000000d30
Cb
D
usox Stat
EV =6 0Co00r00r0e00c0t00ab00l0e03
ac EcVh6 e caECptC urEedr rostr atonus C oPUf B3c D
= he mode
0E00V600 C00or00re00ct00a0b00le
MEVe6m Eorxcy epFitillon EAdCCdr Eesrsr o r=
ffofnff Cc0PU00 306
abf8c
Pr CE_V6AD IDRnt:e rr up t En ab0l0em0en00t 0a0nd00 C00ur0r7en48t
0
ocessor Cmo_deS =YN 0DR00OM00E0_621:e0 0 00 000000
u 00EV006 00In00te00rr 00
pt SummaCry_S RYeNgiDRstOMerE_ 0=: 0 00 000000000080000000000000
0 EVD6 3TB 00
auss or F
Elt Vst6 atCousrr e=c 0t0a00bl00e 00Dc00ac00h0e28 E0
C EVE6 rPArLo Bra seo nAd CdrPUes 2s C
0 = 000000
00EV0061 80Co00rr
ec tEVa6 blIbe oxMe cmoonrytr oFl il l = ECffCf
ffEre0ro00r f3on04 C39PU6
2
EV6 Ibox CPr_ocADesDRs_:c on te xt =
0000000000000000000000000074008
0
O/S SummCar_yS fYNlaDRg OM E_= 10:00 0 00 0000000000000000004
Ba C0ch00ip0 00
se AddreCs_sSY (NDphROysME) _ 0= :00 0 00 0f0010a0000000000000
D C0ch0Dip3 00
evice Raw Interrupt Request = 0000000000000000
: DRIR Register Decode
E V P6C I CoDerrvicee ctInabtelerr uDptc aMchase k E=C 0C
00Er00ro00r 00o0n00 C00PU00 2
C
e chip Misc
llEVan6 eoCousrr Recegtiastbelr e =M e00mo00ry00 F00il000l00
E00C0
D E r r Moris oc n ReCgPisU te2r
C
ecode:
C _ CADchDRip: R ev i si on : 000000
r 0 I00D 00of1 CCPC0U C0pe
forming Cre_SadY:N 0DR0
) EP_1ch: ip 00 0Ba00se0 0Ad00dr0e0s0s 0(0ph00ys0
= 00000C_f0SY18ND00RO00ME00_00
r Pc0h0ip00 000 E00rr00or00 R00egDi3ste
= 0000000000000000
Pchip Error Register Decode:
PCI Xaction Start Address = 0000000000000000
PCI Command: Interrupt Acknowledge
Pchip 1 Base Address (phys) = 00000f0380000000
00 Pchip 1 Error Register = 000000
E00V600 C00or00re
c t a b lPceh ipDc Earcroher EReCCgi Estrerr orDe ocon deCP:
3 U
ioI Xact
En V6St Carort reAdctdrabeslse = M 0em00o0r00y 00F0i00ll00 E00CC0
E rPCroI r Coomnma CndPU: I3nt
errupt ACck_AnoDDwlR:ed ge
D UM P:0 0fi00rs00t 0c0ra00sh00 d76um8p0 f
00led: atCt_emSYptNDinROg MmEem_1or: y du00mp00..00.
00000000
C_SYNDROME_0: 00000000000000D3
EV6 Correctable Dcache ECC Error on CPU 2
EV6 CorDrUMeP:ct caobmplere Msseminorg y9 30Fi64ll0K BE iCCnt Eo r76ro30r
73on5K CB PUme 2mo
ry...
CDU_AMPDD: R S: ta r ti ng A d dr00es00s 00 00 00 E00nd7in4g80 A
Edress C S_SizYNe(DRMBOM)
D1:UMP : --00--00--00--00--00--00--0-0--00-
-------C--_S--YN--DR--O-M--E_ -0:-- - -- 0--00
D00UM00P:0 00x00ff00ffD3fc
00081f1c0
o - E0xV6ff Cffofrc0re03ctffabfflfeef D 8ca94c.h0 e (iECndC icEratroorr )
D UCMPP:U 0 3xf
f5ffc01f
cE00V600 C- o0rxfreffctffabc0le1f Mffeem3foerf y10 F.1il (li ndECicaCto
Er)rr
owc om0n: LCPinU k 3d
n
C_ADDR: 00000000000070C0
C_SYNDROME_1: 0000000000000000
C_SYNDROME_0
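
For anyone comparing the two readable "Pchip 1 Error Register" values above, here is a rough Python sketch that splits the raw register into the fields the console already decoded. The bit layout is an assumption inferred only from the decoded lines in this log (bit 0 = Lost Error, bit 11 = Correctable ECC Error, bits 47:16 = System Address, bits 63:56 = ECC Syndrome), not from chipset documentation, and decode_pchip_error is a made-up helper name. The command field is left out because the log only shows one value for it.

```python
# Assumed layout, inferred from the decoded console lines in this post only:
#   bit 0       -> "Lost Error"
#   bit 11      -> "Correctable ECC Error"
#   bits 47:16  -> "System Address"
#   bits 63:56  -> "ECC Syndrome"
FLAG_BITS = {0: "Lost Error", 11: "Correctable ECC Error"}


def decode_pchip_error(reg):
    """Split a raw 64-bit Pchip error register value into the assumed fields."""
    flags = [name for bit, name in sorted(FLAG_BITS.items()) if (reg >> bit) & 1]
    return {
        "flags": flags,
        "system_address": (reg >> 16) & 0xFFFFFFFF,
        "ecc_syndrome": (reg >> 56) & 0xFF,
    }


if __name__ == "__main__":
    # The two raw values reported for Pchip 1 in the log above.
    for raw in ("d300bd54f6200801", "d300bd54fd200801"):
        fields = decode_pchip_error(int(raw, 16))
        print(raw,
              fields["flags"],
              "addr=%016x" % fields["system_address"],
              "syndrome=%02x" % fields["ecc_syndrome"])
```

Both Pchip 1 values carry syndrome d3, which is the same value that shows up in the CPU "Fill Syndrome 0" and C_SYNDROME_0 lines elsewhere in the log.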
Received on Mon Aug 23 2004 - 16:17:34 NZST