---
00000001         idle system            0    0    0             0             0
000002d3      memtest memory         1527    0    0   12801015808   12801015808
000002dd      memtest memory         1373    0    0   11509170176   11509170176
000002e7      memtest memory         1370    0    0   11484004352   11484004352
000002f1      memtest memory         1373    0    0   11509170176   11509170176

Test CPU resulted in this though

EV6 Correctable Dcache ECC Error on CPU 0

EV6 Correctable Memory Fill ECC Error on CPU 0
C_ADDR: 0000000028809E80
C_SYNDROME_1: 0000000000000000
C_SYNDROME_0: 00000000000000D3

Bad CPU?

> We just put this ES40 into prod on saturday night and now it has shut itself
> down 3 times since then. Does this look like software or hardware?
>
> WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended.
> WARNING: too many Processor corrected errors detected on cpu 1. Reporting suspended.
> WARNING: too many Processor corrected errors detected on cpu 2. Reporting suspended.
> WARNING: too many Processor corrected errors detected on cpu 3. Reporting suspended.
> Machine Check Processor Fatal Abort
> Machine check code = 0x100000098
> Ibox Status = 0000000000000000
> Dcache Status = 000000000000001c
> Cbox Address = 000000002112b580
> Fill Syndrome 1 = 0000000000000000
> Fill Syndrome 0 = 00000000000000d3
> Cbox Status = 0000000000000003
> EV6 captured status of Bcache mode = 000000000000000d
> EV6 Exception Address = fffffc000066a298
> EV6 Interrupt Enablement and Current Processor mode = 0000007ee0000000
> EV6 Interrupt Summary Register = 0000000080000000
> EV6 TBmiss or Fault status = 0000000000000290
> EV6 PAL Base Address = 0000000000018000
> EV6 Ibox control = fffffe0007304396
> EV6 Ibox Process_context = 0000748000000004
> O/S Summary flag = 0000000000000004
> Cchip Base Address (phys) = 00000f01a0000000
> Cchip Device Raw Interrupt Request = 0000000000000000
> DRIR Register Decode:
> Machine Check SYSTEM Fatal Abort
> Machine check code = 0x100000202
> Ibox Status = 0000000000000000
> Dcache Status = 0000000000000000
> Cbox Address = 0000000000000000
> Fill Syndrome 1 = 0000000000000000
> Fill Syndrome 0 = 0000000000000000
> Cbox Status = 0000000000000000
> EV6 captured status of Bcache mode = 0000000000000000
> EV6 Exception Address = fffffc00008cd140
> EV6 Interrupt Enablement and Current Processor mode = 00000062e0000000
> EV6 Interrupt Summary Register = 0000000200000000
> EV6 TBmiss or Fault status = 0000000000000000
> EV6 PAL Base Address = 0000000000018000
> EV6 Ibox control = fffffe000f304396
> EV6 Ibox Process_context = 0000000000000000
> O/S Summary flag = 0000000000000006
> Cchip Base Address (phys) = 00000f01a0000000
> Cchip Device Raw Interrupt Request = 2000000000000000
> DRIR Register Decode:
> Bit 61: Error from Pchip 1
> PCI Device Interrupt Mask = 0000000000000000
> Cchip Miscellaneous Register = 0000000800000030
> Misc Register Decode:
> Bit 4: Interval Timer Intr Pending to CPU 0
> Bit 5: Interval Timer Intr Pending to CPU 1
> Bit 35: CChip Rev (Bit<35>)
> Cchip Revision: 08
> ID of CPU performing read: 00
> Pchip 0 Base Address (phys) = 00000f0180000000
> Pchip 0 Error Register = 0000000000000000
> Pchip Error Register Decode:
> PCI Xaction Start Address = 0000000000000000
> PCI Command: Interrupt Acknowledge
> Pchip 1 Base Address (phys) = 00000f0380000000
> Pchip 1 Error Register = d300bd54f6200801
> Pchip Error Register Decode:
> Bit 0: Lost Error
> Bit 11: Correctable ECC Error
> System Address = 00000000bd54f620
> Command: DMA Read
> ECC Syndrome: d3
> panic (cpu 0): System Uncorrectable Machine Check
> Machine Check SYSTEM Fatal Abort
> Machine check code = 0x100000202
> Ibox Status = 0000000000000000
> Dcache Status = 0000000000000000
> Cbox Address = 0000000000000000
> Fill Syndrome 1 = 0000000000000000
> Fill Syndrome 0 = 0000000000000000
> Cbox Status = 0000000000000000
> EV6 captured status of Bcache mode = 0000000000000000
> EV6 Exception Address = fffffc00006ae004
> EV6 Interrupt Enablement and Current Processor mode = 00000062e0000000
> EV6 Interrupt Summary Register = 0000000200000000
> EV6 TBmiss or Fault status = 0000000000000000
> EV6 PAL Base Address = 0000000000018000
> EV6 Ibox control = fffffe000f304396
> EV6 Ibox Process_context = 0000000000000000
> O/S Summary flag = 0000000000000006
> Cchip Base Address (phys) = 00000f01a0000000
> Cchip Device Raw Interrupt Request = 2000000000000000
> DRIR Register Decode:
> Bit 61: Error from Pchip 1
> PCI Device Interrupt Mask = 0000000000000000
> Cchip Miscellaneous Register = 0000000800000ff0
> Misc Register Decode:
> Bit 4: Interval Timer Intr Pending to CPU 0
> Bit 5: Interval Timer Intr Pending to CPU 1
> Bit 6: Interval Timer Intr Pending to CPU 2
> Bit 7: Interval Timer Intr Pending to CPU 3
> Bit 8: Interprocessor Intr Pending to CPU 0
> Bit 9: Interprocessor Intr Pending to CPU 1
> Bit 10: Interprocessor Intr Pending to CPU 2
> Bit 11: Interprocessor Intr Pending to CPU 3
> Bit 35: CChip Rev (Bit<35>)
> Cchip Revision: 08
> ID of CPU performing read: 00
> Pchip 0 Base Address (phys) = 00000f0180000000
> Pchip 0 Error Register = 0000000000000000
> Pchip Error Register Decode:
> PCI Xaction Start Address = 0000000000000000
> PCI Command: Interrupt Acknowledge
> Pchip 1 Base Address (phys) = 00000f0380000000
> Pchip 1 Error Register = d300bd54fd200801
> Pchip Error Register Decode:
> Bit 0: Lost Error
> Bit 11: Correctable ECC Error
> System Address = 00000000bd54fd20
> Command: DMA Read
> ECC Syndrome: d3
>
> DUMP: blocks available: 1983962
> DUMP: blocks wanted: 930642 (partial compressed dump) [OKAY]
> DUMP: Device Disk Blocks Available
> DUMP: ------ ---------------------
> DUMP: 0x1300013 122678 - 1983959 (of 1983960) [primary swap]
> DUMP.prom: Open: dev 0x5100001, block 786432: SCSI 1 3 0 3 300 0 0
> DUMP: Writing header...
[1024 bytes at dev 0x1300013, block 1983960]
> esMP: Writing data..Machine Check Proc
> soErV F6 atCoalrr Aecbortt
> lMea chDicneac chehe EckCC c Eodrre or= 0 x1on00 C00PU00 198
>
> ta Ibox S
> tEusV6 C or= re00c0t00ab00le00 M00em00or00y 0
> l Dlca chECe C StEarturos r on = C00PU00 100
> Fi
> 000000001Cc_
> cD DCR:bo x A dd re ss 00 00=
> 00000000000000000740e8057
> 80
> FiCll_S SYNynDRdrOomMEe _11 : =
> 00000000000000000000000000000000
>
> Fill SCyn_SdrYNomDRe OM0 E_ 0: = 0
> 00000000000000000000000000d30
> Cb
> D
> usox Stat
> EV =6 0Co00r00r0e00c0t00ab00l0e03
> ac EcVh6 e caECptC urEedr rostr atonus C oPUf B3c D
> = he mode
> 0E00V600 C00or00re00ct00a0b00le
> MEVe6m Eorxcy epFitillon EAdCCdr Eesrsr o r=
> ffofnff Cc0PU00 306
> abf8c
> Pr CE_V6AD IDRnt:e rr up t En ab0l0em0en00t 0a0nd00 C00ur0r7en48t
> 0
> ocessor Cmo_deS =YN 0DR00OM00E0_621:e0 0 00 000000
> u 00EV006 00In00te00rr 00
> pt SummaCry_S RYeNgiDRstOMerE_ 0=: 0 00 000000000080000000000000
> 0 EVD6 3TB 00
> auss or F
> Elt Vst6 atCousrr e=c 0t0a00bl00e 00Dc00ac00h0e28 E0
> C EVE6 rPArLo Bra seo nAd CdrPUes 2s C
> 0 = 000000
> 00EV0061 80Co00rr
> ec tEVa6 blIbe oxMe cmoonrytr oFl il l = ECffCf
> ffEre0ro00r f3on04 C39PU6
>
> 2
> EV6 Ibox CPr_ocADesDRs_:c on te xt =
> 0000000000000000000000000074008
>
> 0
> O/S SummCar_yS fYNlaDRg OM E_= 10:00 0 00 0000000000000000004
> Ba C0ch00ip0 00
> se AddreCs_sSY (NDphROysME) _ 0= :00 0 00 0f0010a0000000000000
> D C0ch0Dip3 00
> evice Raw Interrupt Request = 0000000000000000
> : DRIR Register Decode
>
> E V P6C I CoDerrvicee ctInabtelerr uDptc aMchase k E=C 0C
> 00Er00ro00r 00o0n00 C00PU00 2
> C
> e chip Misc
> llEVan6 eoCousrr Recegtiastbelr e =M e00mo00ry00 F00il000l00
> E00C0
> D E r r Moris oc n ReCgPisU te2r
> C
> ecode:
> C _ CADchDRip: R ev i si on : 000000
> r 0 I00D 00of1 CCPC0U C0pe
> forming Cre_SadY:N 0DR0
> ) EP_1ch: ip 00 0Ba00se0 0Ad00dr0e0s0s 0(0ph00ys0
> = 00000C_f0SY18ND00RO00ME00_00
> r Pc0h0ip00 000 E00rr00or00 R00egDi3ste
> = 0000000000000000
> Pchip Error Register Decode:
> PCI Xaction Start Address = 0000000000000000
> PCI Command: Interrupt Acknowledge
> Pchip 1 Base Address (phys) = 00000f0380000000
> 00 Pchip 1 Error Register = 000000
> E00V600 C00or00re
> c t a b lPceh ipDc Earcroher EReCCgi Estrerr orDe ocon deCP:
> 3 U
> ioI Xact
> En V6St Carort reAdctdrabeslse = M 0em00o0r00y 00F0i00ll00 E00CC0
> E rPCroI r Coomnma CndPU: I3nt
> errupt ACck_AnoDDwlR:ed ge
>
> D UM P:0 0fi00rs00t 0c0ra00sh00 d76um8p0 f
> 00led: atCt_emSYptNDinROg MmEem_1or: y du00mp00..00.
> 00000000
> C_SYNDROME_0: 00000000000000D3
>
> EV6 Correctable Dcache ECC Error on CPU 2
>
> EV6 CorDrUMeP:ct caobmplere Msseminorg y9 30Fi64ll0K BE iCCnt Eo r76ro30r
> 73on5K CB PUme 2mo
> ry...
> CDU_AMPDD: R S: ta r ti ng A d dr00es00s 00 00 00 E00nd7in4g80 A
> Edress C S_SizYNe(DRMBOM)
> D1:UMP : --00--00--00--00--00--00--0-0--00-
> -------C--_S--YN--DR--O-M--E_ -0:-- - -- 0--00
> D00UM00P:0 00x00ff00ffD3fc
> 00081f1c0
> o - E0xV6ff Cffofrc0re03ctffabfflfeef D 8ca94c.h0 e (iECndC icEratroorr )
> D UCMPP:U 0 3xf
> f5ffc01f
> cE00V600 C- o0rxfreffctffabc0le1f Mffeem3foerf y10 F.1il (li ndECicaCto
> Er)rr
> owc om0n: LCPinU k 3d
> n
> C_ADDR: 00000000000070C0
> C_SYNDROME_1: 0000000000000000
> C_SYNDROME_0
>

Received on Mon Aug 23 2004 - 16:50:37 NZST
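
For anyone who wants to cross-check the register decodes in the quoted log by hand, here is a minimal sketch in Python. The register values are copied from the log above. The helper name set_bits is made up for illustration, and the field positions used for the Pchip error register (low 12 bits as error flags, bits 16..47 as the address field, top byte as the ECC syndrome) are an assumption inferred only from how this log decodes d300bd54f6200801, not taken from the chipset manual.

```python
def set_bits(value: int) -> list:
    """Return the positions of all 1-bits in value, lowest bit first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


# Values copied from the quoted machine check log.
drir = 0x2000000000000000         # Cchip Device Raw Interrupt Request
misc_first = 0x0000000800000030   # Cchip Miscellaneous Register, first abort
misc_second = 0x0000000800000FF0  # Cchip Miscellaneous Register, second abort
perror = 0xD300BD54F6200801       # Pchip 1 Error Register, first abort

# ASSUMPTION: field layout inferred from this log's own decode lines.
perror_flags = perror & 0xFFF               # error flag bits
perror_addr = (perror >> 16) & 0xFFFFFFFF   # address field
perror_syndrome = perror >> 56              # ECC syndrome byte

print("DRIR bits:       ", set_bits(drir))
print("MISC bits (1st): ", set_bits(misc_first))
print("MISC bits (2nd): ", set_bits(misc_second))
print("PERROR flag bits:", set_bits(perror_flags))
print("PERROR address:  ", format(perror_addr, "016x"))
print("PERROR syndrome: ", format(perror_syndrome, "02x"))
```

The bit positions it prints can be compared against the "Bit N:" lines in the DRIR, Misc and Pchip decodes, and the syndrome byte against the Fill Syndrome 0 / C_SYNDROME_0 values reported for the CPUs.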