SUMMARY: cpu panic on PWS600au

From: Dirk Hufnagel <hufnagel_at_mps.ohio-state.edu>
Date: Wed, 27 Feb 2002 23:52:14 -0500

Only got one response, from Joe Fletcher, telling me to check the
seating of the CPU and memory and to consider replacing those parts
if reseating doesn't help. Certainly good advice, and I am actually
already running tests with different memory modules in different
memory slots.

So basically this seems to be a hardware problem and I will
have to locate the broken parts and replace them.
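
For anyone who wants to compare several of these dumps while swapping
parts, here is a minimal Python sketch that rejoins the wrapped
/var/adm/messages lines from the original message below and collects
the "name = value" register pairs. The function name parse_registers
and the regular expression are assumptions based only on the layout
shown in this post; this is not a tool that ships with Digital UNIX.

```python
import re

# Layout assumed from the excerpt: "Mon DD HH:MM:SS host vmunix: text",
# with long entries wrapped onto a continuation line by the mailer.
PREFIX = re.compile(r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+vmunix:\s*(.*)$")


def parse_registers(text):
    """Return {register name: [int, ...]} for every 'name = hex...' entry."""
    entries = []
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()  # drop mail quoting, if present
        if not line:
            continue
        m = PREFIX.match(line)
        if m:
            entries.append(m.group(1))
        elif entries:
            entries[-1] += " " + line  # continuation of a wrapped entry
    registers = {}
    for entry in entries:
        name, sep, value = entry.partition("=")
        words = value.split()
        # Keep only entries whose right-hand side is purely hex words.
        if sep and words and all(re.fullmatch(r"[0-9a-fA-F]+", w) for w in words):
            registers[name.strip()] = [int(w, 16) for w in words]
    return registers


sample = """\
> Feb 25 17:22:05 hostna vmunix: Machine Check Code = 20f
> Feb 25 17:22:05 hostna vmunix: PCI master abort error
> Feb 25 17:22:05 hostna vmunix: pal temp[0-1] =
> 00000000f38a427f 0000000000000000
> Feb 25 17:22:07 hostna vmunix: CIA/PYXIS ERR =
> ffffffff80000080
"""

for name, values in parse_registers(sample).items():
    print(name, [hex(v) for v in values])
```

Text-only entries such as "PCI master abort error" are skipped on
purpose, so the result holds just the numeric registers and can be
diffed between one machine check and the next.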

        Dirk

Original message:
>
> Yesterday a PWS600au I manage panicked and went to the SRM prompt.
> I had gotten what I thought to be memory errors from that machine
> for a while now and was in the process of figuring out which
> memory modules to replace. But it had never panicked before and now
> I am not sure anymore if these really are memory errors or
> something worse. I attach the corresponding output from the
> binary errorlog and /var/adm/messages. I would appreciate
> any help deciphering them.
>
> BTW, I got a few hundred binary errorlog entries like 1387
> within the last few weeks but the machine never panicked.
> They usually happened under high load which made me suspect
> the memory.
>
> Thanks
>
> Dirk Hufnagel
>
>
> **** V3.3 ********************* ENTRY 1387
> ********************************
>
>
> Logging OS 2. Digital UNIX
> System Architecture 2. Alpha
> Event sequence number 317.
> Timestamp of occurrence 25-FEB-2002 17:09:39
> Host name hostna
>
> System type register x0000001E Systype 30. (Miata)
> Number of CPUs (mpnum) x00000001
> CPU logging event (mperr) x00000000
>
> Event validity 1. O/S claims event is valid
> Event severity 1. Severe Priority
> Entry type 100. Machine Check Error - (major class)
> 1. - (minor class)
>
>
>
> ========================
> Raw Event Data Dump
> ========================
>
> Entry# (record in file) 1387.
>
> Entry Body Size: x00000240
> Entry body:
>
> 15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
> 0000: 3C7AB623 00060101 0007001E 013D0240 *@.=.........#.z<*
> 0010: 00000006 00000000 00003266 6C616C63 *hostna..........*
> 0020: 00000000 1A010064 00000000 00000001 *........d.......*
> 0030: 00000000 000002C0 00000000 00000000 *................*
> 0040: 00000000 0000020F 000001A0 00000118 *................*
> 0050: 00000000 00000000 00000000 00000000 *................*
> 0060: 00000000 00000000 00000000 00000000 *................*
> 0070: 00000000 00000000 00000000 00000000 *................*
> 0080: 00000000 00000000 00000000 00000000 *................*
> 0090: 00000000 00000000 00000000 F38A427F *.B..............*
> 00A0: 00000000 00005200 FFFFFC00 004C8A50 *P.L......R......*
> 00B0: 00000000 00000000 00000000 00000257 *W...............*
> 00C0: FFFFFC00 004C8310 00000001 00000016 *..........L.....*
> 00D0: FFFFFC00 004C8790 1F1E1615 14020100 *..........L.....*
> 00E0: FFFFFC00 004C8600 FFFFFC00 004CC1D0 *..L.......L.....*
> 00F0: FFFFFFFF FFF8C800 FFFFFC00 004C89C0 *..L.............*
> 0100: 00000000 00F0380C 00000000 00F00270 *p........8......*
> 0110: 00000000 00000000 0000020F 06600001 *..`.............*
> 0120: FFFFFFFF A3E6FA38 00000001 1FFFF090 *........8.......*
> 0130: FFFFFC00 004C89F0 00000000 0B804000 *.@........L.....*
> 0140: 00000000 0D53FA38 FFFFFC00 006A7570 *puj.....8.S.....*
> 0150: 00000000 00000000 FFFFFC00 004CC1D0 *..L.............*
> 0160: 00000000 00018000 00000000 00000000 *................*
> 0170: 00000041 62020000 00000000 80000000 *...........bA...*
> 0180: 00000000 00000000 00000000 00000000 *................*
> 0190: 00000000 000140D0 00000001 423E4C1C *.L>B.....@......*
> 01A0: 00000000 00000000 FFFFFF00 0001CD4F *O...............*
> 01B0: FFFFFFFF F8F7FEFF FFFFFFFF F7FFEFFF *................*
> 01C0: FFFFFFF0 05FFFFFF 00000000 00009F9F *................*
> 01D0: 00000000 00000000 FFFFFF00 1CAB795F *_y..............*
> 01E0: FFFFFFFF 80000080 00000000 00000000 *................*
> 01F0: 00000000 00000B93 00000000 00000010 *................*
> 0200: 00000000 0EE28FC0 00000000 0000F3F3 *................*
> 0210: 00000000 07060000 00000000 58000000 *...X............*
> 0220: 00000000 00000000 00000000 0000E002 *................*
> 0230: 003C7E25 00000000 FFFFFFFF 80140000 *............%~<^*
>
>
>
> **** V3.3 ********************* ENTRY 1388
> ********************************
>
>
> Logging OS 2. Digital UNIX
> System Architecture 2. Alpha
> Event sequence number 318.
> Timestamp of occurrence 25-FEB-2002 17:09:39
> Host name clalf2
>
> System type register x0000001E Systype 30. (Miata)
> Number of CPUs (mpnum) x00000001
> CPU logging event (mperr) x00000000
>
> Event validity 1. O/S claims event is valid
> Event severity 1. Severe Priority
> Entry type 302. ASCII Panic Message Type
> -1. - (minor class)
>
> SWI Minor class 9. ASCII Message
> SWI Minor sub class 1. Panic
>
> ASCII Message panic (cpu 0): System Uncorrectable
> Machine Check
>
>
>
>
> Feb 25 17:22:05 hostna vmunix: Machine Check SYSTEM Fatal Abort
> Feb 25 17:22:05 hostna vmunix: Machine Check Code = 20f
> Feb 25 17:22:05 hostna vmunix: PCI master abort error
> Feb 25 17:22:05 hostna vmunix: pal temp[0-1] =
> 00000000f38a427f 0000000000000000
> Feb 25 17:22:05 hostna vmunix: pal temp[2-3] =
> fffffc00004c8a50 0000000000005200
> Feb 25 17:22:05 hostna vmunix: pal temp[4-5] =
> 0000000000000257 0000000000000000
> Feb 25 17:22:06 hostna vmunix: pal temp[6-7] =
> 0000000100000016 fffffc00004c8310
> Feb 25 17:22:06 hostna vmunix: pal temp[8-9] =
> 1f1e161514020100 fffffc00004c8790
> Feb 25 17:22:06 hostna vmunix: pal temp[10-11] =
> fffffc00004cc1d0 fffffc00004c8600
> Feb 25 17:22:06 hostna vmunix: pal temp[12-13] =
> fffffc00004c89c0 fffffffffff8c800
> Feb 25 17:22:06 hostna vmunix: pal temp[14-15] =
> 0000000000f00270 0000000000f0380c
> Feb 25 17:22:06 hostna vmunix: pal temp[16-17] =
> 0000020f06600001 0000000000000000
> Feb 25 17:22:06 hostna vmunix: pal temp[18-19] =
> 000000011ffff090 ffffffffa3e6fa38
> Feb 25 17:22:06 hostna vmunix: pal temp[20-21] =
> 000000000b804000 fffffc00004c89f0
> Feb 25 17:22:06 hostna vmunix: pal temp[22-23] =
> fffffc00006a7570 000000000d53fa38
> Feb 25 17:22:06 hostna vmunix: shadow[0-1] = 0000000000000000
> 0000000000000000
> Feb 25 17:22:06 hostna vmunix: shadow[2-3] = 0000000000000000
> 0000000000000000
> Feb 25 17:22:06 hostna vmunix: shadow[4-5] = 0000000000000000
> 0000000000000000
> Feb 25 17:22:06 hostna vmunix: shadow[6-7] = 0000000000000000
> 0000000000000000
> Feb 25 17:22:06 hostna vmunix: Address of excepting instruction =
> fffffc00004cc1d0
> Feb 25 17:22:06 hostna vmunix: Summary of arithmetic traps =
> 0000000000000000
> Feb 25 17:22:06 hostna vmunix: Exception mask =
> 0000000000000000
> Feb 25 17:22:06 hostna vmunix: Base address for PALcode =
> 0000000000018000
> Feb 25 17:22:06 hostna vmunix: Interrupt Status Reg =
> 0000000080000000
> Feb 25 17:22:07 hostna vmunix: CURRENT SETUP OF EV5 IBOX =
> 0000004162020000
> Feb 25 17:22:07 hostna vmunix: I-CACHE Reg Tag parity error =
> 0000000000000000
> Feb 25 17:22:07 hostna vmunix: D-CACHE error Reg =
> 0000000000000000
> Feb 25 17:22:07 hostna vmunix: Effective VA = 00000001423e4c1c
> Feb 25 17:22:07 hostna vmunix: reason for D-stream =
> 00000000000140d0
> Feb 25 17:22:07 hostna vmunix: EV5 Secondary Cache address =
> ffffff000001cd4f
> Feb 25 17:22:07 hostna vmunix: EV5 Secondary Cache TAG/Data
> parity = 0000000000000000
> Feb 25 17:22:07 hostna vmunix: EV5 BC_TAG_ADDR =
> fffffffff7ffefff
> Feb 25 17:22:07 hostna vmunix: EV5 EI_STAT_ADDR Phys addr of Xfer
> = fffffffff8f7feff
> Feb 25 17:22:07 hostna vmunix: Fill Syndrome = 0000000000009f9f
> Feb 25 17:22:07 hostna vmunix: EI_STAT reg = fffffff005ffffff
> Feb 25 17:22:07 hostna vmunix: LD_LOCK = ffffff001cab795f
> Feb 25 17:22:07 hostna vmunix: PYXIS_DMA_DATA = 0000000000000000
> Feb 25 17:22:07 hostna vmunix: CIA/PYXIS ERR =
> ffffffff80000080
> Feb 25 17:22:07 hostna vmunix: PCI BUS Master state machine
> generated Master Abort
> Feb 25 17:22:07 hostna vmunix: CIA/PYXIS ERR STAT =
> 0000000000000010
> Feb 25 17:22:07 hostna vmunix: CIA/PYXIS ERR MASK =
> 0000000000000b93
> Feb 25 17:22:08 hostna vmunix: CIA/PYXIS ECC_SYN =
> 000000000000f3f3
> Feb 25 17:22:08 hostna vmunix: CIA/PYXIS MEM ERR0 =
> 000000000ee28fc0
> Feb 25 17:22:08 hostna vmunix: CIA/PYXIS MEM ERR1 =
> 0000000058000000
> Feb 25 17:22:08 hostna vmunix: CIA/PYXIS PCI ERR0 =
> 0000000007060000
> Feb 25 17:22:08 hostna vmunix: CIA/PYXIS PCI ERR1 =
> 000000000000e002
> Feb 25 17:22:08 hostna vmunix: ISA bridge NMI status & control =
> 0000000000000000
> Feb 25 17:22:08 hostna vmunix: CIA/PYXIS PCI ERR2 =
> ffffffff80140000
> Feb 25 17:22:08 hostna vmunix: panic (cpu 0): System Uncorrectable
> Machine Check
>
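
A note on reading the raw event data dump in the quoted message: the
"15--<-12 ... 03--<-00" header means each row prints its highest bytes
on the left, while the text column on the right runs from byte 0
upward. The Python sketch below reverses the four words of a row to
get the bytes in memory order and then rebuilds the two 64-bit values.
The helper names decode_row and printable are made up for this
example, and the rule for the text column (printable ASCII, otherwise
a dot) is an assumption taken from the rows shown here.

```python
def decode_row(row):
    """Decode one dump row such as '0140: 00000000 0D53FA38 FFFFFC00 006A7570'.

    Returns (offset, data) where data holds the 16 bytes in memory order.
    """
    offset_text, words_text = row.split(":", 1)
    words = words_text.split()[:4]  # ignore the trailing *...* text column
    # The leftmost word holds bytes 15..12, so reverse before concatenating.
    data = b"".join(int(w, 16).to_bytes(4, "little") for w in reversed(words))
    return int(offset_text, 16), data


def printable(data):
    """Render bytes like the dump's right-hand column (assumed rule)."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)


row = "0140: 00000000 0D53FA38 FFFFFC00 006A7570 *puj.....8.S.....*"
offset, data = decode_row(row)
quads = [int.from_bytes(data[i:i + 8], "little") for i in (0, 8)]
print(hex(offset), printable(data), [f"{q:016x}" for q in quads])
```

For the row at offset 0140 this gives the two quadwords
fffffc00006a7570 and 000000000d53fa38, the same values that
/var/adm/messages reports as pal temp[22-23], so the dump and the
syslog output describe the same logout frame.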
Received on Thu Feb 28 2002 - 13:21:41 NZDT
