SUMMARY: What do these CPU exceptions mean?

From: Scott Brewster <scott_at_sessb.its.dias.qut.edu.au>
Date: Fri, 17 Sep 1999 11:46:58 +0000

Hi,

Thanks to Christine Weber, Pedro Cunha, Peter Reynolds, and
Richard Eisenman.

The question was: what do the "Processor Correctable Error (630)"
entries in binary.errlog mean? As far as I can tell, they have caused
the machine to crash at least once.

The general feeling among the people who replied is that the cause is
either faulty main memory or a faulty motherboard. Compaq has already
replaced the motherboard and the exceptions persist, so the main
memory is the likely culprit.

Fortunately, Compaq is going to replace the memory next.
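
To judge whether the memory swap helps, it may be useful to track how
often the exceptions occur per day. The following is only a rough
sketch in Python, not something taken from the replies. It assumes the
dia text report has been saved to a file and that the field labels
("Timestamp of occurrence", "CPU Minor class") look like the entries
quoted below. The function name and the file name in the usage comment
are hypothetical.

```python
import re
from collections import Counter
from datetime import date

# Month abbreviations as dia prints them (e.g. 16-SEP-1999).
MONTHS = {name: i for i, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

# Patterns are an assumption based on the report layout quoted in this message.
ENTRY_RE = re.compile(r"\*+\s+ENTRY\s+\d+\.?\s+\*+")
TIMESTAMP_RE = re.compile(
    r"Timestamp of occurrence\s+(\d{2})-([A-Z]{3})-(\d{4})\s+\d{2}:\d{2}:\d{2}")
MINOR_CLASS_RE = re.compile(r"CPU Minor class\s+\d+\.\s+(.*\S)")


def count_cpu_errors_per_day(report_text):
    """Count CPU machine-check entries, keyed by (date, minor class text)."""
    counts = Counter()
    current_date = None
    for raw in report_text.splitlines():
        line = raw.lstrip("> ").strip()  # tolerate mail quoting
        if ENTRY_RE.search(line):
            current_date = None          # a new entry starts here
            continue
        m = TIMESTAMP_RE.search(line)
        if m:
            day, mon, year = m.groups()
            current_date = date(int(year), MONTHS[mon], int(day))
            continue
        m = MINOR_CLASS_RE.search(line)
        if m and current_date is not None:
            counts[(current_date, m.group(1))] += 1
    return counts


# Usage (file name is hypothetical):
#   with open("dia_report.txt") as f:
#       for (day, kind), n in sorted(count_cpu_errors_per_day(f.read()).items()):
#           print(day, kind, n)
```

Entries without a "CPU Minor class" line, such as the ASCII panic
message, are not counted.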

The original question is attached below.

Scott

> Hi,
>
> We have a machine reporting CPU exceptions (a list of recent exceptions is
> attached at the end of the message). What do these exceptions mean?
> As far as I can tell, they have caused the machine to crash at least once.
>
> After the crash, Compaq replaced the motherboard unit (which has everything
> except the main memory and PCI cards); however, the CPU exceptions persist.
> Is the main memory faulty?
>
> Type of machine: Digital Personal WorkStation 600au
> OS: Digital Unix 4.0D PK3
> Firmware revision: 7.0-10
> Memory: 4 x 256 MB (total 1 GB)
> Disks: 3 channel SWXCR RAID controller + internal SCSI bus
>
> Scott
>
> --------
>
> Near the time of the crash the exceptions were happening more often,
> perhaps ten or twenty that day, but now they are occurring once every few
> days.
>
> dia reports the exceptions like this: (the values in the Entry body field
> change sometimes)
>
> ******************************** ENTRY 1 ********************************
>
>
> Logging OS 2. Digital UNIX
> System Architecture 2. Alpha
> Event sequence number 2.
> Timestamp of occurrence 16-SEP-1999 05:17:26
> Host name bluejay
>
> System type register x0000001E Systype 30. (Miata)
> Number of CPUs (mpnum) x00000001
> CPU logging event (mperr) x00000000
>
> Event validity 1. O/S claims event is valid
> Event severity 1. Severe Priority
> Entry type 100. CPU Machine Check Errors
>
> CPU Minor class 3. Processor Correctable Error (630)
>
> Entry Body Size: x00000068
> Entry body:
>
> 15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
> 0000: 00000038 00000018 80000000 00000068 *h...........8...*
> 0010: FFFFFF00 33C8CF4F 00000000 00000086 *........O..3....*
> 0020: FFFFFFF0 C5FFFFFF 00000000 00001A00 *................*
> 0030: 00000000 00000000 00000001 00000000 *................*
> 0040: 00000000 00000000 00000000 00000000 *................*
> 0050: 00000000 00000000 00000000 00000000 *................*
> 0060: 5E3C7E25 00000000 * ....%~<^*
>
>
> At the time of the crash:
>
> ******************************** ENTRY 27 ********************************
>
>
> Logging OS 2. Digital UNIX
> System Architecture 2. Alpha
> Event sequence number 14.
> Timestamp of occurrence 02-SEP-1999 18:33:33
> Host name bluejay
>
> System type register x0000001E Systype 30. (Miata)
> Number of CPUs (mpnum) x00000001
> CPU logging event (mperr) x00000000
>
> Event validity 1. O/S claims event is valid
> Event severity 1. Severe Priority
> Entry type 302. ASCII Panic Message Type
>
> SWI Minor class 9. ASCII Message
> SWI Minor sub class 1. Panic
>
> ASCII Message panic (cpu 0): Processor Machine Check
>
>
> ******************************** ENTRY 28 ********************************
>
>
> Logging OS 2. Digital UNIX
> System Architecture 2. Alpha
> Event sequence number 13.
> Timestamp of occurrence 02-SEP-1999 18:33:33
> Host name bluejay
>
> System type register x0000001E Systype 30. (Miata)
> Number of CPUs (mpnum) x00000001
> CPU logging event (mperr) x00000000
>
> Event validity 1. O/S claims event is valid
> Event severity 1. Severe Priority
> Entry type 100. CPU Machine Check Errors
>
> CPU Minor class 1. Processor Uncorrectable Error (670)
>
> Entry Body Size: x00000208
> Entry body:
>
> 15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
> 0000: 000001A0 00000118 00000000 000002C0 *................*
> 0010: 00000000 00000000 00000000 00000098 *................*
> 0020: 00000000 00000000 00000000 00000000 *................*
> 0030: 00000000 00000000 00000000 00000000 *................*
> 0040: 00000000 00000000 00000000 00000000 *................*
> 0050: FFFFFFFF A85C4000 00000000 00000000 *.........@\.....*
> 0060: FFFFFC00 003FA9D0 00000000 000002B8 *..........?.....*
> 0070: 00000000 00000400 00000000 00005200 *.R..............*
> 0080: 00000000 00000000 FFFFFFFF A85C7838 *8x\.............*
> 0090: 1F1E1615 14020100 FFFFFC00 003FA2F0 *..?.............*
> 00A0: FFFFFC00 003F9818 FFFFFC00 003FA710 *..?.......?.....*
> 00B0: FFFFFC00 003FA940 FFFFFC00 003FA570 *p.?.....@.?.....*
> 00C0: 00000000 00F00270 FFFFFFFF FFF8DA00 *........p.......*
> 00D0: 00000098 06700009 00000000 00F0380C *.8........p.....*
> 00E0: 00000000 11FFD980 00000000 00000000 *................*
> 00F0: 00000000 39018000 FFFFFFFF A85C75D0 *.u\........9....*
> 0100: FFFFFC00 00561FE0 FFFFFC00 003FA970 *p.?.......V.....*
> 0110: FFFFFC00 003F9818 00000000 05C3BA38 *8.........?.....*
> 0120: 00000000 00000000 00000000 00000000 *................*
> 0130: 00000000 00000000 00000000 00018000 *................*
> 0140: 00000000 00000000 00000041 62020000 *...bA...........*
> 0150: FFFFFFFF FF8000A0 00000000 00000000 *................*
> 0160: FFFFFF00 0001D04F 00000000 00014890 *.H......O.......*
> 0170: FFFFFF80 2D8D6FFF 00000000 00000000 *.........o.-....*
> 0180: 00000000 00000C00 FFFFFF00 1961227F *."a.............*
> 0190: FFFFFF00 1961227F FFFFFFF9 45FFFFFF *...E....."a.....*
> 01A0: 00000000 00400000 00000000 00000000 *..........@.....*
> 01B0: 00000000 00000000 00000000 00000000 *................*
> 01C0: 00000000 020C0000 00000000 00000B93 *................*
> 01D0: 00000000 58910000 00000000 0001D540 *@..........X....*
> 01E0: 00000000 00008240 00000000 02010002 *........@.......*
> 01F0: 00000000 00008240 00000000 00000000 *........@.......*
> 0200: 5E3C7E25 00000000 * ....%~<^*
>
>
> uerf reports much less information, typically something like:
>
> ********************************* ENTRY 1. *********************************
>
> ----- EVENT INFORMATION -----
>
> EVENT CLASS ERROR EVENT
> OS EVENT TYPE 100. CPU EXCEPTION
> SEQUENCE NUMBER 2.
> OPERATING SYSTEM DEC OSF/1
> OCCURRED/LOGGED ON Thu Sep 16 05:17:26 1999
> OCCURRED ON SYSTEM bluejay
> SYSTEM ID x0007001E
> SYSTYPE x00000000
>
> ----- UNIT INFORMATION -----
>
> UNIT CLASS CPU
>
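
A note on reading the dia entry bodies quoted above: the "Byte Order"
header suggests each row holds 16 bytes printed as four 32-bit words,
with the highest byte offset on the left, while the ASCII column on the
right runs in ascending byte order. The Python sketch below is only an
illustration of that reading, inferred from the dump itself rather than
from any dia documentation. The function name is hypothetical.

```python
def ascii_column(words_hex):
    """Rebuild the ASCII column from one dump row.

    words_hex: the 8-digit hex words of a row, in printed (left-to-right)
    order. The rightmost word is assumed to sit at the lowest offset, and
    each word is assumed to be a little-endian 32-bit value.
    """
    data = b"".join(bytes.fromhex(w)[::-1] for w in reversed(words_hex))
    # Non-printable bytes are shown as '.', as in the dump.
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


# Row 0000 of ENTRY 1:
print(ascii_column("00000038 00000018 80000000 00000068".split()))
# prints: h...........8...
```

Under that assumption, the leading byte 0x68 ("h") of row 0000 matches
the entry body size of x00000068 reported just above the dump.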
Received on Fri Sep 17 1999 - 02:03:53 NZST