UPDATE: Crash/It is happening again

From: Cyndi Smith <cyn_at_odin.mdacc.tmc.edu>
Date: Thu, 16 Mar 2000 14:44:06 -0600 (CST)

It is me again.

After applying the latest patch kit for 4.0F onto our 4100 5/400
(PK3, just uploaded to the DEC patches site on Friday...), I had
hoped that our problems were over. To refresh your memory, we were
having intermittent "buffering" of keystrokes -- it seems to
occur in all windows, with all logins, at the same time(s).
After a few seconds, everything a user had typed in a given window
showed up correctly in that window... This started last Wednesday
morning (March 8) and led to a system crash that same afternoon.
Both dia and uerf reported:
      panic (cpu 2): simple_lock: time limit exceeded

Despite my best efforts and the wonderful help of this group (I learned
a lot!), I was unable to figure out exactly what happened. However, the
behavior improved (fewer instances of the "buffering"), and then the
new patch kit came out -- with several patches mentioning simple_lock...

With High Hopes, I installed the kit on Saturday. Unfortunately, the
behavior is still there -- not very often, but still.

Today, I noticed the following in the dia output, and I wonder whether it
means one of our CPUs is going bad:

**** V3.0 ********************** ENTRY 323 ********************************


Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 657.
Timestamp of occurrence 15-MAR-2000 21:23:52
Host name odin

System type register x00000016 Alpha 4000/1200 Series
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000000

Event validity 1. O/S claims event is valid
Event severity 3. High Priority
Entry type 100. Machine Check Error - (major class)
                                  4. - (minor class)


Software Flags x0000000000000000
Active CPUs x0000000F
Hardware Rev x00000000
System Serial Number BT00000000
Module Serial Number
Module Type x0000
System Revision x00000000

Machine Check Reason x0086 Alpha Chip Detected ECC Err, From B-Cache

Ext Interface Status Reg xFFFFFFF081FFFFFF
                                     DATA SOURCE IS BCACHE
                                     CORRECTABLE ECC ERROR
                                     D-ref fill
Ext Interface Address Reg xFFFFFF00C44F4EFF
Fill Syndrome Reg x0000000000002900
Interrupt Summary Reg x0000000100000000
                                     Correctable ECC Errors (IPL31)
                                     AST Requests 3-0: x0000000000000000
                                       
WHOAMI x00000000 CPU0 Detected This Error
                                       
--IOD REGISTERS FOLLOW--
Base Addr of Bridge x0000000000000000
                                     Register Contents Not Valid For This Error
Dev Type & Rev Register x00000000 Register Contents Not Valid For This Error
MC Error Info Register 0 x00000000 Register Contents Not Valid For This Error
MC Error Info Register 1 x00000000 Register Contents Not Valid For This Error
CAP Error Register x00000000 Register Contents Not Valid For This Error
MDPA Status Register x00000000 MDPA Status Register Data Not Valid
MDPA Error Syndrome Reg x00000000 MDPA Syndrome Register Data Not Valid
MDPB Status Register x00000000 MDPB Status Register Data Not Valid
MDPB Error Syndrome Reg x00000000 MDPB Syndrome Register Data Not Valid
                                       
PALcode Revision Palcode Rev: 1.23-3


**** V3.0 ********************** ENTRY 324 ********************************

Should I place a repair call NOW? <grin>

Thanks for your advice.
Cyndi
--
-Cyndi Smith			     Programmer Analyst III, Biomathematics
-cyn_at_odin.mdacc.tmc.edu		M.D. Anderson Cancer Center, Houston, Texas
-phone: (713) 794-4938					fax: (713) 792-4262
			<http://odin.mdacc.tmc.edu/~cyn>
Received on Thu Mar 16 2000 - 20:45:05 NZDT
