Analyzing hw error on 21164LX

From: Kevin Tyle <kevin_at_meso.com>
Date: Tue, 01 Feb 2000 14:24:14 -0500 (EST)

Hi Managers,

Over the last two weeks, our Digital AlphaPC 164LX 533 MHz machine
has crashed three times. After analyzing the binary error log and
the crash-data files, it looks like the cause is the same in
each case. In fact, the symptoms look virtually identical to a case
posted in the archives on Nov 14, 1997.

In the poster's case, the diagnosis was:

    FINAL CONCLUSION: The most probable cause is over heating of the Bcache Simms.

To put it simply, in each case I see the following, which is exactly the same as the
referenced case except for one slight(?) difference:

    SOME INFO FROM THE CRASH DATA FILES: What you may see if you have this problem:
    1) a line in the crash data file that has: EI_STAT reg = fffffff014ffffff

                   (in our case it appears as: EI_STAT reg = fffffff005ffffff)

    2) a panic string of: "Processor Machine Check"
    3) lines in succession with the following:
        Machine Check Processor Fatal Abort
        Machine Check Code = 98
        Machine Check Code = 98
        Processor detected hard error
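
The "slight(?) difference" between the two EI_STAT values can be pinned down mechanically.
The Python sketch below XORs the two quoted values and lists the bit positions where they
differ. The names ARCHIVE_EI_STAT, OUR_EI_STAT and differing_bits are illustrative only. The
sketch does not decode what the bits mean; that has to be checked against the EI_STAT
register description in the Alpha 21164 hardware reference documentation.

```python
# Compare the two EI_STAT values quoted above and list which bits differ.
# NOTE: bit meanings are deliberately NOT decoded here; look them up in the
# Alpha 21164 hardware reference documentation for the EI_STAT register.
ARCHIVE_EI_STAT = 0xFFFFFFF014FFFFFF  # value reported in the 1997 archived case
OUR_EI_STAT = 0xFFFFFFF005FFFFFF      # value from our crash-data files


def differing_bits(a: int, b: int, width: int = 64) -> list[int]:
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = (a ^ b) & ((1 << width) - 1)
    return [bit for bit in range(width) if (diff >> bit) & 1]


if __name__ == "__main__":
    print(f"XOR = {ARCHIVE_EI_STAT ^ OUR_EI_STAT:016x}")
    for bit in differing_bits(ARCHIVE_EI_STAT, OUR_EI_STAT):
        archive_val = (ARCHIVE_EI_STAT >> bit) & 1
        our_val = (OUR_EI_STAT >> bit) & 1
        print(f"bit {bit}: archived case={archive_val}, ours={our_val}")
```

Run as written, this reports that the values differ only in bits 24 and 28 (bit 28 set in the
archived case, bit 24 set in ours), so the question reduces to what those two EI_STAT bits
signify.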

My question: can I therefore confidently conclude that the problem is the same as the one
diagnosed in the previously posted case? Or might there be other possibilities, and if so,
how might I diagnose them? Overheating is possible, though the overall room environment
has not changed in the past year. Perhaps a case fan is not working properly? Or could the
Bcache SIMMs simply be wearing out? (The machine is 2.5 years old and does a lot of
memory-intensive number crunching.)

Thanks,

Kevin Tyle <kevin_at_meso.com>
MESO, Inc.
Troy, NY USA
Received on Tue Feb 01 2000 - 19:25:15 NZDT