Summary (partial): System panics on 2 different 4.0d Alphas - problem with latest patch set?

From: System Prestidigitator <BOLSON_at_frango.hs.washington.edu>
Date: Thu, 04 Jan 2001 17:24:23 -0800

I don't have a complete answer to the problem noted at the end, but I have more insight.

First, thanks to:
alan_at_nabeth.cxo.dec.com
stan_at_astro.ocis.temple.edu
"Dr. Thomas.Blinn_at_Compaq.com" <tpb_at_doctor.zk3.dec.com>
"Willis, Daniel L." <Dan.Willis_at_celera.com>
Kurt Ludwig <Kurt.Ludwig_at_alpha-processor.com>


An explanation of the HWRPB notice from Dr. Blinn:
the kernel may have trashed the HWRPB or
hardware reset parameter block, an in-memory data structure that is set
up by the SRM console during power up, and should not be written over by
the kernel; if the HWRPB gets trashed, then reboots usually fail (if the
SRM console is working as it should; it's software, too, so it can fail
in many diverse and mysterious ways).

This makes sense.

Also, alan_at_nabeth.cxo.dec.com said:
        Kernel memory faults are usually software bugs. In this
        case, the kernel apparently trying to read a page of
        memory that doesn't exist. Look at the crash dump
        listings in /var/adm/crash and find the stack trace.
        If they run through the same part of the operating
        system, it would appear to be a bad patch.
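As a rough first pass before digging through the full stack traces in /var/adm/crash, the vmunix trap records in syslog can be compared directly. Below is a minimal Python sketch of that comparison. It assumes the log format shown in the excerpt quoted at the end of this message. The helper names (FIELDS, parse_traps, group_by_pc, near_null) are hypothetical and exist only for illustration. The sketch also assumes that pc values are only comparable between machines running an identically built kernel. If the kernels differ, map each pc to a function name with the system's debugger first. The "near-NULL" check is a heuristic and not a diagnosis: a faulting address inside page 0 (such as 0x3c) often means a NULL pointer was dereferenced at a small structure offset.

```python
import re
from collections import defaultdict

# Short keys for the trap fields printed by vmunix (format assumed from the
# log excerpt quoted in this thread; adjust for other logs).
FIELDS = {
    "faulting virtual address": "va",
    "pc of faulting instruction": "pc",
    "ra contents at time of fault": "ra",
    "sp contents at time of fault": "sp",
}
FIELD_RE = re.compile(r"vmunix: (" + "|".join(FIELDS) + r"): (0x[0-9a-fA-F]+)")
ALPHA_PAGE_SIZE = 8192  # Alpha pages are 8 KB


def parse_traps(lines):
    """Collect the trap fields preceding each 'panic' line into one dict."""
    traps, current = [], {}
    for line in lines:
        m = FIELD_RE.search(line)
        if m:
            current[FIELDS[m.group(1)]] = int(m.group(2), 16)
        elif "panic" in line and current:
            traps.append(current)
            current = {}
    return traps


def group_by_pc(traps):
    """Faults that share a pc died at the same kernel instruction."""
    groups = defaultdict(list)
    for trap in traps:
        groups[trap["pc"]].append(trap)
    return dict(groups)


def near_null(trap):
    """Heuristic: an address in page 0 suggests a NULL pointer plus an offset."""
    return trap["va"] < ALPHA_PAGE_SIZE


if __name__ == "__main__":
    log = """\
Jan 4 13:31:56 fudge vmunix: trap: invalid memory read access from kernel mode
Jan 4 13:31:56 fudge vmunix: faulting virtual address: 0x000000000000003c
Jan 4 13:31:56 fudge vmunix: pc of faulting instruction: 0xfffffc0000258990
Jan 4 13:31:56 fudge vmunix: ra contents at time of fault: 0xfffffc0000258744
Jan 4 13:31:56 fudge vmunix: sp contents at time of fault: 0xffffffff90b16ff0
Jan 4 13:31:56 fudge vmunix: panic (cpu 0): kernel memory fault
""".splitlines()
    for pc, hits in group_by_pc(parse_traps(log)).items():
        print(hex(pc), len(hits), "fault(s), near-NULL:", near_null(hits[0]))
```

Feeding in the excerpts from both workstations (and the saved December crash) would show at a glance whether the faults share a pc. That is roughly the "same part of the operating system" test suggested above, under the identical-kernel assumption.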


I sent part of an earlier crash (there was no space for a dump on my AlphaStation) to
Dr. Blinn, as well as the last "top" display on another workstation at that time,
and he noticed that Java was running in each case.

This is true, and we have started making extensive use of Java for actual number-crunching (believe
it or not), which works quite well with the Fast Virtual Machine.
> java -version
java version "1.2.2-4"
Fast VM (build J2SDK.v.1.2.2:11/02/2000-15:42, native threads, jit_122)


We have been using this a lot and have had only these 2 crashes (actually 3; I had saved one from the middle
of December with the same modus operandi). But perhaps there is some strange interaction
between Java and the system.

Of course, the REAL answer is to upgrade from 4.0d to 4.0g or 5.1, right?

Thanks,
Ed

Original:
> In the last 3 days, I have had a Personal Workstation 500au and an AlphaStation 500/400 both crash
> to the >>> prompt with the same error.
> I had recently installed the latest patch kits on each of these, and each had been up since then
> (about a month ago).
> When the first happened, I thought maybe there was a hardware glitch. I didn't understand why it
> didn't auto-reboot. There was a message about "invalid HWRPB" in both cases.
>
> This has happened exactly once on each station.
>
> The log shows (for the more recent crash)
> Jan 4 13:31:56 fudge vmunix: trap: invalid memory read access from kernel mode
> Jan 4 13:31:56 fudge vmunix:
> Jan 4 13:31:56 fudge vmunix: faulting virtual address: 0x000000000000003c
> Jan 4 13:31:56 fudge vmunix: pc of faulting instruction: 0xfffffc0000258990
> Jan 4 13:31:56 fudge vmunix: ra contents at time of fault: 0xfffffc0000258744
> Jan 4 13:31:56 fudge vmunix: sp contents at time of fault: 0xffffffff90b16ff0
> Jan 4 13:31:56 fudge vmunix:
> Jan 4 13:31:56 fudge vmunix: panic (cpu 0): kernel memory fault
>
>
> Any explanations?
>
Received on Fri Jan 05 2001 - 01:26:23 NZDT

This archive was generated by hypermail 2.4.0 : Wed Nov 08 2023 - 11:53:41 NZDT