This is just an update because I still don't have an answer yet. One
response indicated it could be firmware related. I was running 5.8, which
came with my 5.1 software, so it should have been O.K., but I upgraded to
6.0 anyway. No luck there either, same errors.
A few other responses said it could be CPU related. So far we've disabled
CPU1, but no luck there either.
Most of the responses stated that even though it looks like a hardware
error, it's most likely a software error, probably a kernel bug. I have
saved copies of several kernels on my hard drive so I tried them out. I
first tried 4.0G, and soon saw the same error. But 4.0G was just an interim
update on my way to 5.1, so I decided to try booting off a 4.0F kernel, the
version we were running successfully for several months before the upgrade.
But even on 4.0F I was still seeing the same error.
Next step will probably be to move CPU1 to CPU0 and see if that helps.
(This is kind of embarrassing, but our support contract lapsed on that
machine, so we are on our own right now.)
Jim Fitzmaurice
jpfitz_at_fnal.gov
UNIX is very user friendly, It's just very particular about who it makes
friends with.
----- Original Message -----
> Hello,
>
> I've found lots of entries in the archives, this seems like a common
> problem, but not much in the way of summaries.
>
> I have a 4100 (2-CPU's 1GB Memory), running Tru64 V5.1 PK3. A few weeks
> after the upgrade to 5.1, several days after updating to PK3, I started
> getting "panic (cpu 0): kernel memory fault" at random intervals. The
> machine won't stay up for more than a day, usually failing within a few
> hours. We replaced the memory, but the same errors continue with no change.
> Prior to the upgrade we added a couple of new cards, a second DEGPA Gigabit
> Ethernet card, and a Memory Channel card, but I don't see how they could
> possibly be related to these errors.
>
> To make the problem even more confusing, we have an identical 4100
> (2-CPU's 1GB Memory), running Tru64 V5.1 PK3 and it works just fine! The
> ONLY difference between these two machines is, the failing one has 466MHz
> CPU's and the other has 600MHz CPU's, all other cards and adapters are the
> same.
>
> It's getting a little frustrating, can anybody help? Here's the
> information from "dia".
>
> ---------------------------------------------------------------------------
> Logging OS 2. Digital UNIX
> System Architecture 2. Alpha
> Event sequence number 35.
> Timestamp of occurrence 12-JUL-2001 15:30:56
> Host name XXXXX
>
> System type register x00000016 Alpha 4000/1200 Series
> Number of CPUs (mpnum) x00000002
> CPU logging event (mperr) x00000000
>
> Event validity 1. O/S claims event is valid
> Event severity 1. Severe Priority
> Entry type 302. ASCII Panic Message Type
>
> SWI Minor class 9. ASCII Message
> SWI Minor sub class 1. Panic
>
> ASCII Message panic (cpu 0): kernel memory fault
> ---------------------------------------------------------------------------
>
> It looks as though we still have a memory problem, but with brand new
> memory...? Maybe it's something else that just looks like a memory problem,
> but really isn't?
>
> Any help would be appreciated.
>
> Thanks,
>
> Jim Fitzmaurice
> jpfitz_at_fnal.gov
>
> UNIX is very user friendly, It's just very particular about who it makes
> friends with.
>
>
Received on Wed Jul 18 2001 - 14:03:00 NZST