SUMMARY: Hardware Errors

From: <marco_at_gore.afep.cornell.edu>
Date: Tue, 29 Oct 96 09:22:59 -0500

I posted the following a while back but the responses didn't seem to
help. Thankfully, a week ago the "ping bug" was found and I believe that
I have found the problem. The errors resulting from "ping + W95" were
just like those of the past couple of months. Therefore, I believe that the
        panic (cpu 0): Machine check - _Hardware error
statement in my UERF output results from the ping bug. Digital thought
this to be a cache memory error. BTW, I did try to reseat my memory and
blow out more dust but to no avail.

Original Post:

Hello Managers,
        I am having trouble with my DEC 3000 500X. It crashes
occasionally complaining about Hardware Check Errors. I called DEC about
this and they said that it was the cache memory and that the whole system
board (motherboard) would have to be replaced. They couldn't tell which
level of cache that it was. They also said that the processor or the L2
cache chips could not be simply replaced because they couldn't be removed
(they are in sockets however). I can't justify spending $5000 for an
infrequent problem which they diagnosed by deciphering the registers in
my uerf output. On their recommendation I blew out all the dust from
inside the box. That didn't seem to help. Now the problem is more
frequent and I need help before I commit to the $5000.



Some interesting responses:


From: Alan Cox <coxa_at_cableol.net>

Sounds like overheating, maybe, if it got worse. If you are lucky it's just
a bad connection. We had a problem after a machine was moved which the
Digital Engineer fixed by removing and reseating all the cards carefully
and checking all the chips were firmly in their sockets. Fortunately in
our case the machine was under a 4 hour callout maintenance so it got
fixed in about 2 days.


I wouldn't recommend that anyone casually dive into their machine unless
they have a fair idea what they are doing and it's out of guarantee. If you
can find a resident technical bod with a screwdriver and an anti-static
wrist strap who can take PCs apart properly, she/he should be up to doing
that bit.

There are plenty of other causes of the problem however - dodgy chips,
cracks on motherboards and the like. Those aren't going to get fixed by a
bit of basic home maintenance.

---------------------------
From: Marc.Thoelen_at_luc.ac.be (Marc Thoelen)

We had comparable problems about 2 years ago, also with a 3000 500X
machine. The errors at our site only occurred if some big jobs tried to
access the complete RAM address space.
After numerous tests, e.g. with a 3000 500 motherboard or a 3000 800
board, they finally decided to leave the latter board in the machine.
Since we had a maint. contract, they did this for free.

---------------------------
From: (name deleted by request of sender)

    It might possibly help if you were to remove as many memory SIMMs
as possible (leaving only enough to boot the system). I had problems
with a machine recently, and the DEC service guys didn't have a clue.
Eventually, after about 3 tries, they found that our third-party SIMMs
(from Dataram) had gone bad. I was naturally disappointed with them,
as this took a month to resolve. With any kind of luck, you might be
able to run this down to something that is easily fixed. Good luck!!

---------------------------
From: Gerhard Kircher <kircher_at_edvz.tuwien.ac.at>

If the L2 cache memory is socketed, try to reseat it, i.e.
find someone who is able to nondestructively take out the
memories and put them back in. I recently had a similar
problem which was caused by ill-seated RAM.


Thanks to all who responded!
Received on Tue Oct 29 1996 - 16:13:19 NZDT
