SUMMARY^3: CPU-Exceptions ...

From: Sable - Hendrik Roepcke <alpha_at_jungle.toppoint.de>
Date: Fri, 30 Aug 1996 16:24:13 +0200 (MET DST)

Hello,

my question was:

My name is Hendrik Roepcke and I have to administer
an Alpha-Server 2100 5/250 with 1 GB RAM.

The system is running DU 3.2C and was bought at
the end of Feb. 96.

Since that time the machine tends to complain in
the binary.errlog:

300. CPU EXCEPTION!

These entries can arrive milliseconds apart and from any of the
4 CPUs. Eventually the disk fills up and the system
often hangs for lack of space.

After a RESET and a "boot" from the firmware level
the machine hangs with a memory fault and the boot into
DU fails. Sometimes even the screen setup is messed
up and some characters are smeared
across the whole screen.

Do you have ANY hint on how to stop this?

Digital tried:

- swapping CPUs 0 and 2 between their slots
  (CPU 0 is from a different hardware series)
- firmware update of CPUs, PCI bus, ...
- exchange of the TOY-card
- memory check with VET and memory checkers
- exchange of the BACKPLANE

NONE OF THESE ATTEMPTS WAS SUCCESSFUL!!!

Please answer me and I will post a SUMMARY!


Bye

 Dr. Hendrik S. Roepcke
  
----------------------------------------------------

 OK... the answers were mixed, and later on I had
 access to the searchable archive of this list...

Guy Dallaire seemed to have the same problem...
but in his case the soft memory errors occurred every time
at the same location.

On our machine the locations are different. DEC says:
it's the RAM or the TOY-card... Wait for the next "explosion"
and mail us the binary.errlog (analysed with uerf -Z).
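
While waiting for the next "explosion", a small watcher could flag the
growth of the error log before the disk fills up. The following is only
a sketch under assumptions that do not come from this thread: the log
path /var/adm/binary.errlog, the 60-second polling interval and the
one-hour warning threshold are guesses to adjust, and the function names
are made up for illustration.

```python
import os
import shutil
import time

ERRLOG = "/var/adm/binary.errlog"  # assumed location; adjust for your system


def growth_rate(old_size, new_size, interval_s):
    """Return log growth in bytes per second over one polling interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # A shrinking file (log rotated or truncated) counts as no growth.
    return max(new_size - old_size, 0) / interval_s


def seconds_until_full(free_bytes, rate):
    """Estimate seconds until the filesystem fills; None if not growing."""
    if rate <= 0:
        return None
    return free_bytes / rate


def watch(path=ERRLOG, interval_s=60, warn_below_s=3600):
    """Poll the log size forever and warn when the disk is about to fill."""
    last = os.stat(path).st_size
    while True:
        time.sleep(interval_s)
        size = os.stat(path).st_size
        rate = growth_rate(last, size, interval_s)
        # Free space on the filesystem that holds the log file.
        free = shutil.disk_usage(os.path.dirname(path) or ".").free
        eta = seconds_until_full(free, rate)
        if eta is not None and eta < warn_below_s:
            print("WARNING: %s grows %.0f B/s, disk full in ~%.0f s"
                  % (path, rate, eta))
        last = size
```

Calling watch() from a terminal or a startup script would keep polling
until interrupted; the warning line could just as well be mailed to the
administrator, so that the log can be saved and sent in before the
system hangs.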

-----------------------------------------------------------------------
-----------------------------------------------------------------------
Here is the answer from Mr. Honore;
I think it is the best one I got!
--------------------------------------------------------------------

Dear Dr. Roepcke,
        It is possible that one of the memory modules may be
faulty. I had a similar problem with an AlphaServer 2100 and was
filling my binary errorlog file with errors at a rate of 3000
entries per minute. Evidently the uerf utility does not parse
the error entries for the AlphaServer 2100 properly. With some
help from Digital, I had to use uerf -R -Z to find out what was
going wrong. They also told me about the uerf utility and
its inability to properly parse the error entries. They told me
that I should obtain and install DECEvent, but the catch is that
it only runs on DU 3.2g and later systems.

Yours sincerely,
Robert Honore
robert_at_digi-data.com
Phone: 623 6658 Fax: 623 0978
Snail Mail: Digi Data systems limited, 96 Wrightson Road,
Trinidad, W. I.

-----------------------------------------------
another from decus:
------------------------------------------------

From: soma_c_at_decus.fr (Claude SOMA - CNTS)

I think I saw a problem like yours on this list,
saying that CPU EXCEPTIONs are not a CPU problem
but a memory problem.
(It seems to be a uerf bug; if you use uerf with -Z, Digital can then
analyse it.)
Ask corporate-licensing_at_digital.com.
Somebody at this address knows the answer.
Claude Soma

-------------------------------------------------
and one from Mitch:
-------------------------------------------------

From: "Mitch Bertone" <mbertone_at_gtech.com>

 Hendrik,

  Sounds like a bad memory board. If you have 1 gig, I assume there is more
  than one board. First have DEC service re-seat the boards in the
  backplane; if that doesn't help, have them replace the boards (I believe
  DEC memory has a lifetime warranty).

  We had a 2100 4/275 with 512 meg that did the same thing for a year
  until it started crashing regularly, DEC didn't think to replace the
  memory board. Colorado Springs can dial in and read your system dump
  file or can get enough from uerf to confirm a bad board. They basically
  don't want to replace a $60,000 board.

        Mitch

        mbertone_at_gtech.com

----------------------------------------------------------------
Another answer, but we didn't try this...
However: do we need patches? Which ones?
----------------------------------------------------------------
From: KPOOTS_at_mickey.gects.ge.com (Kent Poots)

I hope you don't mind a few suggestions.

- remove 1 (or more) of your CPUs and see if the problems persist
- get DEC to check if your CPUs are at the same rev level, and if
  not, then get them upgraded (this should be a maintenance contract
  or warranty issue, I would think)
- check to see that you have all the relevant patches
- check that your CPU and memory cards are plugged into the right
  slots in the backplane -- each has to go into a particular slot

I'm running the one-processor version of a 2100A, and it's been very solid.

Hope this helps

KP

---------------------------------------------------------------------------
That's all for now!


Hendrik
Received on Sat Aug 31 1996 - 00:18:17 NZST
