Crashes after adding DE500-AA?

From: Judith Reed <jreed_at_wukon.appliedtheory.com>
Date: Thu, 09 Oct 1997 08:52:31 -0400

Greetings, alpha-managers. We are seeing strange crashes on two AlphaServer 400s,
and while waiting for tech support to arrive I'd like to ask whether anyone has
had similar experiences.

We have two AlphaServer 400s, nodeA and nodeB. The facts are as follows:

        NodeA                        NodeB
        DU 4.0b (rev 564)            DU 4.0a (rev 564)
Firmware rev while crashing:
        6.3                          6.2
Firmware rev after replacing system board:
        6.4                          N/A
Network cards:
        DECchip 21140-AA rev. 1.2    DEC LeMAC Ethernet Interface
        DECchip 21140-AA rev. 1.2    DEC TULIP Ethernet Interface
        DECchip 21140-AA rev. 2.0    DECchip 21140-AA rev 2.0
Memory:
        256 MB                       256 MB

NodeA was upgraded with a Trident video card, 128MB of Dataram
memory, and a DE500-AA network card about 3 weeks before it started
crashing. When it began crashing, it crashed several times in a short
period. DEC came in, looked at the core files, and was suspicious of memory.
They pulled all memory, replaced it with DEC memory, and sent the original
memory for analysis. All of it tested clean. Meanwhile, the system crashed
twice more, with no activity on the machine, on about a 5 day cycle. DEC came
in, replaced the system board with one at a slightly newer rev, replaced the
Trident card with an ATI Mach64 video card, and put the original memory back
in. Since then, NodeA has been up for 9 days and seems stable.

NodeB was upgraded with an ATI Mach64 video card, 128MB of some
memory, and a DE500-AA network card 5 days before it began
crashing.

Memory seems to be ruled out, as it was all pulled and tested, and nodeA still
crashed. The video cards were different during the crashes, so the only
commonality there is the fact that there *was* a VGA card in both nodes. The
network cards have not been ruled out, though nodeA has been running fine with
the DE500-AA since the system board swap. I am suspicious of some weird
interaction between the network card and the system board firmware rev. The
other weird thing is the (approximately) 5 day period we see between crashes.
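
One way to check the roughly 5 day spacing is to pull the panic timestamps
out of the error log entries and compute the gaps between them. The following
is only a minimal Python sketch. It assumes the report layout matches the
entries appended at the end of this message, and the names parse_events and
panic_intervals_days are hypothetical helpers, not part of any DEC tool. The
sample text is trimmed from the appended log.

```python
import re
from datetime import datetime

# Trimmed copy of the two entries appended to this message.
SAMPLE = """\
----- EVENT INFORMATION -----
EVENT CLASS                             ERROR EVENT
OS EVENT TYPE                  302.     PANIC
SEQUENCE NUMBER                  2.
OCCURRED/LOGGED ON                      Wed Oct  8 23:05:28 1997
OCCURRED ON SYSTEM                      nodeB
----- EVENT INFORMATION -----
EVENT CLASS                             ERROR EVENT
OS EVENT TYPE                  100.     CPU EXCEPTION
SEQUENCE NUMBER                  1.
OCCURRED/LOGGED ON                      Wed Oct  8 23:05:28 1997
OCCURRED ON SYSTEM                      nodeB
"""


def parse_events(text):
    """Split a report on the EVENT INFORMATION banner and pick out a few fields."""
    events = []
    for block in text.split("----- EVENT INFORMATION -----")[1:]:
        event = {}
        m = re.search(r"^OS EVENT TYPE\s+(\d+)\.\s+(.+?)\s*$", block, re.M)
        if m:
            event["type_code"] = int(m.group(1))
            event["type_name"] = m.group(2)
        m = re.search(r"^OCCURRED/LOGGED ON\s+(.+?)\s*$", block, re.M)
        if m:
            # Collapse the double space before single-digit days.
            stamp = " ".join(m.group(1).split())
            event["when"] = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        m = re.search(r"^OCCURRED ON SYSTEM\s+(\S+)", block, re.M)
        if m:
            event["system"] = m.group(1)
        events.append(event)
    return events


def panic_intervals_days(events):
    """Return the gaps, in days, between consecutive PANIC events."""
    times = sorted(
        e["when"] for e in events
        if e.get("type_name") == "PANIC" and "when" in e
    )
    return [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for event in parse_events(SAMPLE):
        print(event)
```

With only one panic in the sample there is no interval to report yet; feeding
in the full log from each node would show whether the gaps really cluster
around 5 days.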

Error msgs appended at end. If anyone has seen anything similar, please
get back to me with info. Thanks much!


-- 
Judith Reed
jreed_at_appliedtheory.com
(315) 453-2912 x335
===========================================================================
Error msgs:
----- EVENT INFORMATION -----
EVENT CLASS                             ERROR EVENT 
OS EVENT TYPE                  302.     PANIC 
SEQUENCE NUMBER                  2.
OPERATING SYSTEM                        DEC OSF/1 
OCCURRED/LOGGED ON                      Wed Oct  8 23:05:28 1997
OCCURRED ON SYSTEM                      nodeB
SYSTEM ID                 x0006000D     CPU TYPE:  DEC 7000 
SYSTYPE                   x00000000
MESSAGE                                 panic (cpu 0): Machine check - 
                                         _Hardware error 
********************************* ENTRY     3. *********************************
----- EVENT INFORMATION -----
EVENT CLASS                             ERROR EVENT 
OS EVENT TYPE                  100.     CPU EXCEPTION 
SEQUENCE NUMBER                  1.
OPERATING SYSTEM                        DEC OSF/1 
OCCURRED/LOGGED ON                      Wed Oct  8 23:05:28 1997
OCCURRED ON SYSTEM                      nodeB 
SYSTEM ID                 x0006000D     CPU TYPE:  DEC 7000 
SYSTYPE                   x00000000
----- UNIT INFORMATION -----
UNIT CLASS                              CPU 
Received on Thu Oct 09 1997 - 15:28:42 NZDT