Alpha hardware error debugging

From: Jonathan K Flaster <jflaster_at_engin.umich.edu>
Date: Fri, 15 Dec 2000 15:56:39 -0500 (EST)

I have an AlphaPC 164lx with a 667 MHz processor and firmware
revision 5.8-1, and I'm running Tru64 version 4.0g.

It has been consistently crashing to the console every couple of days, and
I have been trying to track down the problem for a long time.

It passes all of the console-level tests, and I have run regression tests
for a couple of days, which it also seems to pass.

It was running with 384 MB (2x 64 MB, 2x 128 MB). I removed the two 64 MB
SIMMs in the hope that it was a memory problem, but the system crashed again.

I assume it is a hardware error because I have also run the system with
a copy of Linux (Red Hat 6.2) and with Tru64 versions 4.0f and 5.0a, and I
have experienced similar behavior.

Here are my questions:

1. What facilities exist in Tru64 to track down hardware errors and
reasons for crashing?
2. Ever since I took out the 128 MB of RAM, all network transfers from the
system have been sluggish. I recompiled the kernel, and it detects the
correct amount of RAM. Is there a set of commands that must be run after
a change in memory size?
3. The following is my error log. Does anyone have any suggestions?

Thank you,

---
Here is the crash (from /var/adm/messages):
Machine Check Processor Fatal Abort
Machine Check Code = 98
Processor detected hard error
pal temp[0-1]           = ffffffffffffffff 0000000140013b88
pal temp[2-3]           = fffffc00004a9390 0000000000005200
pal temp[4-5]           = 0000000000000000 0000000000000400
pal temp[6-7]           = 0000000480010000 fffffc00004a8c50
pal temp[8-9]           = 1f1e161514020100 fffffc00004a90d0
pal temp[10-11]         = 000003ffbf818a40 fffffc00004a8f40
pal temp[12-13]         = fffffc00004a9300 0000000000000000
pal temp[14-15]         = 00000000c6008051 0000000007f457c2
pal temp[16-17]         = 0000009806700001 0000000000000000
pal temp[18-19]         = 000000011ffffbe0 ffffffff9103fa38
pal temp[20-21]         = 0000000007c16000 fffffc00004a9330
pal temp[22-23]         = fffffc0000688390 0000000007821a38
shadow[0-1]             = 0000000000000000 ffffff000f29d0af
shadow[2-3]             = 0000000000000000 0000000000000008
shadow[4-5]             = ffffff75ddd9a31f ffffff00006b29cf
shadow[6-7]             = 0000000000006068 0000000007821a38
Address of excepting instruction = 000003ffbf818a40
Summary of arithmetic traps     = 0000000000000000
Exception mask                  = 0000000000000000
Base address for PALcode        = 0000000000018000
Interrupt Status Reg            = 0000000000000000
CURRENT SETUP OF EV5 IBOX       = 0000004160020000
I-CACHE Reg Tag parity error    = 0000000000000000
D-CACHE error Reg               = 0000000000000000
Effective VA            	= 000003ffffe30b60
reason for D-stream  		= 0000000000014950
EV5 Secondary Cache address     = ffffff000f29d0af
EV5 Secondary Cache TAG/Data parity     = 0000000000000000
EV5 BC_TAG_ADDR         = ffffff800bcd0fff
EV5 EI_STAT_ADDR Phys addr of Xfer = ffffff75ddd9a31f
Fill Syndrome           = 0000000000002bf5
EI_STAT reg             = fffffff005ffffff
LD_LOCK                 = ffffff00006b29cf
PYXIS_DMA_DATA          = 0000000000000000
CIA/PYXIS ERR                   = ffffffff80000080
PCI BUS Master state machine generated Master Abort
CIA/PYXIS ERR STAT              = 0000000000000010
CIA/PYXIS ERR MASK              = 0000000000000b9b
CIA/PYXIS ECC_SYN               = 0000000000000c0c
CIA/PYXIS MEM ERR0              = 000000000285cd50
CIA/PYXIS MEM ERR1              = 0000000058000000
ISA bridge NMI status & control = 0000000000000030
CIA/PYXIS PCI ERR2              = ffffffff83840018
panic (cpu 0): Processor Machine Check
syncing disks... device string for dump = SCSI 0 6 0 0 0 0 0.
DUMP.prom: dev SCSI 0 6 0 0 0 0 0, block 7414364
device string for dump = SCSI 0 6 0 0 0 0 0.
DUMP.prom: dev SCSI 0 6 0 0 0 0 0, block 7414364
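
For anyone who wants to sift through a dump like the one above, here is a minimal Python sketch. It only splits each "label = value" line into integers and lists the registers that are non-zero; it does not attempt to interpret any bit fields. The function names (parse_machine_check, nonzero_registers) are made up for illustration, and the sample text is a few lines copied from the log above.

```python
import re

# Matches lines such as "CIA/PYXIS ERR STAT = 0000000000000010":
# a label, an equals sign, then one or two 16-digit hex words.
# Shorter values (e.g. "Machine Check Code = 98") are deliberately skipped.
LINE_RE = re.compile(
    r"^(?P<label>.+?)\s*=\s*"
    r"(?P<values>[0-9a-fA-F]{16}(?:\s+[0-9a-fA-F]{16})?)\s*$"
)


def parse_machine_check(text):
    """Return {label: [int, ...]} for every 'label = hex [hex]' line."""
    registers = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            words = match.group("values").split()
            registers[match.group("label")] = [int(w, 16) for w in words]
    return registers


def nonzero_registers(registers):
    """Keep only the registers where at least one word is non-zero."""
    return {label: vals for label, vals in registers.items() if any(vals)}


# A few lines taken from the dump above, used as sample input.
SAMPLE = """\
Machine Check Code = 98
I-CACHE Reg Tag parity error    = 0000000000000000
CIA/PYXIS ERR STAT              = 0000000000000010
pal temp[14-15]         = 00000000c6008051 0000000007f457c2
PCI BUS Master state machine generated Master Abort
"""

if __name__ == "__main__":
    parsed = parse_machine_check(SAMPLE)
    for label, values in nonzero_registers(parsed).items():
        print(label, [f"{v:016x}" for v in values])
```

Running the same functions over the saved text of several crashes would make it easier to see which registers change from one crash to the next.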
-Jonathan Flaster 
---------------------------------------------------------
"See I used to be you and lately I've been missing me, 
so I asked if I can room with me again and he said sure!"
Received on Fri Dec 15 2000 - 20:57:53 NZDT
