Need Help on Crashes with eb21164 alpha system

From: Ronald D. Bowman <rdbowma_at_tsi.clemson.edu>
Date: Tue, 28 Oct 1997 15:23:40 -0500

Hi Alpha managers -
Sorry this message is so long, but I am hoping that someone will
be able to help me in solving our crash problem.


Our system, a DEC Alpha 21164 experimental board, crashed
sporadically during the first 2 months we had it. After a
few changes to the SCSI bus and the video card, it appeared our problems
were over. We then ran for 19 days without a crash, but in the
last 48 hours the system has crashed 5 times.

A more detailed examination of the crash-data files turned up some
information that may help us solve the problem. We would welcome
help from anyone who can provide some insight into what we are
seeing.

SYSTEM:
        The system runs the DEC OSF/1 operating system, release 4.0, version
        564 (4.0B). We have a 333 MHz processor on a 21164 experimental
        board. Memory is 256 MB. Jumbo patch #4 from April/May
        has been installed.

_system_string: 0xffffffffff800798 = "Alpha 21164 Evaluation Board 333 MHz"


INFORMATION FROM Crash-Data file: This information is common to the 14 crash
files that we have saved, not just the 5 most recent crashes.

We finally realized that all of our crashes appear to be caused by the same phenomenon.
The panic string, shown below, is the same in all the crash files:

_panic_string: 0xfffffc00004dd890 = "Processor Machine Check"
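
A check like this across many saved crash files can be scripted. The following is a minimal Python sketch, assuming each crash-data file is plain text and contains a `_panic_string:` line in the format shown above; the function name `panic_strings` is made up for illustration and is not part of any DEC tool.

```python
import re
from collections import Counter

# Matches a line such as:
#   _panic_string: 0xfffffc00004dd890 = "Processor Machine Check"
PANIC = re.compile(
    r'^_panic_string:\s*(0x[0-9a-fA-F]+)\s*=\s*"([^"]*)"',
    re.MULTILINE,
)

def panic_strings(crash_texts):
    """Count how often each panic string occurs across crash-data texts."""
    counts = Counter()
    for text in crash_texts:
        match = PANIC.search(text)
        # Files without a recognizable line are counted under a placeholder key.
        counts[match.group(2) if match else "<no panic string found>"] += 1
    return counts
```

Passing in the contents of every saved crash-data file should give a single key if all crashes really share one panic string; more than one key would mean the crashes are not all alike.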

Reading through the information on dbx finally led to more insight
into how similar our crashes were, once we looked at the trace
information in all the crash files.

All of the crash files had the following information in them:

Machine Check Processor Fatal Abort
Machine Check Code = 98
Machine Check Code = 98
Processor detected hard error

We also saw that the dumps all look the same, as shown below.
What may be of interest is that the line of code that caused
the panic is always the same. In our case there appear to be
two cases (I do not know much here). The first is entries 2 and 3,
which are always lines 1925 and 3820 of sched_prim.c.

The other case (the one I feel is more likely the cause of the problems)
is entry 6: line 1859 of the file eb164.c, which when executed under
certain conditions results in the panic and the crash of the system.
Unfortunately, there is no way (that I know of) to find out what is actually
being attempted at this line of code. If we knew that, then maybe we could
determine what is causing our crashes.

_dump_begin: (this same information appears twice more in our crash files)

0 boot(0x400000000, 0xfffffc00004bdd90, 0xfffffc00004bdd90,
        0xfffffc000e6b42e0, 0xfffffc000027a1b4)
["../../../../src/kernel/arch/alpha/machdep.c":2484, 0xfffffc00003c87dc]

1 panic(s = 0xfffffc00004bffa0 = "thread_block: interrupt level call")
    ["../../../../src/kernel/bsd/subr_prf.c":707, 0xfffffc000027b79c]
    pcpu = 0xfffffc00005218c0
    i = 4980640
    mycpu = 0
    spl = 5

2 thread_block() ["../../../../src/kernel/kern/sched_prim.c":1925,
    0xfffffc00002a9e90]
    thread = 0xfffffc0001954dc0
    new_thread = 0xfffffc00004fce58
    mycpu = 0
    myprocessor = 0xfffffc000011c100
    s = 5
    pset = 0xfffffc00004f3730

3 thread_preempt(thread = 0x26, processor = 0xfffffc000011c100)
    ["../../../../src/kernel/kern/sched_prim.c":3820, 0xfffffc00002aca24]
    s = 2
    pset = 0x1

4 boot(0x0, 0xfffffc0001954dc0, 0x2c0000002c, 0x37, 0x1)
    ["../../../../src/kernel/arch/alpha/machdep.c":2431,
    0xfffffc00003c86b8]

5 panic(s = 0xfffffc00004dd890 = "Processor Machine Check")
    ["../../../../src/kernel/bsd/subr_prf.c":791,
    0xfffffc000027b93c]
    pcpu = 0xfffffc00005218c0
    i = 5204704
    mycpu = 0
    spl = 7
    
6 machcheck(0x2, 0x0, 0x670, 0x20000001a,
            0xffffffff9040b678)
    ["../../../../src/kernel/arch/alpha/hal/eb164.c":1859,
    0xfffffc00003f60dc]
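
To compare traces like the one above across many crash files, the frame number, function, source file and line can be pulled out mechanically. This is a rough Python sketch, assuming every frame in the dump carries a `["path":line, address]` location as in the listing above; the name `frames` is hypothetical. A frame without a location would be paired with the next frame's location, so the output should be checked by eye.

```python
import re

# Start of a frame, e.g. "6 machcheck(" or "2 thread_block(".
FRAME = re.compile(r'^\s*(\d+)\s+(\w+)\(', re.MULTILINE)

# Source location, e.g.
#   ["../../../../src/kernel/kern/sched_prim.c":1925, 0xfffffc00002a9e90]
# \s* allows the address to sit on the following line, as in the dump.
LOCATION = re.compile(r'\["([^"]+)":(\d+),\s*(0x[0-9a-fA-F]+)\]')

def frames(dump_text):
    """Return (frame number, function, source file, line) for each frame."""
    result = []
    for frame in FRAME.finditer(dump_text):
        # Take the first location that follows the frame header.
        loc = LOCATION.search(dump_text, frame.end())
        if loc is None:
            continue
        source_file = loc.group(1).rsplit("/", 1)[-1]
        result.append(
            (int(frame.group(1)), frame.group(2), source_file, int(loc.group(2)))
        )
    return result
```

Running this over each crash file and comparing the resulting lists would show whether entries 2, 3 and 6 really point at the same source lines every time.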

________________________________________________________________

Any help with explaining what we are experiencing would be greatly
appreciated.

Sincerely,
Ron Bowman
Techno-Sciences Inc.
rdbowma_at_tsi.clemson.edu

___________
Received on Tue Oct 28 1997 - 21:55:02 NZDT
