[Q] computer crash - need help with why....

From: claudia burg <claudia.burg_at_asu.edu>
Date: Tue, 06 Jul 1999 11:40:09 -0700 (MST)

hi,

my dec alpha 200 4/166 with 160Mb ram, running du4.0d with patch kit 3
applied, crashed on july 3rd. unfortunately i was not around, and one of
the users rebooted it without calling or notifying me (i hate that!). i
returned from vacation today and am trying to piece together why it
crashed, and i need help. below is a description of what was happening
when the computer crashed, a few sketchy details of the screen output at
the time of the crash, and some dbx output. thanks in advance for any help!

details:
        the details (as near as i can get) of what was happening just
before the crash seem to be that the user was logged in from home (the
modem isn't on this computer - the modem is on a campus wide accessible
computer - i am sure the problem is not login related). the user was
running an iraf script (iraf processes pretty astronomical images) that
performed statistical calculations on a 130Mb image. all he saw was
that the computer stopped responding.

some (probably/hopefully) unrelated info:
        the computer crashed a few weeks ago due to a data flow problem -
reading an exabyte with these 130Mb images across the network (not my
idea) caused what we are pretty sure was, in essence, too much water
flowing thru the pipe. the data transfer rate was too high for some part
of the transfer, and it crashed the computer with the exabyte on it. (we
noticed that this tape reading program - iraf - refuses to use swap space.
it reads data till the memory is consumed and then writes some out to disk...)

logfiles:
        log files don't reveal much at all. i can send them, and any other
details requested, to anyone who wants them.

screen notes:
        he wrote down a few notes before he typed reboot. the lines he
wrote down were (and i quote, word for word here):
        coma_edsr
        ....
        epic_dcsr
        ....
        panic (cpu 0): Machine Check - Hardware error
        syncing disks...
        DUMP:...
        ...
        ...
        succeeded
        halted CPU 0
        ...
>>>


dbx on the core file:

dbx version 3.11.10
Type 'help' for help.
Core file created by program ""


warning: cannot get register (number = 64)
stopped at
warning: cannot get register (number = 64)

warning: cannot get register (number = 64)

warning: PC value 0x0 not valid, trying RA

warning: cannot get register (number = 26)

warning: RA value 0x0 not valid, trying text start
>
warning: cannot get register (number = 64)
 [vm_mem_init:178, 0xfffffc0000230000] ldah r1, 0(gp)


*********************************************************
* Message From: *
* Claudia-Angelica Teresa Chiarenza Burg *
* aka Gigi *
* Claudia *
* G^2 *
* Currently Computing From: Arizona State University *
* Working on: a PhD in Physics and Astronomy *
* Claudia.Burg_at_asu.edu *
* www.public.asu.edu/~caburg *
*********************************************************
Received on Tue Jul 06 1999 - 18:42:17 NZST
