Thanks to:
alan_at_nabeth.cxo.dec.com (Alan Rollow)
"Doyle, Danny Mr ITT/FSC" <DoyleD_at_fsc1.vafb.af.mil>
"Robert L. McMillin" <rlm_at_syseca-us.com>
for quick replies.
Alan was able to explain the boot/probe process better, which
has given me a bit more understanding on how to debug this
problem. He also recommended installing DECevent to get
more information out of the error logs.
Other guesses included flaky hardware or firmware, for which
replacement fixed things. We don't have a maintenance
contract on these 3400s, so board replacement is a pretty
expensive option. The firmware is almost certainly fine.
Based on what alan's said, i think there are multiple things
wrong, which is why debugging this is so hard. One problem
i noticed came after installing a TURBOchannel Prestoserve card
and rebuilding the kernel to support it. The moment i rebooted
with the new kernel and it reached the probe stage and saw
the card, the machine would fall over. When i removed the
card, booted from genvmunix, rebuilt the kernel without
Prestoserve support and rebooted, the machine came up fine.
All the initial tests show no errors, and the machine previously
had a PMAGB installed in TC0, running X, with no problems.
As it stands now, i've got DU4.0B installed, and running without
problems. The machine is going to be used as a fileserver serving
NFS, so it's a bit sad i haven't got the presto board working in
it yet, but it should cope.
The exact config is:
Alpha 3000/400
16 * 4M simms (64M)
RZ26 + RRD43 internal
2 * TLZ07-VA + 4 * RZ29B-VA in external storageworks as JBOD.
cheers,
-jason
--
jason andrade dstc pty ltd jason_at_dstc.edu.au
senior sysadmin level 7, gehrmann labs i just wanna be
phn +61-7-33654307 university of queensland bluemisty
fax +61-7-33654311 queensland 4072 australia and barefooted
<original reply from alan>
> From: alan_at_nabeth.cxo.dec.com (Alan Rollow - Dr. File System's Home for Wayward Inodes.)
> To: jason_at_dstc.edu.au
> Date: Sun, 12 Apr 1998 01:18:13 -0600
> Subject: Re: machine check - panic *only* on install from cdrom ?
The PALcode isn't so much probed as it is recognized for what
version it is. And this happens long before it gets to the
SCSI adapters. The built-in SCSI adapters are usually low
numbered slots and checked first. I think the add-on
TURBOchannel slots are next, in order. Then it gets to the
remaining built-in slots; serial interface, ethernet, graphics
and audio (bba).
If an error log entry is logged for the panic, uerf(8) might
translate it, but it could be too early in the boot to be picked
up. DECevent (dia) would have a better chance of understanding it
if there is one.
Received on Mon Apr 13 1998 - 00:06:28 NZST