SOLUTION
As correctly diagnosed by Dr. Tom Blinn, this was a hardware problem.
The indisputable clue was that the bad machine would always power
itself off when it crashed, whether or not it also left a core file.
According to Digital support, there is no way for software to turn the
power off; it must be a hardware problem. We replaced the motherboard,
system disk, and power supply before the system was finally working
reliably again with DU4.0D + PK5. (I still don't understand why the
machine would crash under DU4.0D + PK5 but not under DU4.0D alone.)
PROBLEM
We recently upgraded three AS500/500 machines from DU4.0B + Patches to DU4.0D
and Firmware Update v5.6 (as500_v7_0a.exe). All three machines appear
to run perfectly on the upgraded software/firmware.
After we applied PatchKit 5, two of the machines rebooted successfully,
but the third machine always crashes on the patched kernel. We've
tried a great many variants on the same theme, including a complete
DU4.0D installation from scratch (i.e., no update) followed by patches,
but the bad machine always crashes early in the boot process when we
try to boot it with the patched kernel. There is no record of the
crash in /var/adm/messages or /usr/var/adm/crash or via uerf.
The core dump shows a problem in vm_mem_init():
> dbx /vmunix /core
signal Bad system call at
warning: PC value 0x3ff814bca28 not valid, trying RA
warning: RA value 0x120001704 not valid, trying text start
> [vm_mem_init:178, 0xfffffc0000230000] ldah r1, 0(gp)
(dbx) tstack
Thread 0x3:
0 vm_mem_init(0x3ffc008ace8, 0x102, 0x3ffc008acf8, 0x102, 0x3ffc008ad00) [0xfffffc000022fffc]
The most significant difference between the bad machine and the two
good machines is that we replaced the motherboard in the bad machine
earlier this year after a power transistor burnt out.
Received on Tue Jan 18 2000 - 07:26:23 NZDT