Apologies: it was very late and the information in my original post was
incomplete. The system is an 8400 5/300 running 4.0D without patches.
Aug 31 22:08:07 hermes vmunix: CPU number 1 not booted due to fatal machine check state.
Aug 31 22:08:07 hermes vmunix: The system must be reset or power cycled to clear this state.
The above was in /var/adm/messages. On the console, after another
crash, there was a startup message about processor 1 having halted.
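
For anyone wanting to check their own logs for the same messages, here is
a minimal Python sketch. It assumes the syslog layout seen in the excerpts
in this post ("Mon DD HH:MM:SS host vmunix: message"); the function name
find_cpu_faults and the pattern list are made up for illustration and are
not part of any system tool.

```python
import re

# Assumed syslog layout: "Mon DD HH:MM:SS host vmunix: message"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+vmunix:\s*(.*)$"
)

# Message fragments copied from the log excerpts in this post
PATTERNS = (
    "not booted due to fatal machine",
    "simple_lock: time limit exceeded",
)


def find_cpu_faults(lines):
    """Return (timestamp, host, message) for each matching kernel line."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and any(p in m.group(3) for p in PATTERNS):
            hits.append(m.groups())
    return hits
```

To use it, pass an open file, for example
find_cpu_faults(open("/var/adm/messages", errors="replace")), and print
the tuples it returns.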
Come to think of it, letting the system boot with processor 1 disabled
probably would have meant more sleep.
The answers from the list were that it could be either hardware or software,
and that confirmation was quite helpful.
The immediate cause appears to have been hardware. I replaced processor 0/1
with a spare and upgraded the firmware to 5.4, and the problem appears to
have gone away.
The release notes for the latest 4.0D show several software causes for the
simple lock timeout panic, but none of them appeared to have the same
footprint as this crash, except high I/O on a Qlogic interface (if I remember
the release notes correctly).
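
When comparing a crash against the footprints in the release notes, it can
help to pull the identifying fields out of the panic report. The following
is only a sketch: it assumes the field labels exactly as they appear in the
forwarded panic report below, and panic_footprint is a hypothetical helper
name.

```python
import re

# Assumed: each field appears as "vmunix: <label>: <value>" on one line
FIELD_RE = re.compile(
    r"vmunix:\s*"
    r"(pc of caller|lock address|lock info addr|lock class name):\s*(\S+)"
)


def panic_footprint(lines):
    """Collect the simple_lock report fields into a dict keyed by label."""
    fields = {}
    for line in lines:
        m = FIELD_RE.search(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields
```

The "lock class name" and "pc of caller" entries are the ones I would look
for first in the release notes.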
Clearly the patches are a good idea.
Thank you,
John Nebel
---------- Forwarded message ----------
Date: Wed, 1 Sep 1999 01:03:29 -0600 (MDT)
From: John Nebel <nebel_at_athena.csdco.com>
To: tru64-unix-managers_at_ornl.gov
Subject: panic (cpu 0): simple_lock: time limit exceeded
Does this look like a processor card failure?
Aug 31 23:37:10 hermes vmunix: simple_lock: time limit exceeded
Aug 31 23:37:10 hermes vmunix:
Aug 31 23:37:10 hermes vmunix: pc of caller: 0xfffffc0000419418
Aug 31 23:37:10 hermes vmunix: lock address: 0xfffffc00005fdb80
Aug 31 23:37:10 hermes vmunix: lock info addr: 0xfffffc0000632fb0
Aug 31 23:37:10 hermes vmunix: lock class name: kernel_pmap.lock
Aug 31 23:37:10 hermes vmunix: current lock state: 0x800000fb004193a5 (cpu=0,pc=0xfffffc00004193a4,busy)
Aug 31 23:37:10 hermes vmunix:
Aug 31 23:37:10 hermes vmunix: panic (cpu 0): simple_lock: time limit exceeded
Aug 31 23:37:10 hermes vmunix: syncing disks...
Aug 31 23:37:10 hermes vmunix: trap: invalid memory read access from kernel mode
Aug 31 23:37:10 hermes vmunix:
Aug 31 23:37:11 hermes vmunix: faulting virtual address: 0x0000000000000078
Aug 31 23:37:11 hermes vmunix: pc of faulting instruction: 0xfffffc00002fa660
Aug 31 23:37:11 hermes vmunix: ra contents at time of fault: 0xfffffc00002f73e0
Aug 31 23:37:11 hermes vmunix: sp contents at time of fault: 0xffffffffb6b8f
John Nebel
Received on Wed Sep 01 1999 - 14:10:37 NZST