tb_shoot ack timeout

From: Nick Leonard <nickl_at_poole-tr.swest.nhs.uk>
Date: Tue, 01 Jun 1999 13:46:12 +0100

Since adding a third 5/300 CPU to our primary 2100,
the following event has occurred monthly or less often (sometimes weekly).

This is an extract from uerf:

----- EVENT INFORMATION -----

EVENT CLASS                     ERROR EVENT
OS EVENT TYPE             302.  PANIC
SEQUENCE NUMBER             2.
OPERATING SYSTEM                DEC OSF/1
OCCURRED/LOGGED ON              Thu May 20 14:03:16 1999
OCCURRED ON SYSTEM              poole1
SYSTEM ID                       x00050009     CPU TYPE: DEC 2100
SYSTYPE                         x00000000
PROCESSOR COUNT             3.
PROCESSOR WHO LOGGED            x00000000
MESSAGE                         panic (cpu 0): tb_shoot ack timeout

This causes a complete panic and reboot, costing 20-25 minutes of reboot time plus 1 hour for the disks to remirror.

It tends to occur during high-load periods; current loads are up to 210 users.

The current config is:
2100 with 3 x 5/300
2 x 512 MB RAM
2 x 128 MB RAM
2 x HSZ40 with SW300 cabinets, approx 25 GB in each, in an LSM mirror

All of this is running on 3.2c (sorry). We are waiting for Informix to deliver a working 7.24 that runs with 4.0d, so the upgrade will have to wait !#@#$$%

In a number of cases I have managed to tie the crash to running ps aux, and on at least two occasions I have confirmed that running ps aux crashed the system.
 
We (with Digital) have tried replacing all three CPUs and the RAM, to no avail. At various times it has been diagnosed as a hardware problem and as a software problem.

I have tried to patch 3.2c with patch 417, which asks for 458 and 416, but it will not install because 458 cannot identify the origin of /usr/sys/BINARY/msfs_io.o. I have been advised that this is a megasafe file, and it seems to have replaced the original, as this listing shows:

-rw-r--r--  1 bin   bin     2566 Jul 25  1995 msfs_cfg.o
-rw-r--r--  1 bin   bin     2255 Jul 25  1995 msfs_config.o
-rw-r--r--  1 1053  tape  102559 Nov  2  1995 msfs_io.o
-rw-r--r--  1 bin   bin   101287 Jul 25  1995 msfs_io.o.org
-rw-r--r--  1 bin   bin    13264 Jul 25  1995 msfs_lookup.o
-rw-r--r--  1 bin   bin    25719 Jul 25  1995 msfs_misc.o
-rw-r--r--  1 bin   bin    23604 Jul 25  1995 msfs_proplist.o
-rw-r--r--  1 bin   bin     3815 Jul 25  1995 msfs_syscalls.o
-rw-r--r--  1 bin   bin    26740 Jul 25  1995 msfs_vfsops.o
-rw-r--r--  1 bin   bin   143189 Jul 25  1995 msfs_vnops.o

Has anybody had a similar problem and found a solution?

Thanks
Nick L
Received on Tue Jun 01 1999 - 12:51:41 NZST
