I have a 3000/500 running as a fileserver and WWW server. It is running
V3.2D-1 with patches up to the 23 July 1996 consolidated patch kit.
Occasionally, the system will jam up in the middle of the night, with the
console continuously printing:
    fork/procdup: task_create failed. Code: 0x6
    Unable to obtain requested swap space
Externally, the system is serving files over NFS perfectly happily, but no
other network services are working. I cannot log in on the console. I have to
force a core dump and reboot.
The analysis of the core dump shows that there is actually plenty of swap
space free:
_kdbx_swap_start:
Swap device name                  Size       In Use     Free
--------------------------------  ---------- ---------- ----------
(null)                               358832k     74704k    284128k
                                      44854p      9338p     35516p
--------------------------------  ---------- ---------- ----------
Total swap partitions: 1             358832k     74704k    284128k
                                      44854p      9338p     35516p
_kdbx_swap_end:
This is about normal for this system. As far as I know, nothing unusual is
going on at the point at which it fails. There are no "users" on the system -
it is purely a server. The list of processes in the dump analysis looks
normal.
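
As a sanity check on the swap figures above, the following is a minimal
Python sketch (not part of the dump analysis itself) that confirms the
kilobyte and page rows agree and works out the fraction of swap in use. The
variable names are made up for illustration, and it assumes "k" means
kilobytes and "p" means pages of one uniform size.

```python
# Figures copied from the kdbx swap summary (k = kilobytes, p = pages).
size_k, in_use_k, free_k = 358832, 74704, 284128
size_p, in_use_p, free_p = 44854, 9338, 35516

# Page size implied by the two rows; assumed to be the same for every column.
page_k = size_k // size_p

# The kilobyte row and the page row should describe the same quantities.
rows_agree = (
    size_p * page_k == size_k
    and in_use_p * page_k == in_use_k
    and free_p * page_k == free_k
)

# In-use plus free should account for the whole partition.
totals_agree = in_use_k + free_k == size_k

pct_in_use = 100.0 * in_use_k / size_k

print(f"implied page size: {page_k}k")
print(f"rows consistent: {rows_agree}, totals consistent: {totals_agree}")
print(f"swap in use: {pct_in_use:.1f}%")
```

The rows are consistent with an 8k page size, and only about 21% of the swap
partition was in use when the dump was taken.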
I have tried reporting this to the support centre, but I can only get a
knee-jerk reaction: "increase your swap space". Well, I could do that of
course, but since the amount I already have should be more than enough for the
workload, I have no basis for knowing how much to add! If there IS some
process demanding more swap space than is available, (a) I want to know what
it is, and (b) I want just that process to die, not the whole system!
Personally I'm not convinced that it is a swap space problem at all.
Does anybody recognise the symptoms and know what the true cause might be? Can
anybody think of anything I might examine in the core dump which might give me
some clues?
--
Martyn Johnson maj_at_cl.cam.ac.uk
University of Cambridge Computer Lab
Cambridge UK
Received on Thu Aug 29 1996 - 18:38:53 NZST