SUMMARY: System lockup "task_create failed" from Martyn Johnson on 1996-09-04 (tru64-unix-managers)

From: Martyn Johnson <Martyn.Johnson_at_cl.cam.ac.uk>
Date: Tue, 03 Sep 1996 13:56:09 +0100

I asked for ideas on diagnosing a system that had locked up while
continuously printing the message:

fork/procdup: task_create failed. Code: 0x6
Unable to obtain requested swap space

when the crash dump analysis showed that there was apparently plenty of swap
space free.

Several people replied with general advice about swap space allocation. The
best clue I got came from alan_at_nabeth.cxo.dec.com, who pointed out that the
crash dump analysis does NOT print out the amount of "reserved" swap space. I
neglected to say in my query that this system is running in "eager" mode, and
hence there can be swap space which is reserved because it is allocated to
potentially writable address space, but which has not actually been written.

The "reserved" value is printed out by "swapon -s" in a running system, but
not given in a crash dump analysis. By poking in some (slightly out of date)
sources, I discovered a kernel variable, vm_swap_space, that keeps track of
the amount of unreserved swap space. In the dump I have of the incident, this
value is down to 5 pages, so this is clearly the immediate cause of the
failure to allocate swap space.
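
To make the distinction concrete, here is a toy model of eager-mode
accounting. It is only a sketch under my own assumptions, not Tru64 kernel
code: the class EagerSwap and its fields are hypothetical, and the
"unreserved" property merely plays the role that vm_swap_space appears to
play in the sources. It shows how a figure based on pages actually written
can look healthy while a new reservation still fails.

```python
class EagerSwap:
    """Toy model of eager-mode swap accounting (hypothetical, not kernel code)."""

    def __init__(self, total_pages):
        self.total = total_pages
        self.reserved = 0   # pages promised to potentially writable address space
        self.written = 0    # pages that actually hold paged-out data

    @property
    def unreserved(self):
        # Plays the role the post ascribes to vm_swap_space.
        return self.total - self.reserved

    @property
    def unwritten(self):
        # What a "free swap" figure based only on written pages would show.
        return self.total - self.written

    def reserve(self, pages):
        """Reserve swap for new writable address space, e.g. at fork time."""
        if pages > self.unreserved:
            return False    # analogue of "Unable to obtain requested swap space"
        self.reserved += pages
        return True

    def page_out(self, pages):
        """Write previously reserved pages out to swap."""
        if self.written + pages > self.reserved:
            raise ValueError("cannot write more pages than are reserved")
        self.written += pages


swap = EagerSwap(total_pages=1000)   # illustrative size, not from the dump
assert swap.reserve(995)             # lots of address space reserved...
swap.page_out(100)                   # ...but only a little ever written
print(swap.unwritten)                # 900
print(swap.unreserved)               # 5
print(swap.reserve(10))              # False
```

The page counts are made up for illustration, apart from the 5 unreserved
pages, which is chosen to match the value seen in the dump.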

However, I still don't know where the space has gone. In the old days, when
"ps" did its work by poking around in /dev/kmem, you could point it at a dump
instead and get all sorts of useful information about the system when it died.
Now that "ps" uses a "clean" interface to the kernel, you can't do this any
more, and I don't know enough about the data structures to be able to track
down the memory. The dump does show a higher than usual number of "httpd"
servers, and the "amd" process has more children than I would expect. It is
possible that these are enough to account for the extra reserved swap space,
though I have my doubts. Even if they are, this may be a consequence rather
than a cause: if something else in the system wedges, it is fairly inevitable
that these processes will tend to breed.

It seems that I have little choice but to try one of:

1. Switch to "lazy" swap mode. I've always been reluctant to use this since I
don't like the idea of an arbitrary process being killed if swap space really
does run out.

2. Add more swap space and see what happens!
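
For contrast with option 1, the following is a rough sketch of why lazy mode
worries me. Again this is a toy model and not Tru64 code: the class LazySwap
is hypothetical, and the rule for choosing a victim (whoever asked last) is
purely my assumption for illustration; I do not know how the real kernel
chooses. The point is only that process creation never fails up front, and
the failure shows up later as a killed process.

```python
class LazySwap:
    """Toy model of lazy (deferred) swap allocation (hypothetical, not kernel code)."""

    def __init__(self, total_pages):
        self.total = total_pages
        self.written = {}   # process name -> pages actually held in swap

    def fork(self, name):
        # Nothing is reserved up front, so creating a process never fails here.
        self.written.setdefault(name, 0)
        return True

    def page_out(self, name, pages):
        """Allocate swap only when pages are really written.

        Returns the name of a process killed because swap ran out,
        or None if the allocation fit.
        """
        in_use = sum(self.written.values())
        if in_use + pages <= self.total:
            self.written[name] += pages
            return None
        # Assumption: the victim is simply the process that asked last.
        del self.written[name]
        return name


lazy = LazySwap(total_pages=1000)    # illustrative size only
for proc in ("httpd", "amd", "editor"):
    lazy.fork(proc)                  # always succeeds in this model

print(lazy.page_out("httpd", 600))   # None
print(lazy.page_out("amd", 300))     # None
print(lazy.page_out("editor", 200))  # editor
```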

Thanks to everybody who responded.

-- 
Martyn Johnson      maj_at_cl.cam.ac.uk
University of Cambridge Computer Lab
Cambridge UK

Received on Tue Sep 03 1996 - 15:40:17 NZST
