Many thanks for the fast responses from the following:
Anthony Talltree <aad_at_nw.verio.net>
Steve Walling <stevew_at_sodak-gaming.com>
Sudarshan Narayana <suda_at_one.net.au>
alan_at_nabeth.cxo.dec.com (Dr. Alan Rollow?)
It looks like the parameter that I should have been tweaking in the
kernel config file is MAXUPRC (or maxuprc), the per-user limit on the
number of processes. The default for this should be 64.
However, this only deals with the symptoms, and I suspect there is still
a problem with resource deallocation. Our program creates a lot of
subprocesses during its operation, which are supposed to terminate once
they've done their tasks. At this time, the program, running at full
load, will probably cause this symptom once every 7 or 8 hours of
operation. If I bumped MAXUPRC to, say, 128, this would extend our run
time to maybe 14 to 16 hours before the symptom returns. So for now, we
will add the new MAXUPRC into the config file and rebuild the kernel,
and instruct the operators to shut down and restart the program at the
end of every shift.
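
For anyone chasing a similar leak: one common cause of a slowly
filling per-user process table is a parent that never collects the
exit status of its finished children, so they linger as zombies and
keep counting against MAXUPRC. Nothing above confirms that this is
what our program is doing; it is only a guess at the usual suspect.
A minimal sketch of non-blocking reaping, in Python for brevity
(the names spawn_worker and reap_finished are made up for this
illustration and are not from our program):

```python
import os
import time


def spawn_worker(task):
    """Fork a child that runs task() and then exits; return the child's PID."""
    pid = os.fork()
    if pid == 0:
        # Child: do the work, then exit immediately so that none of the
        # parent's code after the fork is run a second time.
        try:
            task()
        finally:
            os._exit(0)
    return pid


def reap_finished(live):
    """Collect finished children without blocking.

    `live` is a set of child PIDs. Every PID whose process has exited is
    removed from the set. Returns the number of PIDs removed in this call.
    """
    reaped = 0
    for pid in list(live):
        try:
            done, _status = os.waitpid(pid, os.WNOHANG)
        except ChildProcessError:
            # The child was already collected elsewhere; stop tracking it.
            done = pid
        if done == pid:
            live.discard(pid)
            reaped += 1
    return reaped


if __name__ == "__main__":
    # Hypothetical workload: four short-lived workers.
    live = {spawn_worker(lambda: time.sleep(0.1)) for _ in range(4)}
    while live:
        reap_finished(live)
        time.sleep(0.05)
    print("all workers reaped")
```

Calling something like reap_finished() periodically from the main
loop keeps finished children from piling up. If the live set keeps
growing even so, the children are not exiting at all, and the
problem lies elsewhere.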
eyc
PS:
I just love patchwork...it's job security! It's fitting that this Alpha
station controls the world's fastest non-defense production SAR image
generation system, which is also full of patchwork.
Received on Thu Aug 20 1998 - 07:18:22 NZST