My thanks to Joe Fletcher
Dr. Thomas Blinn
Selden E Ball Jr.
Michael Lamont
who all provided input and increased my understanding of how UNIX functions,
so that I was able to determine a solution to this problem. I learned some
significant lessons from this that I would like to share with the
Tru64 community.
********************************************
My problem Description:
The problem is that the directory channel is not starting when a message is
queued to the channel.
On UNIX the PMDF software is supposed to start a job by sending a UDP message
to port 27442, which the PMDF job controller is listening on (you can change
the port in the pmdf/table/job_controller.cnf file).
1. What kernel parameters would control the tcp/ip communication of a UDP
packet?
2. How can we check these parameters?
3. If any of you are managing systems with PMDF running on them, what have
you got your kernel parameters set at to make PMDF run more efficiently?
Also, if you have any ideas about what could be done to improve the
performance of this PMDF system, I would appreciate any input you could
give.
*******************************************
To summarize the responses:
A discussion on UDP and why PMDF uses it for task communication.
There is NO guarantee that a UDP packet will ever reach its intended
recipient.
As far as I know, there are no kernel parameters that will modify what is
probably the expected behavior. Of course, if the system does not have
sufficient resources to buffer a packet when it arrives, the packet is
discarded right away, so more IP buffering might help.
While it's not PMDF specific, running sys_check on the server did give some
useful tuning hints, but it did not show anything that could be causing the
Directory Channel to fail.
PMDF is using UDP on purpose - TCP has guaranteed delivery, but with it
comes substantially larger costs in terms of memory, network resources, and
CPU time. The chapter in the PMDF System Manager's guide about the job
controller talks about this to some extent - the operating principle is that
if the system is so overloaded that the UDP packet isn't getting through,
then it probably isn't a great idea to force it to process the messages in
the channel in question at that point in time. (Kind of a built-in load
inhibitor, I suppose you could call it.)
If the channel has a normal flow of traffic to it, then a future UDP packet
will make it through when the system is less loaded and all of the messages
sitting in the channel will be processed then. If the system is under such
a continuous load that a UDP packet never makes it through, then the post
job is supposed to pick it up and run.
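The fire-and-forget behavior described above can be illustrated with a small
Python sketch. It is not PMDF code: the payload is hypothetical (the real
request format is not given here), and a stand-in listener on an ephemeral
loopback port takes the place of the Job Controller. It only shows that the
sender gets no acknowledgement either way.

```python
import socket

def send_notification(host, port, payload):
    """Fire-and-forget: send one UDP datagram; no acknowledgement is awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

def demo():
    # Stand-in listener on an ephemeral loopback port (not the real
    # Job Controller, and not its real port).
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 0))
    listener.settimeout(2.0)
    port = listener.getsockname()[1]

    # Hypothetical payload; the actual PMDF message format is an assumption.
    send_notification("127.0.0.1", port, b"hypothetical-run-request")
    try:
        received, _ = listener.recvfrom(1024)
    except socket.timeout:
        received = None  # a lost datagram simply never arrives
    listener.close()

    # Nobody is listening now, yet the sender still sees no error.
    send_notification("127.0.0.1", port, b"hypothetical-run-request")
    return received

if __name__ == "__main__":
    print(demo())
```

On a lightly loaded machine the listener normally receives the datagram; the
second send shows that the sender cannot tell when nothing picks it up.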
***********************************************
The Cause of the problem identified:
The Directory Channel was not starting when mail messages were received
because of the "Default" configuration of the PMDF Job Controller.
The PMDF Job Controller is responsible for scheduling and executing PMDF
tasks upon request by various PMDF components. For example, upon receipt of
an incoming message from any source, the PMDF channel that is handling the
receipt of the message determines the destination, enqueues the message, and
sends a request to the Job Controller to execute the next channel.
PMDF is distributed with a default Job Controller configuration that is
suitable for most sites. This default configuration defines a **single**
queue named DEFAULT with a job limit of 4 and a capacity of 200. The DEFAULT
queue will be used by all PMDF channels which do not specify a queue using
the queue channel keyword. (In the default configuration, the queue DEFAULT
is actually the only queue.)
The queue has a job limit, which governs the maximum number of requests to
be processed in parallel on that queue, and a capacity, which is the maximum
number of requests a queue can store. Requests will be executed as they are
received until the job limit is exceeded, at which point they will be queued
to run when a currently executing request finishes. If the capacity of a
queue is exceeded, requests directed at that queue will be ignored by the
Job Controller.
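The job limit and capacity rules above can be modeled with a short Python
sketch. This is only an illustration, not the Job Controller's actual
implementation: the class and method names are hypothetical, and it assumes
that capacity counts only the requests waiting to run.

```python
from collections import deque

class JobQueue:
    """Toy model of one Job Controller queue (names are hypothetical)."""

    def __init__(self, name, job_limit, capacity):
        self.name = name
        self.job_limit = job_limit    # max requests processed in parallel
        self.capacity = capacity      # max requests stored while waiting (assumed)
        self.running = []
        self.pending = deque()
        self.ignored = []

    def submit(self, request):
        if len(self.running) < self.job_limit:
            self.running.append(request)      # a free slot: run immediately
            return "running"
        if len(self.pending) < self.capacity:
            self.pending.append(request)      # wait for a running job to finish
            return "queued"
        self.ignored.append(request)          # queue is full: request is dropped
        return "ignored"

    def finish(self, request):
        self.running.remove(request)
        if self.pending:
            # The oldest waiting request takes the freed slot.
            self.running.append(self.pending.popleft())

if __name__ == "__main__":
    q = JobQueue("DEFAULT", job_limit=4, capacity=200)
    results = [q.submit(i) for i in range(206)]
    print(results.count("running"), results.count("queued"), results.count("ignored"))
```

With the default limits, a burst of 206 requests leaves 4 running, 200
waiting, and the last 2 ignored.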
The system was under a heavy load and the DEFAULT queue had exceeded its
capacity limit, so additional requests sent to it for the Directory channel
were being ignored. In addition, the processing of incoming messages was
taking up all of the job limit slots, and therefore no Directory channel
processes were being started.
********************************************
The fix to the problem defined:
Typically, additional queues would be added to the Job Controller
configuration if the processing of some channels needs to be kept separate
from that of other channels. For example, we need to prevent messages sent
to a relatively slow channel (tcp_local - the incoming channel) from
blocking processing of messages sent to other channels (directory and
conversion).
In this case we want to use queues with different characteristics: to
control the number of simultaneous requests that the incoming mail channel
is allowed to process, and to allow the Directory and the Conversion
channels to process separately. This is done by creating new queues with the
desired job limit and then using the queue channel keyword to direct those
channels to the new, more appropriate queue.
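To see why separate queues help, here is a rough Python sketch that counts
ignored requests during a burst in which no job finishes. The function name,
the burst sizes, and the queue limits are illustrative assumptions, and the
model treats each queue as accepting job limit plus capacity requests before
dropping the rest.

```python
from collections import Counter

def ignored_requests(requests, queues, channel_queue):
    """Count ignored requests per channel for a burst with no job finishing.

    requests:      list of channel names, one per request
    queues:        queue name -> (job_limit, capacity)
    channel_queue: channel name -> queue name (DEFAULT if not listed)
    """
    accepted = Counter()
    ignored = Counter()
    for channel in requests:
        qname = channel_queue.get(channel, "DEFAULT")
        job_limit, capacity = queues[qname]
        if accepted[qname] < job_limit + capacity:
            accepted[qname] += 1      # running or waiting in the queue
        else:
            ignored[channel] += 1     # queue full: request is dropped
    return ignored

# Hypothetical burst: heavy incoming mail followed by a few directory requests.
burst = ["tcp_local"] * 300 + ["directory"] * 10

single = ignored_requests(burst, {"DEFAULT": (4, 200)}, {})
split = ignored_requests(
    burst,
    {"DEFAULT": (4, 200), "TCP_LOCAL_QUEUE": (5, 200), "DIRECTORY_QUEUE": (4, 100)},
    {"tcp_local": "TCP_LOCAL_QUEUE", "directory": "DIRECTORY_QUEUE"},
)

if __name__ == "__main__":
    print("single queue, directory ignored:", single["directory"])
    print("split queues, directory ignored:", split["directory"])
```

In this model the shared queue drops every directory request once tcp_local
has filled it, while a dedicated queue accepts all of them.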
********************************************
Example configuration of the PMDF Job Controller:
Add the following lines to the job_controller.cnf
[QUEUE=TCP_LOCAL_QUEUE]
job_limit=5
capacity=200
!
[QUEUE=TCP_INTERNAL_QUEUE]
job_limit=5
capacity=200
!
[QUEUE=DIRECTORY_QUEUE]
job_limit=4
capacity=100
!
[QUEUE=CONVERSION_QUEUE]
job_limit=4
capacity=100
In your PMDF.CNF file, make the following changes:
on the directory channel add the keyword
    queue DIRECTORY_QUEUE
on the Conversion channel add the keyword
    queue CONVERSION_QUEUE
on the tcp_local channel add the keyword
    queue TCP_LOCAL_QUEUE
and on the tcp_internal channel add the keyword
    queue TCP_INTERNAL_QUEUE
The PMDF configuration will then need to be recompiled -
    # pmdf cnbuild
and the PMDF system will need to be restarted -
    # pmdf restart
After making these changes the system saw significant improvement in
system resource usage and the throughput of mail messages.
Tom Welker
Support Specialist
Process Software
800-722-7770
welker_at_process.com
Received on Mon Nov 26 2001 - 15:49:07 NZDT