Thanks to Spider Boardman for input. I haven't really solved
the problem yet, but I want to follow up with more information.
After reviewing the source code for qpopper 2.4, the following
things can be done to improve its performance :
o Comment out the gethostname() call in pop_init.c that finds
the local machine name, and hard-code the local machine name
instead. We were already doing this.
o Compile qpopper with SERVER_MODE enabled to reduce spool I/O
on POP3 transactions which delete all mail, or leave all mail
on the server. Did this, didn't see a noticeable difference.
Spider suggested the /etc/passwd file being out of sync
with the ndbm database copy would cause performance
problems. The suggestion for managing accounts was to use
useradd and friends. Well, those are far too slow for practical
use in an environment with 22000+ users in /etc/passwd and more
than 100 account adds/changes/deletes per day. We do a
mkpasswd at midnight and use edauth to force additions to
the /etc/passwd text file into the Enhanced Security
databases as account activity happens.
The real problem is the getprpwnam() function. qpopper
calls getpwnam() and getprpwnam() when authenticating
a POP3 user. I wrote two programs, one to call
getprpwnam() and one to call getpwnam(). I expected
getpwnam() to be much slower than getprpwnam(), and it is
in fact a bit slower under low load conditions. When we start
cranking ~4 POP3 authentication requests per second, the
results are a little different :
bash# uptime
09:47 up 36 days, 4:37, 23 users, load average: 121.15, 118.21, 124.81
bash# time ./getpwnam kevin
0.01user 0.01system 0:00.69elapsed 3%CPU
(0avgtext+4avgdata 1536maxresident)k
0inputs+0outputs (0major+53minor)pagefaults 0swaps
bash# time ./getprpwnam kevin
0.37user 0.13system 1:24.91elapsed 0%CPU
(1avgtext+54avgdata 7424maxresident)k
26inputs+125outputs (9major+166minor)pagefaults 0swaps
One minute, 24 seconds for getprpwnam() to complete is
unacceptable. The auth.db file is a bottleneck in DEC
UNIX Enhanced Security for high volume applications.
That's the only conclusion I can reach.
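
For anyone who wants to repeat the comparison, here is a minimal
sketch of the kind of timing helper involved. It is not the actual
test program behind the numbers above. The function name
timed_getpwnam() is made up for illustration, and it uses only the
portable getpwnam() call; the Enhanced Security variant would call
getprpwnam() in the same spot and is not shown here.

```c
#include <pwd.h>
#include <stddef.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Hypothetical helper: look up one login name with getpwnam() and
 * measure the wall-clock time the call takes.
 *
 * Stores the elapsed time in seconds in *elapsed.
 * Returns 0 if an entry was found, 1 otherwise.
 */
int timed_getpwnam(const char *name, double *elapsed)
{
    struct timeval start, end;
    struct passwd *pw;

    gettimeofday(&start, NULL);
    pw = getpwnam(name);            /* the call being measured */
    gettimeofday(&end, NULL);

    /* Convert the two timestamps into a difference in seconds. */
    *elapsed = (double)(end.tv_sec - start.tv_sec)
             + (double)(end.tv_usec - start.tv_usec) / 1e6;

    return pw != NULL ? 0 : 1;
}
```

Called from a small main() that passes argv[1] and prints the
elapsed value, this gives a rough stand-in for "./getpwnam kevin".
Measuring inside the process leaves out program startup, so the
figure will not match the elapsed time reported by time exactly.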
From the getprpwnam() man page :
"The getprpwnam() function searches from the beginning
of the data-base until a login name matching name is found,
and returns a pointer to the particular structure in which
it was found."
Please tell me getprpwnam() is NOT doing sequential
database searches and the man page is out of date.
Of course that would be contrary to everything my
testing has shown.
Kevin
Forwarded by Kevin Houle <kevin_at_netins.net>
---------------- Original message follows ----------------
From: Kevin Houle <kevin_at_netins.net>
To: alpha-osf-managers_at_ornl.gov
Date: Mon, 26 Jan 1998 17:01:17 -0600
Subject: C2 Authentication Performance - DU4.0B
--
I've got a scenario happening which I believe is tied to
Enhanced Security authentication performance. Here is the
machine in question :
AlphaServer 1000A/233, 256M memory, DU4.0B, slightly patched.
OSF1 machine.name.here V4.0 564 alpha
Enhanced Security enabled
The machine handles shell account users (maybe 20-40 at
any one time) and lots of SMTP/POP traffic. I'm starting
to see load average spikes into the 80-100 range, where
the machine normally runs at .5 to 2. When the spike is
happening, the disk I/O is not excessive and system memory
does not appear to be a problem, but swap reserve fills
up as the process list grows larger.
In the process list, basically I'm seeing hundreds of
popper processes building up. The popper we are running
has been modified NOT to do DNS queries, because that used
to be a bottleneck, so that's not it. Also during the spike,
it takes minutes to get a login/password prompt going
through the login program or something like ftpd.
The only way we can recover is to comment the pop3 port
out of /etc/inetd.conf for about 2 minutes and the system
load returns to normal. Re-enable the port and off we go
with normal performance.
We've spent several weeks looking at this, and the only
conclusion I can reach is the Enhanced Security subsystem
is being overloaded. The machine has over 21000 logins in
/etc/passwd and the protected database. During the spikes,
we are logging upwards of 3.4 popper accesses per second.
Under normal running conditions during the day, we log
around 2.8 per second. While this is going on, there are
a few logins via telnet/rlogin and ftp, but nothing
significant.
Has anyone else hit a similar ceiling? What are the options,
more memory? OS version upgrade? Patches? Buy another Alpha
and cluster them to distribute the authentication load? There
has to be a good way to scale authentication requests.
Any advice would certainly be appreciated.
--
Kevin Houle
netINS, Inc.
kevin_at_netins.net
Received on Fri Feb 27 1998 - 17:17:38 NZDT