SUMMARY: large login / mail servers

From: Dick Joltes <joltes_at_husc.harvard.edu>
Date: Tue, 10 Dec 1996 11:15:21 -0500

First, thanks to

Anthony Taltree,
alan_at_nabeth.cxo.dec.com
iglesias_at_draco.acs.uci.edu
sxkac_at_java.sois.alaska.edu
"Pedro J. Lobo" <pjlobo_at_euitt.upm.es>
Pete Davis <pete_at_eagle.deardorff.com>

2nd, the responses (divided by "----" lines):

I'd suspect that you've hit some sort of limit in the lockd/lockf
implementation. Search the tunables for a number of locks or some such.
 Alternately, you might try migrating people away from using NFS for
mail spools, which is a good idea anyway.

----
>3)  a side question:  has anyone had problems with Digital UNIX 4.0a
>on the 4100 series?  At any rate, are there any known patches which
>are required (or generally are considered a good idea) on this release?
>We know about the ping patch already.
As of 11/20 there were 44 'generally released' v4.0a patches, ftp:
  atlanta.service.digital.com:/pub/patches/osfv40a/README
for the summary.  There are likely at least a half dozen which may
be applicable to your configuration (mileage varies).  
If you conclude there are some you want, then ftp patches.tar.Z.  
If you need some tools to de-consolidate patches.tar.Z, ftp:
  raven.alaska.edu:/pub/sois/UA_DUtools.tar.Z
I think by early next year Digital plans to have the patches and
READMEs available on the web.
----
	I'll point out that soft mounts are somewhat dangerous because
	there may be programs that blindly expect writes to work
	and don't cope when they don't.  Hard mounts with the
	interrupt feature are safer.
	You need to track down what is being saturated.  That may
	suggest its own solution.  Since the network doesn't seem
	to be the problem and the login server seems ok then network
	and memory bottlenecks are unlikely.  That leaves I/O and
	CPU usage.  What is the CPU utilization on the server?
	What is the I/O rate and bandwidth utilization of the
	array?  Enough I/O and all those fast write caches between
	the client and array will saturate.  At that point writes
	to the array could slow down considerably.
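To illustrate the soft-mount point above: a program that writes to a
soft-mounted NFS file needs to check every step, because a timed-out
request comes back as an error rather than blocking.  The following is
only a sketch of that kind of defensive write; the function name
write_checked is made up for this example and is not taken from any
mail program.

```python
import os


def write_checked(path, data):
    """Write bytes to path and report failure instead of assuming success.

    Returns (ok, message). Hypothetical helper for illustration only.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    except OSError as err:
        return False, "open failed: %s" % err

    ok, message = True, "ok"
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # push data out so delayed errors surface here
    except OSError as err:
        ok, message = False, "write failed: %s" % err

    try:
        os.close(fd)  # close can also report a deferred write error
    except OSError as err:
        if ok:
            ok, message = False, "close failed: %s" % err
    return ok, message
```

A caller would keep the old copy of the file until write_checked
reports success.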
----
Pine is somewhat inefficient when reading large mail files, since it
reads the whole thing into memory.  Are you sure you're not paging
or swapping a lot when the system gets slow?
----
Which version of pine are you using? With 3.91, there is a noticeable
delay (about 4-5 seconds) when you open or close a folder. In version 3.95
the delay disappears completely.
Anyway, if it is an NFS problem, you can try to configure pine to use IMAP
instead of accessing the mail folders directly via an NFS-mounted drive.
IMAP is, IMHO, a much better protocol than POP. There is an IMAP
server included in the pine distribution.
----
> Node B:  mail hub. (2 300MHZ CPU, 3 DE500 ethernet, 500MB RAM).
> This node handles all SMTP, IMAP, and POP processing for the users
> on node A, and NFS-serves a 16GB RAID 5 array to the login server 
> and multiple workstations scattered within a single building (all are on
> 10MB ethernet).  This system also has a Prestoserve 8MB module
> that enhances NFS on the mailspool disk.  This system runs 15 nfsiods
> and 64 nfsds.
> 
> MAXUSERS on both nodes is at 1024.
This seems kind of high.  There are a lot of resources used for MAXUSERS.
You need one of these for each user that will be logged in simultaneously.
On an NFS server, this could be as low as 64 or 128 or 256, since nobody
logs in to the system.  On a user system, it depends on how many people
will be logged in at once.
Does this RAID use AdvFS?  (If so, AdvFS is parallelized in 4.0a and the
support is excellent; in 3.2x the support is weak and AdvFS can only
be used by cpu0.)  AdvFS is a great performance boost for many reasons
and also ensures that if your power dies it won't fsck and eat up your drive.
Here are some things to look at.. iostat 1, vmstat 1, swapon -s, 
nfsstat -rs, -rc.
See below.
look for:
$ iostat 1
      tty     fd0      re0      re1      dk3     cpu
 tin tout bps tps  bps tps  bps tps  bps tps  us ni sy id
   0    8   0   0   70   4  990  60    0   0   4  0 16 80
   0   58   0   0    8   1  587  61    0   0   5  0 23 72
   0   59   0   0  111   8 1654  71    0   0   7  0 33 60
cpu (id): the percentage of CPU time idle,
and bps for a disk: this shows how busy the disk is.
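As a rough sketch of how those two columns could be pulled out of the
iostat sample, the following averages CPU idle and the re1 bps figure
over the three sample rows.  The column positions are assumed from the
header in this particular output and would differ on a system with
other disks.

```python
SAMPLE = """\
   0    8   0   0   70   4  990  60    0   0   4  0 16 80
   0   58   0   0    8   1  587  61    0   0   5  0 23 72
   0   59   0   0  111   8 1654  71    0   0   7  0 33 60
"""

# Assumed layout, read off the header of the sample:
# tin tout | fd0 bps tps | re0 bps tps | re1 bps tps | dk3 bps tps | us ni sy id
RE1_BPS = 6
CPU_IDLE = 13


def column_average(text, index):
    """Average one whitespace-separated column over all non-blank rows."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    values = [int(row[index]) for row in rows]
    return sum(values) / len(values)


idle = column_average(SAMPLE, CPU_IDLE)
re1 = column_average(SAMPLE, RE1_BPS)
print("average idle: %.1f%%, average re1 bps: %.0f" % (idle, re1))
```

For the three rows shown, this gives about 70.7% idle and an average
of 1077 bps on re1.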
vmstat 1 tells you free memory; multiply the free column by 8192 (the page
size) to get bytes.
Virtual Memory Statistics: (pagesize = 8192)
  procs    memory         pages                          intr        cpu
  r  w  u  act  free wire fault cow zero react pin pout  in  sy  cs  us  sy  id
  5103 22   24K  29K 9065 720M 212M 157M  371 142M    0 248  8K 511   4  16  80
for example, i have 237mb memory free.
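The arithmetic behind that figure can be sketched as follows.  This
assumes the K and M suffixes in the vmstat output are decimal
multipliers (1000 and 1000000) and that a megabyte here means 10^6
bytes; both are assumptions chosen because they reproduce the number
quoted above.

```python
PAGE_SIZE = 8192  # bytes, from the "pagesize = 8192" header line


def parse_count(field):
    """Convert a vmstat field such as '29K' or '720M' to an integer.

    Assumes decimal suffixes: K = 1000, M = 1000000.
    """
    multipliers = {"K": 1000, "M": 1000000}
    if field[-1] in multipliers:
        return int(field[:-1]) * multipliers[field[-1]]
    return int(field)


def free_megabytes(free_field, page_size=PAGE_SIZE):
    """Free memory in units of 10^6 bytes for a given 'free' column value."""
    return parse_count(free_field) * page_size / 1e6


print("%.1f MB free" % free_megabytes("29K"))
```

With free = 29K pages this comes to roughly 237.6 MB.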
swapon -s :
Swap partition /dev/re0b (default swap):
    Allocated space:       183172 pages (1431MB)
    In-use space:             537 pages (  0%)
    Free space:            182635 pages ( 99%)
Total swap allocation:
    Allocated space:       183172 pages (1431MB)
    Reserved space:         16957 pages (  9%)
    In-use space:             537 pages (  0%)
    Available space:       166215 pages ( 90%)
look for in-use space and reserved space.  In-Use space is space actually
being used, reserved is used because in Digital Unix by default a process
will assign swap space before it attempts to use memory to ensure that there
are always enough system resources.
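A small sketch of reading those two figures out of the totals section
of the swapon -s sample.  The parsing pattern is written against this
sample only and is an assumption about the output layout, not a
documented format.

```python
import re

SAMPLE = """\
Total swap allocation:
    Allocated space:       183172 pages (1431MB)
    Reserved space:         16957 pages (  9%)
    In-use space:             537 pages (  0%)
    Available space:       166215 pages ( 90%)
"""


def parse_swap_totals(text):
    """Return {label: pages} for lines after 'Total swap allocation:'."""
    _, _, totals = text.partition("Total swap allocation:")
    pattern = re.compile(r"^\s*([A-Za-z-]+) space:\s+(\d+) pages", re.MULTILINE)
    return {label.lower(): int(pages) for label, pages in pattern.findall(totals)}


totals = parse_swap_totals(SAMPLE)
reserved_pct = 100.0 * totals["reserved"] / totals["allocated"]
in_use_pct = 100.0 * totals["in-use"] / totals["allocated"]
print("reserved %.1f%%, in use %.1f%%" % (reserved_pct, in_use_pct))
```

In the sample, available space equals allocated minus reserved
(183172 - 16957 = 166215), so the reserved figure is the one that
limits what new processes can claim.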
nfsstat -rc (for client)
nfsstat -rs (for server)
See if you have a large number of timeouts, badxids, retrans, etc.  Depending
on the combination, this could indicate too many nfsds, too few nfsds, or
not enough bandwidth.
Also look at netstat -ian and see if you have a very large number of
collisions, etc.
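One way to read the client-side counters is sketched below.  The
thresholds are a commonly quoted rule of thumb from general NFS tuning
advice, not something stated in this thread, and the counter values in
the example calls are invented.  Treat the output as a hint about
where to look next, nothing more.

```python
def nfs_client_hint(calls, retrans, badxid, timeout, threshold=0.05):
    """Classify nfsstat -rc counters with a rule-of-thumb heuristic.

    Assumptions (not from the original post):
      - a retransmission rate above `threshold` (5%) is worth investigating
      - badxid roughly tracking timeout suggests a slow server (replies
        arrive, but late); badxid near zero suggests lost packets
    """
    if calls == 0:
        return "no-data"
    if retrans / calls <= threshold:
        return "ok"
    if timeout > 0 and badxid >= 0.5 * timeout:
        return "slow-server"
    return "packet-loss"


# Invented counter values, for illustration only.
print(nfs_client_hint(calls=100000, retrans=1000, badxid=10, timeout=1000))
print(nfs_client_hint(calls=100000, retrans=8000, badxid=7500, timeout=8000))
print(nfs_client_hint(calls=100000, retrans=8000, badxid=20, timeout=8000))
```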
> There's a dedicated 100MB duplexed link between these two nodes--
> said link handles ONLY the NFS traffic between the two.  Other systems
> talk to these systems via other ethernet channels.
> 
> Other misc. nodes handle Web, administrative, printing, and other functions.
> They're not really a factor in this situation.
> 
> When we peak out at 420-430 users on the login server (75% of
> whom are running Pine and therefore accessing their inbox files via
> NFS), mail performance drops off badly (1-2 minutes to open the
420-430 users is a lot on a system with 500MB memory.  Consider that
a shell itself takes a meg or two of memory, pine takes more memory depending
on how much mail someone is reading, etc.  As soon as a system has to
depend on swap, it's never going to be the same.
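A back-of-the-envelope version of that estimate, as a sketch.  The
per-process sizes are assumptions: 1.5 MB per shell is the midpoint of
"a meg or two", and 2 MB per pine process is a made-up placeholder
since pine's footprint depends on mailbox size.  Shared program text
is ignored, so this overstates real usage.

```python
def estimated_user_memory_mb(users, pine_fraction, shell_mb, pine_mb):
    """Rough memory demand: one shell per user plus pine for some of them."""
    pine_users = users * pine_fraction
    return users * shell_mb + pine_users * pine_mb


# 425 users and 75% running Pine come from the original question;
# shell_mb and pine_mb are assumed values for illustration.
need = estimated_user_memory_mb(users=425, pine_fraction=0.75,
                                shell_mb=1.5, pine_mb=2.0)
print("estimated demand: %.0f MB against 500 MB of RAM" % need)
```

Under these assumptions the estimate is 1275 MB, well over 500 MB,
which is why checking for paging is worthwhile.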
> performance on the login server (i.e. faster access to the NFS-mounted
> mail directories?  We've experimented with hard vs. soft mounts (the
> latter is preferred so the entire system doesn't hang when the mail
> server goes away) and timeouts on the mount with some success.
> Other ideas would be useful.
We mount NFS over FDDI and it works great.  Our mounts are normal, nfs-v3
mounts.
> 2)  does anyone have a set of "preferred" or suggested sysconfig
> parameters for either of these scenarios?  Digital tell me that they're
> working on recommended parameters for various types of servers,
> but so far no data.
There is very little tuning that needs to be done (we investigated this
for a long long time), 
here are some things to look at:
proc:
        max-proc-per-user=1000
        max-threads-per-user=1000
        maxusers = 256
This says that each user can have 1000 processes maximum.  You may be
able to lower this on your system.
vm:
        ubc-minpercent = 2
        ubc-maxpercent = 10
        ubc-borrowpercent = 2
ubc is how much of the resources can be used for buffers.  Your system
can eat up all the memory just for buffers; what I have above says
use a minimum of 2% of the memory for buffers, use a maximum of 10%,
and go up in percentages of 2.  By default this is much, much higher
and the system can eat a ton of your resources just for buffers.
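To put those percentages in concrete terms, here is a sketch that
converts ubc-minpercent and ubc-maxpercent into megabytes for a
machine with 500MB of RAM (the figure given for the mail hub in the
original question).  It simply treats the settings as percentages of
physical memory, which is an approximation.

```python
def ubc_limits_mb(ram_mb, minpercent, maxpercent):
    """Return (minimum, maximum) buffer cache size in MB for given percents."""
    return ram_mb * minpercent / 100, ram_mb * maxpercent / 100


low, high = ubc_limits_mb(ram_mb=500, minpercent=2, maxpercent=10)
print("buffer cache between %.0f MB and %.0f MB" % (low, high))
```

With the values above, the buffer cache would stay between about
10 MB and 50 MB.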
> 3)  a side question:  has anyone had problems with Digital UNIX 4.0a
> on the 4100 series?  At any rate, are there any known patches which
> are required (or generally are considered a good idea) on this release?
> We know about the ping patch already.
Digital Unix 4.0A is excellent.  There are a small number of minor patches
in addition to the ping patch that you will want to install, but they're
minimal.  Our system has been up rock solid since day 1 without a problem.
----
Update:
The problem continues despite some very good suggestions.  At
this point I've optimized the servers themselves about as much as is
currently possible; we're down to what I believe is a problem in the
SCSI subsystem--we're seeing timeouts on the built-in kzpaa on a
fairly frequent basis (anyone know if there's a known bug in 3.2G with
this controller???) and I'm trying to get a replacement from DEC.  It
appears that the controller is either saturated (though iostat doesn't
seem to make this appear to be the condition) or it's flaky.  I want to
swap in a replacement to see if that helps; if not, I have a kzpsa on
order (note:  this controller handles the mailspool RAID array).
The optimizations done so far include:  
1)  updating the firmware on our Viper RAID array
2)  decreasing ubc-minpercent and ubc-maxpercent to good values
(they were way too high); they're now 1 and 2 on the mail server, and
1 and 5 on the login server.
3)  increasing timeo= from 300 to 600 on the soft mounts
4)  increasing bufcache to 5 on the mail server
Unfortunately, at least for the time being we're stuck with using NFS-
mounted mail spools in this configuration.  As soon as we can come
up with a better plan, we'll go with it.  I think that once I replace the
kzpaa controller (or find a bugfix for these timeouts & resets) we'll be
in better shape.
Thanks,
Dick Joltes
Manager, UNIX Systems & Multiplatform Services
Harvard Arts & Sciences Computer Services
joltes_at_fas.harvard.edu	http://www.fas.harvard.edu/~joltes
voice:  617-495-9281	fax:  617-495-1210
Received on Tue Dec 10 1996 - 17:51:55 NZDT
