Serious performance problems.

From: <r.evans_at_ic.ac.uk>
Date: Wed, 23 Oct 1996 13:51:00 +0100 (BST)

Folks,

I'm in need of a little help trying to diagnose a problem we're having
with one of our Alpha servers.

The server is a 3000/400 with 64MB of RAM, running Digital UNIX 3.2A.
Together with an almost identical machine (which has 80MB of RAM), it
serves home directories via NFS to 16 3000/400 and 3000/600 clients,
and via Samba (1.9.16p2) to 30 PCs running NT 3.51.

The 80MB machine is the `master' server. As well as being NIS master,
it also exports /usr/local to the Alpha clients, and is the mail
server. Mail access is through a combination of NFS to some machines
and POP to the cluster of PCs and individual PCs around the building.
Despite this load, the 80MB machine works without a hitch.

However, the 64MB box is suffering some very real problems. Whilst a
class is being taught on the PCs, the amount of CPU time spent in
system mode will gradually climb towards 95 to 97% (users' home
directories are split across both servers, so only half of the PCs will
be mounting a home directory from this machine). `syd' will show a
number of smbd processes as the top consumers, although it won't always
be the same ones; different instances of smbd each take their turn
draining my precious CPU. According to `vmubc', very little time is
spent in I/O wait. The real problem seems to be memory management.

Once the machine reaches this state, the amount of memory used for the
buffer cache will drop to ubc-minpercent and sometimes even lower (if
vmubc is to be trusted). This in itself doesn't concern me too much,
as the cache still seems to keep a fairly high hit rate. More worrying
is the fact that there is a massive amount of `wired' memory -- at the
moment, the machine has >5,500 pages wired. The output of `vmstat -M'
is shown below.

Memory usage by bucket:

bucket# element_size elements_in_use elements_free bytes_in_use
    4 16 15 1009 240
    5 32 124165 1019 3973280
    6 64 414 98 26496
    7 128 187 69 23936
    8 256 2251 821 576256
    9 512 8762 22 4486144
   10 1024 29 115 29696
   11 2048 23 49 47104
   12 4096 3 5 12288
   13 8192 2 37 16384
   14 16384 24 20 393216
   15 32768 2 0 65536
   16 65536 0 0 0
   17 131072 1 0 131072
   18 262144 0 0 0
   19 524288 0 0 0

Total memory being used from buckets = 9781648 bytes
Total free memory in buckets = 1154672 bytes
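Since the bucket arithmetic is what I'm staring at, here's a quick
cross-check of the table above (a small awk sketch; the rows are pasted
verbatim from the `vmstat -M' output, and the column order is as in the
header: bucket#, element_size, elements_in_use, elements_free,
bytes_in_use):

```shell
# Cross-check the bucket table: for each bucket, bytes_in_use should be
# element_size * elements_in_use, and the used/free columns should add
# up to the totals vmstat -M reports.
totals=$(awk '
    $5 != $2 * $3 { print "bucket " $1 ": bytes_in_use inconsistent" > "/dev/stderr" }
    { used += $5; free += $2 * $4 }     # free bytes = element_size * elements_free
    END { print used, free }
' <<'EOF'
 4     16     15 1009     240
 5     32 124165 1019 3973280
 6     64    414   98   26496
 7    128    187   69   23936
 8    256   2251  821  576256
 9    512   8762   22 4486144
10   1024     29  115   29696
11   2048     23   49   47104
12   4096      3    5   12288
13   8192      2   37   16384
14  16384     24   20  393216
15  32768      2    0   65536
16  65536      0    0       0
17 131072      1    0  131072
18 262144      0    0       0
19 524288      0    0       0
EOF
)
used=${totals% *}
free=${totals#* }
echo "in use: $used bytes, free: $free bytes"
```

Both figures come out at 9781648 and 1154672 bytes, matching the totals
in the report, so at least the kernel's accounting is internally
consistent -- the memory really is going where vmstat says it is.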


Memory usage by type: Type and Number of bytes being used

MBUF = 470272 MCLUSTER = 98304 SOCKET = 46080
PCB = 54400 ROUTETBL = 3744 IFADDR = 1536
MBLK = 1024 MBLKDATA = 128 STRHEAD = 2048
STRQUEUE = 6144 STRMODSW = 2688 STRSYNCQ = 1792
STREAMS = 2816 FILE = 21696 DEVBUF = 24320
UFS MOUNT = 1792 IPM ADDR = 192 IFM ADDR = 320
VNODE = 3410432 PRESTO = 34560 KALLOC = 5693616
TEMP = 192

Note the amount of memory allocated to MBUFs, 470K! The other server,
which has more memory, rarely tops 4K. In addition, there's over 5.5MB
of general kernel allocations (KALLOC), whereas the other server only
has 1.75MB. One
curious effect is that any `ps' command will show all the processes as
"<defunct>" (not just the smbd processes, but everything, including the
"[kernel idle]" process). At this point, `syd' will freeze.
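For what it's worth, this is the sampling loop I plan to leave running
during the next class, to line the MBUF/KALLOC growth up against smbd
activity afterwards. A sketch only -- `vmstat -M' is as above,
`netstat -m' reports mbuf pool statistics, and the `ps' flags are the
BSD-style ones; adjust the interval and flags to taste:

```shell
# Log kernel memory pools and the top CPU consumers at intervals, so
# the MBUF/KALLOC growth can be correlated with smbd activity later.
log=/var/tmp/memwatch.log
: > "$log"                                     # start a fresh log
for i in 1 2 3; do                             # a few iterations for the example
    date                          >> "$log"
    netstat -m       2>/dev/null  >> "$log"    # mbuf pool statistics
    vmstat -M        2>/dev/null  >> "$log"    # kernel malloc buckets/types
    ps aux 2>/dev/null | head -6  >> "$log"    # current top consumers
    echo "----"                   >> "$log"    # sample separator
    sleep 1                                    # use 30-60s in practice
done
echo "samples in $log"
```

Each sample is timestamped and separated by a `----' line, so the log
can be split up and compared sample-by-sample once the machine starts
to wedge.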

Occasionally the machine will recover, but in the majority of cases the
final result is a complete hang, or enough complaints that a reboot is
the only option. I realise that 3000/400s are slow machines, but the
other twin has a greater load and seems to take it all in its stride.

`netstat -f inet' doesn't give any obviously unusual output, and I
can't see anything out of the ordinary in a tcpdump.

I'm at a loss as to how to proceed; any enlightenment that you could
share would be most welcome.

Cheers,
Robert
Received on Thu Oct 24 1996 - 06:03:35 NZDT