We have a mission-critical general login machine that is suffering from
poor NFS client performance. What (if anything) can we do to improve
response times?
Details: The general login machine is an AlphaServer 4100 with 2
processors and 1GB of RAM running Digital UNIX v3.2G. It has three
10-megabit Ethernet interfaces: one to our general net, one to a net
dedicated to our primary NFS fileserver (serving home directories), and a
third full-duplex link connected directly to our mailserver.
Our mailserver, another AlphaServer 4100 (2 processors, 512MB RAM, DU
3.2F) is presently NFS-serving our /mailshare/spool directory to all of
our UNIX hosts. It also has three network interfaces; two to general nets,
and one the other half of the full-duplex link to our main login server.
Mail is being served on a 16-GB RAID 5 array. Access to the /mailshare
filesystem from this server (local - not NFS) is quick even during times
of high client load.
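For context, the export itself is nothing exotic; it boils down to a single
/etc/exports entry on the mailserver along the lines of the following (the
host list is only a placeholder for our client machines, and I've left out
any exports(4) options):

/mailshare/spool    fas <other-unix-hosts>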
Our main problem occurs when 350 or more users log in to our general login
machine. About 90% of our users run PINE, so most sessions make extensive
use of our mailserver. Once we have
about 350-400 users logged in, performance (delays while PINE checks for
new mail) becomes poor. After 450 users log in, performance is horrible;
logins take forever while tcsh checks for new mail, and all mail-related
activities are unusable. Performance on all of our other hosts (with loads
varying from 5 users to 150 users) is quite good. It is ONLY our general
login server that grinds to a halt.
It is this last fact that leads me to believe the problem is a client-side
issue. I'm more than willing to be proven wrong, however.
[Even as I type this message, my screen is being disrupted with messages
like the following:
Oct 1 00:29:08 fas vmunix: NFS3 RFS3_LOOKUP failed for server husc-33: RPC: Timed out
Oct 1 00:29:55 fas vmunix: NFS3 RFS3_CREATE failed for server husc-33: RPC: Timed out]
What we've tried:
-- upping the number of nfsd's on the server (a rough sketch of the current
settings follows this list) - it helped when we went from 24 to 32; it's
now at 56, and performance is about the same as it was at 32.
-- raising and lowering the number of NFSIOD's - we're currently at 20. I
tried setting it to 64 to see what would happen, and nothing really
noticeable occurred. I also tried 7; again, it didn't make a big
difference.
-- adding "timeo=300" to our mount - the mount is a _soft_ mount; this
option seems to help mask the symptom - the NFS3 timeout messages. I
don't think it is a solution, however.
-- adding "retrans=3" - this may be helping a _little_. Some of our
current nfsstat server and client side stats are below my sig.
-- NFSv2 vs. NFSv3 - we tried NFSv2, and performance was about the same,
if not a bit worse.
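For the curious, here is roughly what the relevant knobs look like right
now. The rc.config variable names are the ones I remember nfssetup using,
and the mount point path is a stand-in, so don't take either as gospel:

# daemon counts (set via rcmgr, read from /etc/rc.config at boot):
rcmgr set NUM_NFSD 56        # on the mailserver
rcmgr set NUM_NFSIOD 20      # on the login server

# the client-side mount, roughly:
mount -t nfs -o soft,timeo=300,retrans=3 husc-33:/mailshare/spool /mailshare/spool

Since timeo is in tenths of a second, timeo=300 is a 30-second initial
timeout; on a soft mount, once the retrans limit is exhausted the RPC gives
up and "RPC: Timed out" is handed back to whatever PINE or tcsh was doing.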
I apologize for the length of this post; my colleagues and I are extremely
frustrated with this problem. The only response I've received (thus far)
from Digital support was to increase our timeo value. :-(
If moving to Digital UNIX v4.0a (and therefore, to NFSv3 over TCP) will
help, we'll consider that. I don't particularly wish to pull more
all-nighters, but I'll do whatever is necessary to improve our situation.
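What I have in mind there is remounting with something along these lines
once we're on 4.0a (the vers/proto option names are my guess from other
vendors' mount_nfs - I haven't checked them against the 4.0a manpages yet):

mount -t nfs -o vers=3,proto=tcp,soft,timeo=300,retrans=3 \
    husc-33:/mailshare/spool /mailshare/spool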
Any and all suggestions will be most warmly received!!!
---
Todd V. Minnella
UNIX Systems Analyst, UNIX Systems Group
Harvard University Faculty of Arts and Sciences Computer Services
---
fas:~ # nfsstat -cri10 [on our general login server]
----------------------------------------
Client rpc:
calls      badcalls   retrans    badxid     timeout    wait       newcred    badverfs   timers
5816737    2382       24462      6731       24872      0          0          0          119579
----------------------------------------
Client rpc:
calls      badcalls   retrans    badxid     timeout    wait       newcred    badverfs   timers
1312       0          2          0          2          0          0          0          13
----------------------------------------
husc.harvard.edu:~ % nfsstat -sri10 [on our mailserver]
----------------------------------------
Server rpc:
calls badcalls nullrecv badlen xdrcall
5395094 0 0 0 0
----------------------------------------
Server rpc:
calls badcalls nullrecv badlen xdrcall
595 0 0 0 0
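[For reference, the cumulative client figures above work out to roughly:

retrans/calls  = 24462 / 5816737  ~ 0.4%
timeout/calls  = 24872 / 5816737  ~ 0.4%
badxid/timeout =  6731 / 24872    ~ 27%

If I understand the usual nfsstat rule of thumb correctly, badxid close to
timeout would point at a slow server sending late replies, while badxid
well below timeout - as here - points more toward requests or replies
getting dropped between client and server.]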