NFS3 timeout problems between OSF 3.0 and SGI

From: Jonathan Rozes <jrozes_at_gumbo.tcs.tufts.edu>
Date: Fri, 13 Oct 1995 16:37:28 -0400

Hi everybody--

We have a peculiar situation with an Alpha mounting an SGI via NFS3. Under
any kind of load beyond Extremely Light, the Alpha's console piles up with
'NFS3 server not responding' messages alternating with 'NFS3 server ok'
messages. The odd thing is that the server appears completely normal the
entire time (by normal I mean under no strain and logging no errors).

My test case is this: The SGI is a Challenge L with 4 200MHz R4400 CPUs and
512M memory running Irix 5.3 XFS. The local filesystem I'm writing to is an
EFS filesystem residing on a Challenge RAID Level 5 disk array. The SGI is
doing absolutely nothing -- I'm the only user logged in and I'm only watching
the NFS traffic. Needless to say, this should be more than enough horsepower
for the job. The SGI runs 32 nfs daemons.

The client is an Alpha 3000 M800 with 512M memory running OSF/1 3.0. It is
mounting the SGI via NFS3 read/write with no special options. The Alpha has
roughly 100 users logged in, mostly reading email and news. This side runs
32 nfs I/O daemons.
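
For reference, the mount uses nothing beyond the defaults; the invocation
amounts to something like this (host and paths are placeholders, and the
exact option spelling on OSF/1 may differ slightly from this sketch):

    # roughly how the SGI export is mounted on the Alpha -- names are placeholders
    mount -t nfs -o rw sgi-server:/export/scratch /nfs/scratch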

The two machines are in the same physical location (within earshot of each
other) and are both connected via FDDI to a DEC GIGAswitch, so network
latency shouldn't be a factor.

I start five processes simultaneously on the Alpha, each creating a ~110M tar
file on the NFS-mounted directory (from files on a local disk). After roughly
30 seconds, the 'NFS server not responding' and 'NFS server ok' messages start
hitting the screen. The SGI appears completely normal, with no apparent
sluggishness and a low load average. No messages in the system logs either.
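
The load itself is nothing fancy; it amounts to something like this (paths
are placeholders):

    # five simultaneous tar writers onto the NFS mount -- paths are placeholders
    for i in 1 2 3 4 5
    do
        tar cf /nfs/scratch/test$i.tar /usr/local/src$i &
    done
    wait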

An 'nfsstat -c' on the Alpha yields this:

Client rpc:
calls       badcalls    retrans     badxid      timeout     wait        newcred
325825      4203        0           362         4176        0           0
badverfs    timers
0           22662

Client nfs:
calls       badcalls    nclget      nclsleep
321649      27          321649      0

Client nfs V3:
null        getattr     setattr     lookup      access      readlink    read
0 0%        11484 3%    17043 5%    12188 3%    1284 0%     0 0%        85434 26%
write       create      mkdir       symlink     mknod       remove      rmdir
160214 49%  5950 1%     328 0%      72 0%       0 0%        4528 1%     249 0%
rename      link        readdir     readdir+    fsstat      fsinfo      pathconf
0 0%        0 0%        3093 0%     1807 0%     37 0%       5 0%        0 0%
commit
7621 2%

An 'nfsstat -s' on the SGI yields:

Server RPC:
calls       badcalls    nullrecv    badlen      xdrcall     duphits     dupage
237638      0           3605        0           0           3571        159.78

Server NFS:
calls       badcalls
237638      0
null        getattr     setattr     root        lookup      readlink
0 0%        6802 2%     6230 2%     7199 3%     865 0%      0 0%
read        wrcache     write       create      remove      rename
66082 27%   133084 56%  9446 3%     176 0%      41 0%       0 0%
link        symlink     mkdir       rmdir       readdir     fsstat
3034 1%     171 0%      0 0%        0 0%        2087 0%     1220 0%


Note that the SGI reports zero RPC and NFS badcalls while the Alpha reports
4203 and 27, respectively. Also note the high number of timeouts on the Alpha.

I tried several mount options, none with any real success:

sync: RPC badcalls dropped significantly (1 or 2 every few seconds, again
        all from timeouts), but performance is unacceptable -- ~13 RPC and
        NFS calls per second, and after five minutes each process had
        written only ~2.5M of its tar file.

nfsv2: Same behavior as with nfsv3.

timeo: Doubling the timeout value to 2.2 seconds (from the default 1.1)
        seemed to have no effect whatsoever.

As an extreme measure, I tripled timeo to 3.3 seconds and doubled retrans
to 8, but still saw the messages (not quite as often, but they were there
nonetheless).
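
For concreteness, the variations above amount to mount invocations along
these lines (a sketch only -- host and paths are placeholders, the exact
option spellings may differ slightly on OSF/1, and timeo is in tenths of
a second):

    mount -t nfs -o rw                    sgi:/export /nfs/scratch   # baseline, defaults
    mount -t nfs -o rw,sync               sgi:/export /nfs/scratch   # synchronous writes
    mount -t nfs -o rw,nfsv2              sgi:/export /nfs/scratch   # fall back to NFS v2
    mount -t nfs -o rw,timeo=22           sgi:/export /nfs/scratch   # 2.2 second timeout
    mount -t nfs -o rw,timeo=33,retrans=8 sgi:/export /nfs/scratch   # the extreme measure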

I then noticed that SGI recommends running no more than 24 NFS daemons. So I
tried again with the SGI running nfsd 24.

I still got the messages, but this time I noticed a pattern: watching with
nfsstat (nothing more elaborate than the loop below), I could see the RPC
call rate go from several hundred per second with no timeouts to fewer than
10 per second with every call timing out. The intervals typically ran 20-60
seconds okay and 5-10 seconds not okay.
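
The loop in question is just this (the 5-second interval is arbitrary):

    # crude polling of the client-side RPC counters, one sample every 5 seconds
    while :
    do
        date
        nfsstat -c
        sleep 5
    done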

So, the bottom line of all this appears to be that we are seeing NFS timeouts
when there is no real logical reason to be seeing them. After watching this
stuff all afternoon, I can make two somewhat educated statements:

1. It is possible that the server is ignoring incoming RPC calls when it
   tries to perform a (large) group of delayed writes.

2. This is a fairly high volume of writes for the server to absorb -- five
   processes each writing ~110M comes to over 500M in total, which just
   might be more than it can handle.
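
For scale, assuming the Alpha is using the common 8K NFS write size (I
haven't checked the actual wsize on this mount), the numbers work out to
roughly:

    5 processes x ~110M each  = ~550M written per run
    ~550M / 8K per write RPC  = ~70,000 write RPCs per run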

I am reluctant to simply crank timeo and retrans up further, because I want
a better understanding of what is happening under the hood before resorting
to that kind of workaround. It seems possible that OSF/1 and Irix just
aren't working together correctly.

Any and all input is greatly appreciated, especially from NFS gurus ;-)

Thanks,
jonathan

-- 
+++ Jonathan Rozes, jrozes_at_tcs.tufts.edu, jrozes_at_tufts.edu
+++ http://gumbo.tcs.tufts.edu/
+++ Mind Over Liver: The liver probably contains 100 million cells, but
    1,000 livers do not add up to a rich inner life. <GDF:SA 09/92>
Received on Fri Oct 13 1995 - 21:56:41 NZDT
