NFS client calls through the roof

From: Tom Combs <combs_at_osprey.magnet.fsu.edu>
Date: Wed, 10 Jan 2001 11:19:02 -0500 (EST)

Hello,

  First let me give you some background; the Tru64 relevance comes
  near the end.

  I have a Sun Ultra 10 running Solaris 7 that is an NFS server to 47
  computers, about half being Alphas running Tru64 4.0D through 5.0 and
  the other half being Red Hat Linux 6.2 boxes, with one other Solaris 7
  thrown in for good measure. This collection of computers is used by
  physicists who do a good deal of computational work with some heavy
  I/O. I try to get them to keep their computational I/O local to the
  machine but this doesn't always work. At any rate, this is a busy
  network that for the vast majority of the time works very well.

  However, on two occasions six months apart, I have run up against a
  situation where the NFS server (the Sun box) is being hammered, with
  the kernel running at 90-99% CPU. Needless to say, during this time
  the whole network dance comes to a grinding halt. I discovered that if
  I stop NFS on the Sun, the Sun becomes happy. If I start it again, even
  with only one file system exported to one machine, the kernel usage once
  again goes through the roof (all the other clients still expect to be
  served by the Sun).

  During the first episode (6/00), I assumed that the problem was with the
  Sun and worked with Sun Microsystems, which resulted in the installation
  of a patch that relates to Solaris auto-negotiating through a Cisco router.
  We had just upgraded our Cisco and the patch appeared to have fixed the
  problem. In hindsight I now realize that I also rebooted the Alphas at
  the same time - see below.

  But lo and behold, the problem with the Sun kernel running at 95+% reared
  its head once again this week. This time I noticed that the switch that
  eleven of the Alphas are connected to was being hammered (it is a new
  switch, with pretty lights, installed since the first episode). I started
  poking around on the Alphas and saw that the NFS client calls (nfsstat -c)
  were astronomical, in the millions. I'd issue an nfsstat -z to clear the
  counters and then immediately do an nfsstat -c, and the client calls would
  be in the thousands if not tens of thousands in a 15 second time span! No
  wonder the poor Sun was swamped.
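
  The arithmetic behind that 15 second check can be sketched roughly as
  below, in Python. This is only an illustration: the function names, the
  sample counter values, and the 100 calls/s threshold are all hypothetical,
  and nothing here parses real nfsstat output.

```python
def calls_per_second(calls_before, calls_after, interval_seconds):
    """Average NFS client call rate between two counter readings."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if calls_after < calls_before:
        raise ValueError("counter went backwards (was it reset mid-interval?)")
    return (calls_after - calls_before) / interval_seconds


def is_runaway(rate, threshold_per_second):
    """True if the observed rate exceeds a site-chosen threshold."""
    return rate > threshold_per_second


# Hypothetical reading: counters zeroed, then 12,000 calls seen 15 s later.
rate = calls_per_second(0, 12_000, 15)
print(f"{rate:.0f} calls/s")  # prints "800 calls/s"

# The 100 calls/s threshold is an assumed value, not a recommendation.
print("runaway" if is_runaway(rate, 100) else "normal")
```

  Comparing a per-second rate rather than raw totals makes readings taken
  over different intervals, or on different clients, directly comparable.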

  I tried stopping NFS client services on the Alphas and then restarting
  them, but this did not fix the problem. I ended up having to reboot each
  Alpha. For each one that was rebooted, the kernel usage on the Sun would
  drop about 10%. Now that all the Alphas are rebooted, everything is back
  to normal and running smoothly. So this really is looking like a problem
  with Tru64 to me.

  Now the question: has anyone seen this sort of behavior? Is there a patch
  out there that everyone but me knows about? Any advice will be greatly
  appreciated!



--
Tom Combs                                      E-mail: combs_at_magnet.fsu.edu
National High Magnetic Field Laboratory        Phone:  (850) 644-1657
1800 E. Paul Dirac Drive                       Tallahassee, FL 32310
Received on Wed Jan 10 2001 - 16:19:00 NZDT
