SUMMARY : a lot of swapping -> no network

From: Joel Marchand <Joel.Marchand_at_ariana.polytechnique.fr>
Date: Mon, 03 Aug 1998 19:44:22 +0200 (MET DST)

        Hello,

This is the summary of the responses about my "a lot of swapping -> no
network" problem.

For now, I have applied the /etc/sysconfigtab modifications suggested by
Stephen Mullin.

After my vacation, I will buy a small PC running FreeBSD, dedicated to the
NFS service.

Many thanks to everyone who helped me!

        Joel Marchand

P.S.: The Internet is a really wonderful tool when you are a sysadmin :-)))

                        ------------
                        THE QUESTION
                        ------------

  Hello,

I recently installed a DEC Ultimate (2 x 533 MHz processors, 2 Gbytes of RAM,
4.3 + 18 Gbytes of Ultra Wide SCSI disk under AdvFS) running Digital Unix 4.0d.

This machine is a compute server and an NFS server for a cluster of Unix
boxes. It is connected to a 10 Mb/s switch, like the other Unix boxes.
Overall, NFS usage on the cluster is light and infrequent.

For the past three days, I have observed that when the machine swaps heavily
(typically 2 or 3 jobs exceeding 2.5 or 3 Gbytes of virtual memory), the NFS
services are **completely out** and nobody can telnet or rlogin any more.
ping is the only network service still available.

The "black hole" lasts a few minutes (5, 10 or more), then the situation
returns to normal. Later (an hour?) a new black hole occurs.

Thanks for any ideas!

Best regards,

        Joel Marchand

Remark: in the future I will move the NFS service to a dedicated Unix box,
but I would like to understand this problem. In my experience with other Unix
systems (SunOS, Solaris, Linux, AIX, Concentrix, Ultrix), I have never observed
such a severe NFS hang, even with very busy servers.
-----
Laboratoire GAGE - Unite Mixte de Service MEDICIS
Ecole Polytechnique - 91128 PALAISEAU CEDEX - FRANCE
E-mail : Joel.Marchand_at_polytechnique.fr
Tel : +33 1 69 33 34 95 Fax : +33 1 69 33 30 50
WEB : http://medicis.polytechnique.fr/gage/marchand.html

                                -----------
                                THE ANSWERS
                                -----------

>From alan_at_nabeth.cxo.dec.com Tue Jul 28 18:43:01 1998
        Swapping is I/O intensive. NFS is having to compete for
        the same I/O bandwidth that the VM subsystem is using.
        Sounds like NFS is losing. The swapping is occurring
        entirely in kernel mode, which gives it an advantage in
        being able to quickly issue I/O requests one after
        another. So quickly that others wanting I/O may be
        blocked. The worst case would be that the swapping
        code holds some lock in the I/O subsystem the whole
        time that it is doing the swap. If this were the case,
        one could argue that it is a bug since it prevents others
        from getting timely I/O service.

        A possible work-around would be to split the I/O between
        more disks and possibly more adapters. Put the NFS served
        data on one bus and the page/swap device on another. That
        should reduce the competition for resources.

        p.s.

        Another work-around would be to tune the VM subsystem to prefer
        paging over swapping. The last I knew, the VM subsystem was
        tuned to swap out large processes when they weren't running.
        There are cases where this is desirable, but probably not in
        yours. The Guide to Tuning may describe how to change the
        VM parameters so that it will page instead of swap the large
        processes.

        p.s. 2

        I don't know reliably whether the I/O contention is simply
        the closeness of the page/swap code to the I/O subsystem or
        a poor locking design. You'll have to go through our customer
        support organization (MCS) to escalate the problem to the
        attention of engineering to find out.
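
The contention argument above can be illustrated with a toy calculation. This
is only a sketch under stated assumptions: the single FIFO device queue, the
10 ms service time and the burst of 500 swap writes are hypothetical figures
chosen for illustration, not measurements from this machine and not a
description of how the Digital Unix I/O subsystem actually schedules requests.

```python
def completion_time(queue, service_ms, tag):
    """Time (ms) at which the first request tagged `tag` completes in a FIFO queue."""
    elapsed = 0
    for request in queue:
        elapsed += service_ms
        if request == tag:
            return elapsed
    raise ValueError(f"no {tag!r} request in queue")

SERVICE_MS = 10    # assumed per-request disk service time
SWAP_BURST = 500   # assumed number of swap writes issued back to back

# One shared device: the NFS read queues behind the whole swap burst.
shared = ["swap"] * SWAP_BURST + ["nfs"]
# NFS data on its own disk and bus: nothing ahead of it in the queue.
separate = ["nfs"]

shared_wait = completion_time(shared, SERVICE_MS, "nfs")
separate_wait = completion_time(separate, SERVICE_MS, "nfs")
print(shared_wait, separate_wait)  # 5010 10
```

With these made-up numbers a single NFS request waits about five seconds behind
the swap burst, against 10 ms when it has its own device, which is the
intuition behind putting NFS data and the page/swap device on different buses.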


>From Stephen.Mullin_at_gecits.ge.com Tue Jul 28 18:52:33 1998

        Joel,

        I had similar symptoms with my datafile add operations ...
        it turned out that advfs buffering and blocking were the issue.

        I suspect it was more the blocking queue than the size of
        my advfs cache.

        So this may help; give some or all of the sysconfigtab entries
        a try if you can.

        Sincerely

        -Stephen
        +++++++++++++++++ SNIP ++++++++++++++++++++++++++++++++++++
        Hello Stephen ,

        as mentioned on the phone, here is the summary of the advfs parameters.
        We recently had quite a few cases where changing the advfs parameters
        removed the problem.

        Here is the recommendation:

        advfs:
            AdvfsCacheMaxPercent = 1
            AdvfsMaxDevQueueLength = 16 or 32
            AdvfsFavorBlockingQueue = 0


        These advfs parameters are necessary for smooth I/O operation with
        Oracle.

+++++++++++++++++ SNIP ++++++++++++++++++++++++++++++++++++
-Stephen
GE Capital
Information Technology Solutions WISE Project
________________________________________________________
Stephen Mullin phone:_____ (770)300-3373
WISE Project (SAP) dialcomm:____ 8* 270-3373
GE Capital IT Solutions fax:________ (770)416-9592
2825a Pacific Drive
Norcross, GA 30071
                                          smullin_at_gecits.ge.com
________________________________________________________
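
The advfs entries recommended in the message above can be written out as a
stanza file before being merged into /etc/sysconfigtab. The following is a
minimal sketch: the helper name render_stanza is hypothetical, and the value 16
for AdvfsMaxDevQueueLength is an arbitrary pick between the two values the
message offers (16 or 32). How the stanza is then merged is not covered here;
see the sysconfigdb(8) reference page on your system.

```python
def render_stanza(subsystem, attributes):
    """Return a sysconfigtab-style stanza as text (subsystem name, then attributes)."""
    lines = [f"{subsystem}:"]
    for name, value in attributes.items():
        lines.append(f"    {name} = {value}")
    return "\n".join(lines) + "\n"

# Values taken from the message above.
ADVFS_SUGGESTED = {
    "AdvfsCacheMaxPercent": 1,
    "AdvfsMaxDevQueueLength": 16,   # the message says 16 or 32; 16 is an arbitrary pick
    "AdvfsFavorBlockingQueue": 0,
}

if __name__ == "__main__":
    print(render_stanza("advfs", ADVFS_SUGGESTED), end="")
```

Keeping the values in one dictionary makes it easy to try one entry at a time,
as the message suggests ("some or all").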

>From mjwatson_at_snafu.livenet.net Tue Jul 28 19:47:05 1998

Joel,

While I doubt this is the same problem, we were hanging our network
with NFS mounts, but we were swamping the Ethernet switch with FDDI
packets. It's possible that your occasional NFS activity is doing
the same and your downtime is how long it takes your switch to recover.

However, if your users are not complaining about response time, it's
rather unlikely.

Our solution was to decrease the size of our FDDI packets to match
the Ethernet packet size. If you have a disparity in packet sizes,
that would be a possible solution for you too.

Good luck,
Michael Watson

*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Michael Watson, EMT-P N6WAV Virginia Beach, Virginia
N20300 1977 C177B _at_ KCPK/KECG mjwatson_at_livenet.net
*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
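
The packet size disparity mentioned in the message above can be put in
numbers. This is a sketch under assumptions: it uses the usual IP MTUs (4352
bytes on FDDI, 1500 bytes on Ethernet) and a 20-byte IPv4 header without
options, and it assumes that oversized packets are fragmented at the
FDDI/Ethernet boundary. The message does not say which sizes were configured
on that network or how the switch handled large packets.

```python
import math

IP_HEADER = 20        # bytes, IPv4 header without options
FDDI_MTU = 4352       # bytes, usual IP MTU on FDDI (assumed here)
ETHERNET_MTU = 1500   # bytes, usual IP MTU on Ethernet (assumed here)

def fragments_needed(packet_size, mtu, header=IP_HEADER):
    """Number of IP fragments needed to carry one packet over a link with this MTU."""
    if packet_size <= mtu:
        return 1
    payload = packet_size - header
    # Every fragment except the last carries a payload that is a multiple of 8 bytes.
    per_fragment = (mtu - header) // 8 * 8
    return math.ceil(payload / per_fragment)

print(fragments_needed(FDDI_MTU, ETHERNET_MTU))      # 3
print(fragments_needed(ETHERNET_MTU, ETHERNET_MTU))  # 1
```

Under these assumptions every full-size FDDI packet becomes three packets on
the Ethernet side, while a packet already sized for Ethernet passes through
unchanged, which is consistent with the fix described above.
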
Received on Mon Aug 03 1998 - 17:45:43 NZST