System pauses and kernel tuning.

From: Seth Hall <SETH_at_speech.mit.edu>
Date: Thu, 06 May 1999 14:45:44 -0500 (EST)

Hi gang,

I reported some time ago on debilitating pauses (momentary hangs,
actually) on a few of our Alphas, especially when running large
MATLAB code. I noticed that others on this list seemed to be
reporting similar behavior. While I have made some progress with
tuning, I am not out of the woods yet.

Since I understand that a significant number of us are experiencing
such undesirable behavior on both v4.0d and v4.0e systems, I thought
I would post an update and see if anyone else had made progress on
these problems.

The following is a brief description of our environment, or at least
of those machines having serious trouble. I should mention that the
small-memory machines definitely suffer the worst from this problem,
although the 512Mb system is certainly not immune.

Typical load is generally no more than 20 users per system, with
several MATLAB sessions, CPU utilization >80% about 30% of the time.

At the time of a hang, 'top' will typically show only 200-400Mb of
VM used, out of a possible 2Gb; swapon -s always confirms this.

--------------------------------------------------------------------
Typical hardware/software configurations, including:

AlphaStation 250 4/266, 128Mb, 2Mb cache, v4.0d+jumbo 3, and v4.0e
164LX 533MHz, 256Mb, 2Mb cache, v4.0e, 2Gb swap across two spindles
164LX 533MHz, 256Mb, 4Mb cache, v4.0e, 2Gb swap across two spindles
164LX 600MHz, 512Mb, 4Mb cache, v4.0e, 2Gb swap across two spindles


--------------------------------------------------------------------
All have the following /etc/sysconfigtab entries:

generic:
        msgbuf_size=16384
        message-buffer-size=16384

vm:
        vm-page-free-target = 512
        vm-page-free-swap = 128
        vm-page-free-min = 128
        vm-page-free-reserved = 64
        vm-page-free-optimal = 256
        vm-ubcseqstartpercent = 60
        vm-ubcseqpercent = 20
        vm-ubcdirtypercent=8
        ubc-maxpercent=70
        ubc-maxdirtywrites=10
        vm-maxvas=2147483648

proc:
        maxusers=256

        max-per-proc-data-size=2147483648
        per-proc-stack-size=8388608
        max-per-proc-stack-size = 33554432
        max-per-proc-address-space=2147483648

vfs:
        bufcache=1
        name-cache-hash-size=512

--------------------------------------------------------------------
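For anyone who wants to compare these settings across machines, here is
a minimal sketch that reads stanza-style text like the entries above
into a dictionary and lists the vm-page-free-* thresholds in ascending
order. The function name parse_sysconfigtab and the SAMPLE string are
made up for illustration; the parser only assumes the "subsystem:"
header and "name = value" layout shown in this post, not the full
sysconfigtab syntax.

```python
def parse_sysconfigtab(text):
    """Parse stanza-style text into {subsystem: {attribute: value}}.

    Hypothetical helper: it handles only 'name:' headers followed by
    'attribute = value' lines, as in the entries quoted in this post.
    """
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between attributes
        if line.endswith(":") and "=" not in line:
            # Stanza header such as "vm:" starts a new subsystem.
            current = line[:-1].strip()
            stanzas.setdefault(current, {})
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            value = value.strip()
            # Keep numeric values as integers, anything else as text.
            stanzas[current][name.strip()] = (
                int(value) if value.isdigit() else value
            )
    return stanzas


# The vm stanza exactly as posted above.
SAMPLE = """\
vm:
        vm-page-free-target = 512
        vm-page-free-swap = 128
        vm-page-free-min = 128
        vm-page-free-reserved = 64
        vm-page-free-optimal = 256
        vm-ubcseqstartpercent = 60
        vm-ubcseqpercent = 20
        vm-ubcdirtypercent=8
        ubc-maxpercent=70
        ubc-maxdirtywrites=10
        vm-maxvas=2147483648
"""

vm = parse_sysconfigtab(SAMPLE)["vm"]
free_thresholds = {k: v for k, v in vm.items() if k.startswith("vm-page-free-")}

# List the free-page thresholds from lowest to highest.
for name, pages in sorted(free_thresholds.items(), key=lambda kv: kv[1]):
    print(f"{name:24s}{pages:6d} pages")
```

Sorting by value makes it easy to see at a glance where the reserved
threshold sits relative to the others on each machine.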

Having made the tuning changes above, things *seemed* to be
better, at least for a while. What we are still having a problem
with is pausing when the free page count dips below 100, as reported
by vmstat. In fact, if 'free' ever touches exactly 63, we are hung
for anywhere from 5 to 30 minutes. Recovery is always clean, without
error indications of any kind, following an intense burst of paging
activity (pent-up demand, so to speak?).

The free list hovers between ~150 and ~3000 most of the time, and we
can make it drop below 100 synchronously by invoking certain
functions within MATLAB (e.g., surfm() with 10-20Mb arrays). Free
drops to a reported 63, and the system hangs. This number is
suspiciously close to my vm-page-free-reserved setting (= 64).
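To put those page counts in perspective, here is a rough
back-of-the-envelope sketch. It assumes an 8 KB page size (the usual
figure for Alpha under Digital UNIX, but check your own systems) and
reads "Mb" above as megabytes; the helper names are made up for
illustration.

```python
PAGE_SIZE = 8192  # bytes per page; ASSUMED, not taken from the systems above


def pages_to_kb(pages):
    """Convert a page count to kilobytes."""
    return pages * PAGE_SIZE // 1024


def mb_to_pages(mb):
    """Number of pages needed to hold 'mb' megabytes, rounded up."""
    return (mb * 1024 * 1024 + PAGE_SIZE - 1) // PAGE_SIZE


# Free-page thresholds from the vm stanza in /etc/sysconfigtab above.
thresholds = {
    "vm-page-free-reserved": 64,
    "vm-page-free-min": 128,
    "vm-page-free-swap": 128,
    "vm-page-free-optimal": 256,
    "vm-page-free-target": 512,
}

for name, pages in thresholds.items():
    print(f"{name:24s}{pages:5d} pages = {pages_to_kb(pages):5d} KB")

# Size of the MATLAB arrays that trigger the drop, expressed in pages.
for mb in (10, 20):
    print(f"{mb} MB array = {mb_to_pages(mb)} pages")
```

Under that assumption the reserved threshold is only 512 KB and the
free target 4 MB, while a single 10-20Mb array spans 1280-2560 pages,
several times the free target.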

Nothing odd shows up in the newest version of sys_check, and the
System Tuning and Performance Management Guide is utterly useless
as far as I am concerned.

I am at a loss as to how to control this situation, and to my
surprise, DEC has been less than on top of this one. Any thoughts
would be much appreciated!

Thanks in advance,
                        Seth Hall
                        Systems Manager
                        MIT-RLE
Received on Thu May 06 1999 - 19:39:55 NZST
