Atrocious performance under Digital UNIX 3.2

From: Brad Daniels <daniels_at_jabberwock.biles.com>
Date: Wed, 02 Aug 95 11:00:15 -0500

Since I've upgraded to 3.2, I've regularly gotten the machine into a state
where it becomes totally unusable for long periods. The disk goes
completely nuts, and all operations on the system slow to a crawl, to put it
mildly. When it gets into this state, I can get it out by killing off one
or more processes.

Unfortunately, if I'm not in the window where the process is, it can take
anywhere from 10 to 20 *minutes* for the window to come to the top of the
stack. I can watch my DECterms refresh at a rate of about one line every
30-40 seconds. I don't care if I *am* using a lot of memory. A multi-user
system should not perform this badly. Usually, the problem seems to be the
result of having a couple of medium-sized (15-20MB) processes hanging
around, and then starting either a C++ compile, a C++ link, or an execution
of ar on a 40 MB (debuggable) library. It also seems to get worse if one of
the other processes is heavily multi-threaded, but that could just be my
imagination. Even just having a debugger session sitting around doing
nothing during a compile can cause it.

My system has 160MB of RAM and 468MB of swap in overcommit mode. I have
tried tuning ubc_max_percent down as low as 80, and vm_page_free_target up
to 400, to no avail.

OSF 2.1 never gave me this problem, even when I was debugging 80MB images
and running very close to the total available memory on the system. Somehow
the memory system seems to be thrashing horribly even though only a small
proportion of the allocated memory is actually in use at any given time.
C++ is probably running all over its memory space, but the other processes
aren't.

I've tried to collect statistics during one of these hangs, but often, the
programs to generate the statistics can't run before the condition goes
away. I did get a ps xlww, iostat, and vmstat one time. The ps showed a
bunch of average-sized processes plus one huge c++ compile. I can give the
full ps dump to anyone who wants to see it. Here's the other output:

vmstat:
Virtual Memory Statistics: (pagesize = 8192)
  procs         memory                 pages                   intr        cpu
  r   w  u   act free wire fault  cow zero react  pin pout   in  sy  cs  us sy id
  2 341 20   15K  104 2558  447K  24K 246K     7  33K  134  303 384 375  20 11 69

iostat:
      tty       rz1       rz2       rz3       rz4          cpu
 tin tout   bps tps   bps tps   bps tps   bps tps   us ni sy id
   1   31   382   9    17   2   481  19     0   0   20  0 12 69
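
For scale, the page counts in the thrashing vmstat sample can be converted to memory sizes using the 8192-byte page size that vmstat reports. The following is only a rough Python sketch: the helper name pages_to_mib is made up for illustration, and it assumes that vm_page_free_target is expressed in pages.

```python
PAGE_SIZE = 8192  # bytes, taken from the vmstat header line


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to mebibytes."""
    return pages * page_size / (1024 * 1024)


# Values copied from the thrashing vmstat sample.
free_pages = 104
wired_pages = 2558

# Assumption: vm_page_free_target is a page count (the value tried was 400).
free_target_pages = 400

print(f"free pages:  {pages_to_mib(free_pages):.2f} MiB")
print(f"wired pages: {pages_to_mib(wired_pages):.2f} MiB")
print(f"free target: {pages_to_mib(free_target_pages):.2f} MiB")
print("free list below target:", free_pages < free_target_pages)
```

If that assumption holds, the free list during the hang was under 1 MiB, well short of the 400-page target.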

Here's a typical vmstat when it's *not* thrashing:

% vmstat
Virtual Memory Statistics: (pagesize = 8192)
  procs         memory                 pages                   intr        cpu
  r   w  u   act free wire fault  cow zero react  pin pout   in  sy  cs  us sy id
  2 353 14  8049 7086 3380    2M 268K   1M  3350 405K 1495   30 226 398  15  4 81

% vmstat -s
Virtual Memory Statistics: (pagesize = 8192)
      3214 active pages
      4838 inactive pages
      7083 free pages
      3380 wired pages
   2622377 virtual memory page faults
    268319 copy-on-write page faults
   1139077 zero fill page faults
      3350 reattaches from reclaim list
    405999 pages paged in
      1495 pages paged out
 170906205 task and thread context switches
  12932721 device interrupts
  97095026 system calls
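
The page counts at the top of the vmstat -s output can be totalled and converted to sizes the same way. This is a small Python sketch, not a general parser: the function name parse_vmstat_s is hypothetical, and it only handles lines of the form "<count> <label>" as they appear above.

```python
PAGE_SIZE = 8192  # bytes, taken from the vmstat header line

# The four page-count lines copied from the vmstat -s output above.
SAMPLE = """\
      3214 active pages
      4838 inactive pages
      7083 free pages
      3380 wired pages
"""


def parse_vmstat_s(text):
    """Map each label to its integer count for lines shaped '<count> <label>'."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            stats[parts[1].strip()] = int(parts[0])
    return stats


stats = parse_vmstat_s(SAMPLE)
total_pages = sum(stats.values())

for label, pages in stats.items():
    print(f"{label:15s} {pages:6d} pages  {pages * PAGE_SIZE / 2**20:7.1f} MiB")
print(f"{'total':15s} {total_pages:6d} pages  {total_pages * PAGE_SIZE / 2**20:7.1f} MiB")
```

The four categories add up to 18515 pages, about 145 MiB of the 160MB installed.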

I really need to know what I can do to fix this problem. Any suggestions
are much appreciated. Thanks!

- Brad
---------------------------------------------------------------
+ Brad Daniels | I may not know what I like, +
+ Biles and Associates | but I know Art. +
+ These are my views, not B&A's | (Nice guy. Cute kids.) +
---------------------------------------------------------------
Received on Wed Aug 02 1995 - 18:23:01 NZST
