FW: Kernel parameter changes - SAN UBC failing to relinquish memory under high I/O even after I/O quiesced for 15+ min

From: Loucks, Guy <Guy.Loucks_at_det.nsw.edu.au>
Date: Mon, 23 Apr 2001 18:32:20 +1000

Managers,

Some interesting information on a problem we have been having with UBC;
thoughts are most welcome. Below is most of a note I have sent to Compaq CSC.

We now believe the problem to be a more generic UBC issue.

Under high disk I/O (AdvFS) to local and SAN disks, the memory available to
UBC is fully consumed. On a default system the UBC limit is set ridiculously
high (100%), so the machine promptly starts to thrash when you instantiate a
few processes.

The allocation of memory is not the issue; the problem is that it is not
being relinquished. We should not be hard paging, yet we are, extensively.

Previously we were using lazy swap, which resulted in init killing processes
arbitrarily. Migrating to eager swap just prevented processes from
instantiating. The workaround was to throw more swap at it, but that only
masked the problem.

We have now cranked back UBCMAX%, which seems to have stopped us from
hitting the floor so hard.

So why is UBC behaving so differently under DUX 5? If you run the simple
script below, you will see it chew up UBC. When the system quiets down, the
UBC memory is slowly returned. However, run 100 or 1000 copies and the UBC
memory is never returned, even if the system is left idle for days after the
processes are terminated. Unmount and remount the file system, and the memory
is returned.

Thoughts?

# cat /work/shuntit
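#!/usr/bin/ksh
# Copy a file back and forth between M-1 and M-2 to generate sustained I/O.

# Number of 10240-byte blocks per copy (default 1000)
count=${1:-1000}

finish()
{
        exit 0
}
# Exit cleanly on signal 1 (SIGHUP)
trap finish 1

# Create the seed file on the first run
if [ ! -f M-1 ]
then
        dd if=/dev/zero of=./M-1 bs=10240 count=${count}
fi


# Shuttle the data between the two files until signalled
while true
do
        dd if=./M-1 of=M-2 bs=10240 count=${count}
        dd if=./M-2 of=M-1 bs=10240 count=${count}
done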
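
For the 100 or 1000 copy case, a driver along the following lines could start
the copies. This is only a sketch, not what was actually run here:
launch_copies, the scratch directory argument and the run.N naming are made up
for illustration, and giving each copy its own directory (so the M-1/M-2 files
do not collide) is an assumption.

```sh
#!/bin/sh
# Hypothetical helper: start N background copies of a command, each in its
# own scratch directory, and print one PID per copy.
#   usage: launch_copies N BASEDIR COMMAND [ARGS...]
launch_copies() {
    n=$1
    base=$2
    shift 2
    i=1
    while [ "$i" -le "$n" ]; do
        mkdir -p "$base/run.$i" || return 1
        # Detach output so the caller is not held up by the background job
        ( cd "$base/run.$i" && exec "$@" ) >/dev/null 2>&1 &
        echo $!
        i=$((i + 1))
    done
}
```

Something like "launch_copies 100 /some/scratch /work/shuntit 1000 > pids"
would start the load, and "kill -HUP `cat pids`" would stop it again, since
the script traps signal 1.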

System busy, close to hitting the floor. Why are we swapping? See the ps lax
output further down; UBC is not relinquishing memory:

load averages: 5.18, 5.25, 5.02    18:05:13
100 processes: 9 running, 23 sleeping, 60 idle, 8 zombie
CPU states: 0.1% user, 74.4% nice, 20.2% system, 5.1% idle
Memory: Real: 975M/1993M act/tot Virtual: 29M/10211M use/tot Free: 1920K

  PID USERNAME PRI NICE SIZE RES STATE TIME CPU COMMAND
11710 cricket 42 2 41M 36M run 0:02 32.20% collector
11492 cricket 42 2 130M 116M run 0:32 25.60% collector
11708 cricket 42 2 45M 39M run 0:03 25.40% collector
11490 cricket 42 2 130M 116M run 0:31 16.70% collector
11175 cricket 42 2 132M 114M run 0:43 11.80% collector
10476 cricket 42 2 132M 114M sleep 1:58 10.90% collector
11486 cricket 42 2 130M 116M sleep 0:21 10.90% collector
10690 cricket 42 2 132M 114M sleep 1:27 10.70% collector
11322 cricket 42 2 130M 113M run 0:34 10.40% collector
11380 cricket 42 2 130M 114M sleep 0:30 9.50% collector
11320 cricket 42 2 132M 113M run 0:32 7.60% collector
 1429 root 44 0 8872K 6062K sleep 1:32 3.40% top
11715 root 44 0 8872K 6062K run 0:00 0.40% top
11716 cricket 46 2 132M 4915K run 0:00 0.00% collector
  667 root 44 0 3776K 245K sleep 0:01 0.00% cron



System at start:


load averages: 0.00, 0.03, 0.09    16:15:22
48 processes: 1 running, 11 sleeping, 36 idle
CPU states: 0.0% user, 0.0% nice, 0.7% system, 99.2% idle
Memory: Real: 13M/1993M act/tot Virtual: 10211M use/tot Free: 1775M

  PID USERNAME PRI NICE SIZE RES STATE TIME CPU COMMAND
 1052 root 44 0 8872K 6062K run 0:00 0.40% top
  250 root 44 0 2864K 655K sleep 0:00 0.00% evmd
  959 root 44 0 2112K 327K sleep 0:00 0.00% ksh
  279 root 44 0 2336K 319K sleep 0:00 0.00% evmlogger
  381 root 44 0 2320K 245K sleep 0:00 0.00% syslogd
  588 root 44 0 3552K 204K sleep 0:00 0.00% os_mibs
  584 root 44 0 2592K 172K sleep 0:00 0.00% svrMgt_mib
  570 root 44 0 2088K 155K sleep 0:00 0.00% snmpd
  593 root 42 0 5936K 2588K sleep 0:00 0.00% insightd
  582 root 42 0 3544K 1024K sleep 0:00 0.00% cpq_mibs
  937 root 42 0 3248K 466K sleep 0:00 0.00% httpd
  539 root 32 -12 2272K 360K sleep 0:00 0.00% xntpd


System Quiet (SAN Still Mounted)

load averages: 0.17, 0.12, 0.14    17:24:11
65 processes: 1 running, 23 sleeping, 41 idle
CPU states: 0.4% user, 0.0% nice, 3.7% system, 95.8% idle
Memory: Real: 608M/1993M act/tot Virtual: 10211M use/tot Free: 1088M

  PID USERNAME PRI NICE SIZE RES STATE TIME CPU COMMAND
 1429 root 44 0 8872K 6062K sleep 0:56 2.70% top
 7936 root 44 0 8872K 6062K run 0:00 0.40% top
 7867 root 42 0 5048K 999K sleep 0:00 0.40% rasusers
  582 root 44 0 3544K 983K sleep 0:00 0.00% cpq_mibs
 1140 root 44 0 4608K 663K sleep 0:00 0.00% xterm
 7989 root 44 0 2000K 385K sleep 0:00 0.00% grep
  937 root 44 0 3248K 352K sleep 0:00 0.00% httpd
  959 root 44 0 2112K 319K sleep 0:00 0.00% ksh
  667 root 44 0 3776K 253K sleep 0:00 0.00% cron
 7987 root 44 0 1792K 221K sleep 0:00 0.00% rsh
 7990 root 44 0 1792K 221K sleep 0:00 0.00% rsh
 7865 root 44 0 2224K 204K sleep 0:00 0.00% sh
 7624 cricket 44 0 1744K 163K sleep 0:00 0.00% tail
  584 root 44 0 2592K 163K sleep 0:00 0.00% svrMgt_mib
    1 root 44 0 480K 98K sleep 0:00 0.00% init


System Quiet, Busy SAN Disk Unmounted:

load averages: 0.16, 0.08, 0.07    17:35:09
57 processes: 1 running, 14 sleeping, 42 idle
CPU states: 0.0% user, 0.0% nice, 1.4% system, 98.4% idle
Memory: Real: 16M/1993M act/tot Virtual: 10211M use/tot Free: 1807M

  PID USERNAME PRI NICE SIZE RES STATE TIME CPU COMMAND
 1429 root 44 0 8872K 6062K sleep 1:06 2.40% top
 8882 root 44 0 8872K 6062K run 0:00 0.30% top
 1171 root 44 0 12M 2179K sleep 0:00 0.00% dxterm
  582 root 44 0 3544K 983K sleep 0:00 0.00% cpq_mibs
 1140 root 44 0 4608K 663K sleep 0:00 0.00% xterm
  937 root 44 0 3248K 352K sleep 0:00 0.00% httpd
  959 root 44 0 2112K 327K sleep 0:00 0.00% ksh
  667 root 44 0 3776K 253K sleep 0:00 0.00% cron
  584 root 44 0 2592K 163K sleep 0:00 0.00% svrMgt_mib
  593 root 42 0 5936K 2457K sleep 0:00 0.00% insightd
  381 root 42 0 2320K 229K sleep 0:00 0.00% syslogd
  580 root 42 0 2608K 180K sleep 0:00 0.00% svrSystem_mib
  570 root 42 0 2088K 147K sleep 0:00 0.00% snmpd
  139 root 42 0 1728K 106K sleep 0:00 0.00% update
  539 root 32 -12 2272K 344K sleep 0:00 0.00% xntpd
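
To compare the Free figures across snapshots like the ones above without
reading them off by eye, a small filter can pull the value out of the
"Memory:" line. This is a sketch only: free_kb is a made-up name, and it
assumes the summary line keeps the "Free: <number><K|M|G>" shape seen in this
top output.

```sh
#!/bin/sh
# Hypothetical helper: print the Free value from a top "Memory:" line in KB.
free_kb() {
    # Split "Free: 1088M" into a number and a unit letter
    val=$(printf '%s\n' "$1" |
        sed -n 's/.*Free: *\([0-9][0-9]*\)\([KMG]\).*/\1 \2/p')
    set -- $val
    case $2 in
        K) echo "$1" ;;
        M) echo $(($1 * 1024)) ;;
        G) echo $(($1 * 1024 * 1024)) ;;
    esac
}
```

Fed the busy snapshot's line it prints 1920; fed the quiet, SAN-mounted line
it prints 1114112.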

How much could UBC be using?

       UID   PID  PPID  CP PRI  NI    VSZ  RSS WCHAN  S    TT      TIME COMMAND
         0     0     0   0  38  -6  2.54G  78M *      R <  ??   4:17.79 [kernel idle]
         0     1     0   0  44   0   480K  96K pause  IL   ??   0:00.52 /sbin/init -a

# bc
254*3
762
1807-1088
719
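
The second bc line reads as free memory after the unmount (1807M) minus free
memory while quiet with the SAN still mounted (1088M), both taken from the top
snapshots. A sketch of the same sum, with the share of the 1993M total added;
the variable names are invented here, and treating the difference as memory
held by UBC is an interpretation, not something the tools report directly.

```sh
#!/bin/sh
# Figures copied from the top snapshots, in MB
free_mounted=1088     # system quiet, SAN still mounted
free_unmounted=1807   # system quiet, busy SAN disk unmounted
total=1993            # total real memory

# Memory that only came back once the file system was unmounted
held=$((free_unmounted - free_mounted))
# Integer percentage of real memory (rounds down)
pct=$((held * 100 / total))

echo "held until unmount: ${held}M, about ${pct}% of ${total}M"
```

That works out to 719M, or roughly 36% of real memory, sitting unreleased on
an otherwise idle system.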

System Info:

# sizer -v
Compaq Tru64 UNIX V5.0A (Rev. 1094); Mon Apr 23 16:11:03 EST 2001
# psrinfo -v
Status of processor 0 as of: 04/23/01 17:40:17
  Processor has been on-line since 04/23/2001 16:13:07
  The alpha EV6 (21264) processor operates at 500 MHz,
        and has an alpha internal floating point processor.
Status of processor 1 as of: 04/23/01 17:40:17
  Processor has been on-line since 04/23/2001 16:13:07
  The alpha EV6 (21264) processor operates at 500 MHz,
        and has an alpha internal floating point processor.
# uname -a
OSF1 newton.itbsnmp.det.nsw.EDU.AU V5.0 1094 alpha

# scu show edt

CAM Equipment Device Table (EDT) Information:

    Bus/Target/Lun  Device Type  ANSI    Vendor ID  Product ID        Revision  N/W
    --------------  -----------  ------  ---------  ----------------  --------  ---
     0    5    0    CD-ROM       SCSI-2  DEC        RRD47   (C) DEC   1206      N
     1    0    0    Direct       SCSI-2  DEC        RZ1DF-CB (C) DEC  0372      W
     1    2    0    Direct       SCSI-2  DEC        RZ1DF-CB (C) DEC  0372      W
     1    3    0    Direct       SCSI-2  DEC        RZ1DF-CB (C) DEC  0372      W
     1    5    0    Direct       SCSI-2  DEC        RZ1DF-CB (C) DEC  0372      W
     2    4    0    Sequential   SCSI-2  DEC        TZ89     (C) DEC  2150      W
     5    0    0    RAID         SCSI-2  DEC        HSG80CCL          V85F      W
     5    0    1    Direct       SCSI-2  DEC        HSG80             V85F      W
     5    1    0    RAID         SCSI-2  DEC        HSG80CCL          V85F      W
     5    1    2    Direct       SCSI-2  DEC        HSG80             V85F      W
     5    1   15    Direct       SCSI-2  DEC        HSG80             V85F      W
     5  126    0    Processor    SCSI-2  COMPAQ     KGPSA-CA          1.22      N

Guy R. Loucks
Senior Unix Systems Administrator
Networks Branch
NSW Department of Education & Training
Information Technology Bureau
Direct +61 2 9942 9887
Fax +61 2 9942 9600
Mobile +61 (0)429 041 186
Email guy.loucks_at_det.nsw.edu.au

Hi Guy,

I have taken advice from my peers, and here are some kernel parameters that
should be changed on your system, based on our evaluation of your sys_check
output:

- lower ubc_maxpercent to 50 (%)
- lower vm_ubcseqstartpercent to 35 (%)
- increase vm_page_free_target to 256
- increase vm_page_free_swap to 128
- lower vm_page_free_hardswap to 2048
- increase vm_page_free_min to 32
- increase vm_page_free_reserved to 20
- lower vm_page_free_optimal to 256
- lower vm_page_prewrite_target to 256

Definitions for these parameters should be available in the sys_attrs_vm man
page.
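
The parameter list above can be turned into a stanza for review before
anything is changed. This is a sketch under assumptions: make_stanza is a
made-up helper, the attribute names are copied with underscores exactly as
written in the note (the system may spell them differently), and the
"vm:" / "attribute = value" layout is assumed to match what /etc/sysconfigtab
expects.

```sh
#!/bin/sh
# Recommended values, copied from the note; one "name value" pair per line
settings='ubc_maxpercent 50
vm_ubcseqstartpercent 35
vm_page_free_target 256
vm_page_free_swap 128
vm_page_free_hardswap 2048
vm_page_free_min 32
vm_page_free_reserved 20
vm_page_free_optimal 256
vm_page_prewrite_target 256'

# Hypothetical helper: print the pairs as a "vm:" stanza on stdout
make_stanza() {
    echo "vm:"
    echo "$settings" | while read -r name value; do
        printf '\t%s = %s\n' "$name" "$value"
    done
}
```

Running make_stanza and comparing its output against the current values (for
example from "sysconfig -q vm") before merging anything into the boot-time
configuration would be the cautious route; the sysconfig and sysconfigdb man
pages have the exact procedure.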

Received on Mon Apr 23 2001 - 08:34:29 NZST