Managers,
Some interesting findings on the UBC; thoughts are most welcome. Below is
most of a note I have sent to the Compaq CSC.
We now believe the problem to be a more generic UBC issue.
What is happening is that under high disk I/O (AdvFS) to local and SAN disks,
the memory available to UBC gets fully consumed. On a default system the
limit is set ridiculously high (100%), so your machine promptly starts to
thrash when you start a few processes.
The allocation of memory is not the issue; the problem is that it is not
being relinquished. We should not be hard paging, and we are, extensively.
Previously we were using lazy swap, which resulted in init killing processes
arbitrarily; migrating to eager swap just prevented new processes from
starting. The workaround was to throw more swap at the problem, but that
only masked it.
We have now cranked back the UBCMAX%, which seems to have prevented us from
hitting the floor so hard.
So why is UBC behaving so differently under DUX 5? If you run the simple
script below you will see it chew up UBC. When the system quiets down, the
UBC memory is slowly returned. However, run 100 or 1000 copies and the UBC
memory is never returned, even if the system is left idle for days after
the processes are terminated. Unmount and remount the FS, and the memory is
returned.
Thoughts?
# cat /work/shuntit
#!/usr/bin/ksh
count=${1:-1000}
finish()
{
        exit 0
}
trap finish 1
if [ ! -f M-1 ]
then
        dd if=/dev/zero of=./M-1 bs=10240 count=${count}
fi
while true
do
        dd if=./M-1 of=M-2 bs=10240 count=${count}
        dd if=./M-2 of=M-1 bs=10240 count=${count}
done
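The post does not say how the 100 or 1000 copies were started. A minimal
sketch of one way to do it, assuming each copy gets its own directory so the
copies do not overwrite each other's M-1 and M-2 files. The function name
launch_copies and the directory layout are hypothetical, not from the
original note:

```shell
#!/bin/sh
# launch_copies N BASEDIR COMMAND [ARGS...]
# Start N background copies of COMMAND, each in its own subdirectory
# BASEDIR/1 .. BASEDIR/N, then wait for all of them to exit.
# (Hypothetical helper; the original note does not show a launcher.)
launch_copies() {
    copies=$1
    base=$2
    shift 2
    i=1
    while [ "$i" -le "$copies" ]; do
        mkdir -p "$base/$i" || return 1
        # The subshell keeps the cd local to this copy.
        ( cd "$base/$i" && exec "$@" ) &
        i=$((i + 1))
    done
    wait
}
```

For example, `launch_copies 100 /work/ubctest /work/shuntit 1000` would start
100 copies of the script above (the /work/ubctest path is an assumption).
Since shuntit loops forever, the copies have to be stopped with a signal such
as `kill -HUP`, which the script traps.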
System busy and close to hitting the floor. Why are we swapping? See ps lax;
UBC is not relinquishing memory:
load averages:  5.18,  5.25,  5.02
18:05:13
100 processes: 9 running, 23 sleeping, 60 idle, 8 zombie
CPU states:  0.1% user, 74.4% nice, 20.2% system,  5.1% idle
Memory: Real: 975M/1993M act/tot  Virtual: 29M/10211M use/tot  Free: 1920K
  PID USERNAME PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
11710 cricket   42    2   41M   36M run     0:02 32.20% collector
11492 cricket   42    2  130M  116M run     0:32 25.60% collector
11708 cricket   42    2   45M   39M run     0:03 25.40% collector
11490 cricket   42    2  130M  116M run     0:31 16.70% collector
11175 cricket   42    2  132M  114M run     0:43 11.80% collector
10476 cricket   42    2  132M  114M sleep   1:58 10.90% collector
11486 cricket   42    2  130M  116M sleep   0:21 10.90% collector
10690 cricket   42    2  132M  114M sleep   1:27 10.70% collector
11322 cricket   42    2  130M  113M run     0:34 10.40% collector
11380 cricket   42    2  130M  114M sleep   0:30  9.50% collector
11320 cricket   42    2  132M  113M run     0:32  7.60% collector
 1429 root      44    0 8872K 6062K sleep   1:32  3.40% top
11715 root      44    0 8872K 6062K run     0:00  0.40% top
11716 cricket   46    2  132M 4915K run     0:00  0.00% collector
  667 root      44    0 3776K  245K sleep   0:01  0.00% cron
System at start:
load averages:  0.00,  0.03,  0.09
16:15:22
48 processes:  1 running, 11 sleeping, 36 idle
CPU states:  0.0% user,  0.0% nice,  0.7% system, 99.2% idle
Memory: Real: 13M/1993M act/tot  Virtual: 10211M use/tot  Free: 1775M
  PID USERNAME PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 1052 root      44    0 8872K 6062K run     0:00  0.40% top
  250 root      44    0 2864K  655K sleep   0:00  0.00% evmd
  959 root      44    0 2112K  327K sleep   0:00  0.00% ksh
  279 root      44    0 2336K  319K sleep   0:00  0.00% evmlogger
  381 root      44    0 2320K  245K sleep   0:00  0.00% syslogd
  588 root      44    0 3552K  204K sleep   0:00  0.00% os_mibs
  584 root      44    0 2592K  172K sleep   0:00  0.00% svrMgt_mib
  570 root      44    0 2088K  155K sleep   0:00  0.00% snmpd
  593 root      42    0 5936K 2588K sleep   0:00  0.00% insightd
  582 root      42    0 3544K 1024K sleep   0:00  0.00% cpq_mibs
  937 root      42    0 3248K  466K sleep   0:00  0.00% httpd
  539 root      32  -12 2272K  360K sleep   0:00  0.00% xntpd
System Quiet (SAN Still Mounted)
load averages:  0.17,  0.12,  0.14
17:24:11
65 processes:  1 running, 23 sleeping, 41 idle
CPU states:  0.4% user,  0.0% nice,  3.7% system, 95.8% idle
Memory: Real: 608M/1993M act/tot  Virtual: 10211M use/tot  Free: 1088M
  PID USERNAME PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 1429 root      44    0 8872K 6062K sleep   0:56  2.70% top
 7936 root      44    0 8872K 6062K run     0:00  0.40% top
 7867 root      42    0 5048K  999K sleep   0:00  0.40% rasusers
  582 root      44    0 3544K  983K sleep   0:00  0.00% cpq_mibs
 1140 root      44    0 4608K  663K sleep   0:00  0.00% xterm
 7989 root      44    0 2000K  385K sleep   0:00  0.00% grep
  937 root      44    0 3248K  352K sleep   0:00  0.00% httpd
  959 root      44    0 2112K  319K sleep   0:00  0.00% ksh
  667 root      44    0 3776K  253K sleep   0:00  0.00% cron
 7987 root      44    0 1792K  221K sleep   0:00  0.00% rsh
 7990 root      44    0 1792K  221K sleep   0:00  0.00% rsh
 7865 root      44    0 2224K  204K sleep   0:00  0.00% sh
 7624 cricket   44    0 1744K  163K sleep   0:00  0.00% tail
  584 root      44    0 2592K  163K sleep   0:00  0.00% svrMgt_mib
    1 root      44    0  480K   98K sleep   0:00  0.00% init
System Quiet, Busy SAN Disk Unmounted:
load averages:  0.16,  0.08,  0.07
17:35:09
57 processes:  1 running, 14 sleeping, 42 idle
CPU states:  0.0% user,  0.0% nice,  1.4% system, 98.4% idle
Memory: Real: 16M/1993M act/tot  Virtual: 10211M use/tot  Free: 1807M
  PID USERNAME PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 1429 root      44    0 8872K 6062K sleep   1:06  2.40% top
 8882 root      44    0 8872K 6062K run     0:00  0.30% top
 1171 root      44    0   12M 2179K sleep   0:00  0.00% dxterm
  582 root      44    0 3544K  983K sleep   0:00  0.00% cpq_mibs
 1140 root      44    0 4608K  663K sleep   0:00  0.00% xterm
  937 root      44    0 3248K  352K sleep   0:00  0.00% httpd
  959 root      44    0 2112K  327K sleep   0:00  0.00% ksh
  667 root      44    0 3776K  253K sleep   0:00  0.00% cron
  584 root      44    0 2592K  163K sleep   0:00  0.00% svrMgt_mib
  593 root      42    0 5936K 2457K sleep   0:00  0.00% insightd
  381 root      42    0 2320K  229K sleep   0:00  0.00% syslogd
  580 root      42    0 2608K  180K sleep   0:00  0.00% svrSystem_mib
  570 root      42    0 2088K  147K sleep   0:00  0.00% snmpd
  139 root      42    0 1728K  106K sleep   0:00  0.00% update
  539 root      32  -12 2272K  344K sleep   0:00  0.00% xntpd
How much could UBC be using?
       UID    PID   PPID  CP PRI  NI   VSZ  RSS WCHAN    S    TTY        TIME COMMAND
         0      0      0   0  38  -6 2.54G  78M *        R <  ??      4:17.79 [kernel idle]
         0      1      0   0  44   0  480K  96K pause    IL   ??      0:00.52 /sbin/init -a
# bc
254*3
762
1807-1088
719
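The second bc calculation above compares free memory in two of the quiet top
snapshots. A small sketch of the same arithmetic, using only the Free and
Real (active) figures quoted in this message; the variable names are
illustrative:

```shell
#!/bin/sh
# Figures in MB, copied from the top snapshots in this message.
free_mounted=1088      # system quiet, SAN still mounted
free_unmounted=1807    # system quiet, busy SAN disk unmounted
active_mounted=608     # "Real: 608M" with the SAN still mounted
active_unmounted=16    # "Real: 16M" after the unmount

# Memory that only came back once the file system was unmounted.
freed_by_umount=$((free_unmounted - free_mounted))
active_drop=$((active_mounted - active_unmounted))

echo "free memory gained by umount: ${freed_by_umount}M"
echo "active memory dropped by:     ${active_drop}M"
```

On these figures, roughly 700M stayed held while the file system was mounted
and the system was otherwise idle.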
System Info:
# sizer -v
Compaq Tru64 UNIX V5.0A (Rev. 1094); Mon Apr 23 16:11:03 EST 2001
# psrinfo -v
Status of processor 0 as of: 04/23/01 17:40:17
  Processor has been on-line since 04/23/2001 16:13:07
  The alpha EV6 (21264) processor operates at 500 MHz,
        and has an alpha internal floating point processor.
Status of processor 1 as of: 04/23/01 17:40:17
  Processor has been on-line since 04/23/2001 16:13:07
  The alpha EV6 (21264) processor operates at 500 MHz,
        and has an alpha internal floating point processor.
# uname -a
OSF1 newton.itbsnmp.det.nsw.EDU.AU V5.0 1094 alpha
# scu show edt
CAM Equipment Device Table (EDT) Information:
    Bus/Target/Lun Device Type  ANSI  Vendor ID    Product ID    Revision N/W
    -------------- ----------- ------ --------- ---------------- -------- ---
     0    5    0   CD-ROM      SCSI-2 DEC       RRD47   (C) DEC    1206    N
     1    0    0   Direct      SCSI-2 DEC       RZ1DF-CB (C) DEC   0372    W
     1    2    0   Direct      SCSI-2 DEC       RZ1DF-CB (C) DEC   0372    W
     1    3    0   Direct      SCSI-2 DEC       RZ1DF-CB (C) DEC   0372    W
     1    5    0   Direct      SCSI-2 DEC       RZ1DF-CB (C) DEC   0372    W
     2    4    0   Sequential  SCSI-2 DEC       TZ89     (C) DEC   2150    W
     5    0    0   RAID        SCSI-2 DEC       HSG80CCL           V85F    W
     5    0    1   Direct      SCSI-2 DEC       HSG80              V85F    W
     5    1    0   RAID        SCSI-2 DEC       HSG80CCL           V85F    W
     5    1    2   Direct      SCSI-2 DEC       HSG80              V85F    W
     5    1    15  Direct      SCSI-2 DEC       HSG80              V85F    W
     5    126   0   Processor   SCSI-2 COMPAQ    KGPSA-CA           1.22    N
Guy R. Loucks
Senior Unix Systems Administrator
Networks Branch
NSW Department of Education & Training
Information Technology Bureau
Direct +61 2 9942 9887
Fax +61 2 9942 9600
Mobile +61 (0)429 041 186
Email guy.loucks_at_det.nsw.edu.au
Hi Guy,
I have consulted my peers, and here are some kernel parameters that should
be changed on your system, based on our evaluation of your sys_check
output.
- lower the ubc_maxpercent parameter to 50(%)
- lower the vm_ubcseqstartpercent to 35(%)
- increase vm_page_free_target to 256
- increase vm_page_free_swap to 128
- lower vm_page_free_hardswap to 2048
- increase vm_page_free_min to 32
- increase vm_page_free_reserved to 20
- lower the vm_page_free_optimal to 256
- lower vm_page_prewrite_target to 256
Definitions for these parameters should be available in the sys_attrs_vm man
page. 
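
As an illustration only, the recommendations above might be written as a vm
stanza for /etc/sysconfigtab along the following lines. This sketch assumes
the attribute names are exactly as listed above and that each value is in
the unit the attribute expects; check both against sys_attrs_vm before
applying anything:

```
vm:
        ubc_maxpercent = 50
        vm_ubcseqstartpercent = 35
        vm_page_free_target = 256
        vm_page_free_swap = 128
        vm_page_free_hardswap = 2048
        vm_page_free_min = 32
        vm_page_free_reserved = 20
        vm_page_free_optimal = 256
        vm_page_prewrite_target = 256
```

Stanzas like this are normally merged with sysconfigdb rather than by editing
the file by hand; see the sysconfigdb man page for the options on your
release. A reboot may be needed for some attributes to take effect.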
- application/octet-stream attachment: busy.out
- application/octet-stream attachment: NEWTON
 
Received on Mon Apr 23 2001 - 08:34:29 NZST