Managers,
Some interesting findings on a UBC problem we have been having; thoughts are
most welcome. Below is most of a note I have sent to Compaq CSC.
We now believe the problem to be a more generic UBC issue.
What is happening is that under high disk I/O (AdvFS) to local and SAN disks,
the memory available to the UBC (Unified Buffer Cache) gets fully consumed. On
a default system the UBC limit is set ridiculously high (100%), so your machine
promptly starts to thrash when you instantiate a few processes.
The allocation of memory is not the issue; the problem is that it is not being
relinquished. We should not be hard paging, yet we are, extensively.
Previously we were using lazy swap, which resulted in init killing processes
arbitrarily; migrating to eager swap just prevented processes from
instantiating. The limp-along workaround was to throw more swap at the problem.
This, however, was just masking the problem.
We have now cranked back the UBC maximum percentage (ubc_maxpercent); this
seems to have prevented us from hitting the floor so hard.
So why is UBC behaving so differently under DUX 5? If you run the simple
script below you will see it chew up UBC. When the system quiets down, the
UBC is slowly returned. However, run 100 or 1000 copies and the UBC is never
returned, even if the system is left idle for days after the processes are
terminated. Umount/mount the FS, and the memory is returned.
Thoughts?
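# cat /work/shuntit
#!/usr/bin/ksh
# Copy a file back and forth forever to generate sustained file I/O.
# $1 = number of 10 KB blocks per copy (default 1000).
count=${1:-1000}

finish()
{
        exit 0
}
trap finish 1           # exit cleanly on signal 1 (SIGHUP)

# Create the seed file once.
if [ ! -f M-1 ]
then
        dd if=/dev/zero of=./M-1 bs=10240 count=${count}
fi

while true
do
        dd if=./M-1 of=M-2 bs=10240 count=${count}
        dd if=./M-2 of=M-1 bs=10240 count=${count}
done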
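
The 100- or 1000-copy case can be driven with a small launcher. This is a minimal sketch and not part of the original note: the names launch_copies and stop_copies and the run.N directory layout are hypothetical, and it assumes each copy should work in its own directory so the copies do not fight over the same M-1/M-2 files.

```shell
#!/bin/sh
# Hypothetical helper: start N copies of a load script, each in its own
# scratch directory. Prints the PIDs of the started copies on stdout.
# $1 = absolute path of the script, $2 = number of copies, $3 = scratch dir
launch_copies() {
    script=$1; copies=$2; base=$3
    pids=""
    i=1
    while [ "$i" -le "$copies" ]; do
        mkdir -p "$base/run.$i" || return 1
        # Output is discarded so the caller can capture the PID list.
        ( cd "$base/run.$i" && exec "$script" ) >/dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    echo $pids
}

# Stop the copies again. The load script traps signal 1 (HUP), so send that.
stop_copies() {
    kill -HUP "$@" 2>/dev/null
}

# Example (paths are placeholders):
#   pids=$(launch_copies /work/shuntit 100 /san/scratch)
#   ... watch memory with top ...
#   stop_copies $pids
```

Because the script only checks for the trap between dd runs, a copy may take until the end of its current dd before it exits.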
System busy, close to hitting the floor. Why are we swapping? See ps lax; UBC
is not relinquishing memory:
load averages: 5.18, 5.25, 5.02    18:05:13
100 processes: 9 running, 23 sleeping, 60 idle, 8 zombie
CPU states: 0.1% user, 74.4% nice, 20.2% system, 5.1% idle
Memory: Real: 975M/1993M act/tot Virtual: 29M/10211M use/tot Free: 1920K
PID USERNAME PRI NICE SIZE RES STATE TIME CPU COMMAND
11710 cricket 42 2 41M 36M run 0:02 32.20% collector
11492 cricket 42 2 130M 116M run 0:32 25.60% collector
11708 cricket 42 2 45M 39M run 0:03 25.40% collector
11490 cricket 42 2 130M 116M run 0:31 16.70% collector
11175 cricket 42 2 132M 114M run 0:43 11.80% collector
10476 cricket 42 2 132M 114M sleep 1:58 10.90% collector
11486 cricket 42 2 130M 116M sleep 0:21 10.90% collector
10690 cricket 42 2 132M 114M sleep 1:27 10.70% collector
11322 cricket 42 2 130M 113M run 0:34 10.40% collector
11380 cricket 42 2 130M 114M sleep 0:30 9.50% collector
11320 cricket 42 2 132M 113M run 0:32 7.60% collector
1429 root 44 0 8872K 6062K sleep 1:32 3.40% top
11715 root 44 0 8872K 6062K run 0:00 0.40% top
11716 cricket 46 2 132M 4915K run 0:00 0.00% collector
667 root 44 0 3776K 245K sleep 0:01 0.00% cron
System at start:
load averages: 0.00, 0.03, 0.09    16:15:22
48 processes: 1 running, 11 sleeping, 36 idle
CPU states: 0.0% user, 0.0% nice, 0.7% system, 99.2% idle
Memory: Real: 13M/1993M act/tot Virtual: 10211M use/tot Free: 1775M
PID USERNAME PRI NICE SIZE RES STATE TIME CPU COMMAND
1052 root 44 0 8872K 6062K run 0:00 0.40% top
250 root 44 0 2864K 655K sleep 0:00 0.00% evmd
959 root 44 0 2112K 327K sleep 0:00 0.00% ksh
279 root 44 0 2336K 319K sleep 0:00 0.00% evmlogger
381 root 44 0 2320K 245K sleep 0:00 0.00% syslogd
588 root 44 0 3552K 204K sleep 0:00 0.00% os_mibs
584 root 44 0 2592K 172K sleep 0:00 0.00% svrMgt_mib
570 root 44 0 2088K 155K sleep 0:00 0.00% snmpd
593 root 42 0 5936K 2588K sleep 0:00 0.00% insightd
582 root 42 0 3544K 1024K sleep 0:00 0.00% cpq_mibs
937 root 42 0 3248K 466K sleep 0:00 0.00% httpd
539 root 32 -12 2272K 360K sleep 0:00 0.00% xntpd
System Quiet (SAN Still Mounted)
load averages: 0.17, 0.12, 0.14    17:24:11
65 processes: 1 running, 23 sleeping, 41 idle
CPU states: 0.4% user, 0.0% nice, 3.7% system, 95.8% idle
Memory: Real: 608M/1993M act/tot Virtual: 10211M use/tot Free: 1088M
PID USERNAME PRI NICE SIZE RES STATE TIME CPU COMMAND
1429 root 44 0 8872K 6062K sleep 0:56 2.70% top
7936 root 44 0 8872K 6062K run 0:00 0.40% top
7867 root 42 0 5048K 999K sleep 0:00 0.40% rasusers
582 root 44 0 3544K 983K sleep 0:00 0.00% cpq_mibs
1140 root 44 0 4608K 663K sleep 0:00 0.00% xterm
7989 root 44 0 2000K 385K sleep 0:00 0.00% grep
937 root 44 0 3248K 352K sleep 0:00 0.00% httpd
959 root 44 0 2112K 319K sleep 0:00 0.00% ksh
667 root 44 0 3776K 253K sleep 0:00 0.00% cron
7987 root 44 0 1792K 221K sleep 0:00 0.00% rsh
7990 root 44 0 1792K 221K sleep 0:00 0.00% rsh
7865 root 44 0 2224K 204K sleep 0:00 0.00% sh
7624 cricket 44 0 1744K 163K sleep 0:00 0.00% tail
584 root 44 0 2592K 163K sleep 0:00 0.00% svrMgt_mib
1 root 44 0 480K 98K sleep 0:00 0.00% init
System Quiet, Busy SAN Disk Unmounted:
load averages: 0.16, 0.08, 0.07    17:35:09
57 processes: 1 running, 14 sleeping, 42 idle
CPU states: 0.0% user, 0.0% nice, 1.4% system, 98.4% idle
Memory: Real: 16M/1993M act/tot Virtual: 10211M use/tot Free: 1807M
PID USERNAME PRI NICE SIZE RES STATE TIME CPU COMMAND
1429 root 44 0 8872K 6062K sleep 1:06 2.40% top
8882 root 44 0 8872K 6062K run 0:00 0.30% top
1171 root 44 0 12M 2179K sleep 0:00 0.00% dxterm
582 root 44 0 3544K 983K sleep 0:00 0.00% cpq_mibs
1140 root 44 0 4608K 663K sleep 0:00 0.00% xterm
937 root 44 0 3248K 352K sleep 0:00 0.00% httpd
959 root 44 0 2112K 327K sleep 0:00 0.00% ksh
667 root 44 0 3776K 253K sleep 0:00 0.00% cron
584 root 44 0 2592K 163K sleep 0:00 0.00% svrMgt_mib
593 root 42 0 5936K 2457K sleep 0:00 0.00% insightd
381 root 42 0 2320K 229K sleep 0:00 0.00% syslogd
580 root 42 0 2608K 180K sleep 0:00 0.00% svrSystem_mib
570 root 42 0 2088K 147K sleep 0:00 0.00% snmpd
139 root 42 0 1728K 106K sleep 0:00 0.00% update
539 root 32 -12 2272K 344K sleep 0:00 0.00% xntpd
How much could UBC be using?
UID   PID  PPID  CP PRI  NI   VSZ  RSS WCHAN  S    TTY     TIME COMMAND
  0     0     0   0  38  -6 2.54G  78M *      R <  ??   4:17.79 [kernel idle]
  0     1     0   0  44   0  480K  96K pause  IL   ??   0:00.52 /sbin/init -a
# bc
254*3
762
1807-1088
719
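
The second bc figure is the difference between the "Free:" values of the unmounted and still-mounted snapshots. A small sketch of the same arithmetic, pulling the numbers straight from the top "Memory:" lines; the helper name free_mb is made up for illustration, and only the K and M suffixes that appear in these snapshots are handled.

```shell
#!/bin/sh
# Hypothetical helper: extract the "Free:" figure from a top(1) "Memory:"
# line and print it in whole megabytes (K values are rounded down).
free_mb() {
    val=$(printf '%s\n' "$1" | sed -n 's/.*Free: *\([0-9][0-9]*[KM]\).*/\1/p')
    case $val in
        *M) echo "${val%M}" ;;
        *K) echo $(( ${val%K} / 1024 )) ;;
        *)  return 1 ;;
    esac
}

# "Memory:" lines copied from the two quiet-system snapshots.
mounted='Memory: Real: 608M/1993M act/tot Virtual: 10211M use/tot Free: 1088M'
unmounted='Memory: Real: 16M/1993M act/tot Virtual: 10211M use/tot Free: 1807M'

delta=$(( $(free_mb "$unmounted") - $(free_mb "$mounted") ))
echo "freed by umount: ${delta}M"    # prints: freed by umount: 719M
```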
System Info:
# sizer -v
Compaq Tru64 UNIX V5.0A (Rev. 1094); Mon Apr 23 16:11:03 EST 2001
# psrinfo -v
Status of processor 0 as of: 04/23/01 17:40:17
Processor has been on-line since 04/23/2001 16:13:07
The alpha EV6 (21264) processor operates at 500 MHz,
and has an alpha internal floating point processor.
Status of processor 1 as of: 04/23/01 17:40:17
Processor has been on-line since 04/23/2001 16:13:07
The alpha EV6 (21264) processor operates at 500 MHz,
and has an alpha internal floating point processor.
# uname -a
OSF1 newton.itbsnmp.det.nsw.EDU.AU V5.0 1094 alpha
# scu show edt
CAM Equipment Device Table (EDT) Information:
Bus/Target/Lun Device Type ANSI Vendor ID Product ID Revision N/W
-------------- ----------- ------ --------- ---------------- -------- ---
0 5 0 CD-ROM SCSI-2 DEC RRD47 (C) DEC 1206 N
1 0 0 Direct SCSI-2 DEC RZ1DF-CB (C) DEC 0372 W
1 2 0 Direct SCSI-2 DEC RZ1DF-CB (C) DEC 0372 W
1 3 0 Direct SCSI-2 DEC RZ1DF-CB (C) DEC 0372 W
1 5 0 Direct SCSI-2 DEC RZ1DF-CB (C) DEC 0372 W
2 4 0 Sequential SCSI-2 DEC TZ89 (C) DEC 2150 W
5 0 0 RAID SCSI-2 DEC HSG80CCL V85F W
5 0 1 Direct SCSI-2 DEC HSG80 V85F W
5 1 0 RAID SCSI-2 DEC HSG80CCL V85F W
5 1 2 Direct SCSI-2 DEC HSG80 V85F W
5 1 15 Direct SCSI-2 DEC HSG80 V85F W
5 126 0 Processor SCSI-2 COMPAQ KGPSA-CA 1.22 N
Guy R. Loucks
Senior Unix Systems Administrator
Networks Branch
NSW Department of Education & Training
Information Technology Bureau
Direct +61 2 9942 9887
Fax +61 2 9942 9600
Mobile +61 (0)429 041 186
Email guy.loucks_at_det.nsw.edu.au
Hi Guy,
I have sought advice from my peers, and here are some kernel parameters
that should be changed on your system, based on our evaluation of your
sys_check output.
- lower the ubc_maxpercent parameter to 50(%)
- lower the vm_ubcseqstartpercent to 35(%)
- increase vm_page_free_target to 256
- increase vm_page_free_swap to 128
- lower vm_page_free_hardswap to 2048
- increase vm_page_free_min to 32
- increase vm_page_free_reserved to 20
- lower the vm_page_free_optimal to 256
- lower vm_page_prewrite_target to 256
Definitions for these parameters should be available in the sys_attrs_vm man
page.
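
The recommendations above could be collected into a "vm:" stanza for /etc/sysconfigtab. The following is only a sketch: the helper name emit_vm_stanza is made up for illustration, the values are copied from the list above, and the sysconfigdb/sysconfig commands in the trailing comments are assumptions to check against their man pages before use. Some attributes may also need a reboot to take effect.

```shell
#!/bin/sh
# Hypothetical helper: print a "vm:" stanza holding the suggested values.
# Nothing here touches the running kernel or /etc/sysconfigtab.
emit_vm_stanza() {
    echo "vm:"
    while read -r attr value; do
        printf '\t%s = %s\n' "$attr" "$value"
    done <<EOF
ubc_maxpercent 50
vm_ubcseqstartpercent 35
vm_page_free_target 256
vm_page_free_swap 128
vm_page_free_hardswap 2048
vm_page_free_min 32
vm_page_free_reserved 20
vm_page_free_optimal 256
vm_page_prewrite_target 256
EOF
}

emit_vm_stanza
# Assumed follow-up on the Tru64 host (verify first):
#   emit_vm_stanza > vm.stanza
#   sysconfigdb -m -f vm.stanza vm    # merge the stanza into /etc/sysconfigtab
#   sysconfig -q vm ubc_maxpercent    # query the value currently in effect
```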
- application/octet-stream attachment: busy.out
- application/octet-stream attachment: NEWTON
Received on Mon Apr 23 2001 - 08:34:29 NZST