Hello managers!
We have a severe problem with one of our machines (ES40, Tru64 UNIX 5.0, 1 CPU,
6 GB RAM, 15 GB swap). When a big job (~5 GB) starts to allocate memory, the
machine hangs, and additional processes (e.g., a root login or a shutdown)
take several minutes to come up.
The vmstat output is very suspicious: it shows massive page faults while the
process allocates its pages; free memory falls to vm_page_free_reserved, then
activity suddenly drops to zero and the machine hangs:
Virtual Memory Statistics: (pagesize = 8192)
procs    memory         pages                         intr        cpu
r   w  u  act free wire fault cow zero react pin pout  in  sy  cs  us sy  id
4 101 32  55K 675K  55K   227   0  227     6   0    0   3  4K  83  97  3   0
4 101 32  55K 675K  55K   127   0  127     0   0    0   2  4K  64  97  3   0
4 101 32  55K 675K  55K   156   0  156     0   0    0   2  4K  65  97  3   0
3 102 32  55K 675K  55K     0   0    0     0   0    0   2  40  65 100  0   0
4 101 32 100K 629K  55K 45332   1  37K     0   2    0  56  4K 171  27 68   4
4 101 32 157K 572K  55K 57241  62  56K     0  52    0   2  4K  86   8 92   0
4 101 32 217K 512K  55K 60210   0  61K     0   0    0   2  4K  65   9 91   0
3 102 32 278K 451K  55K 60959   0  57K     0   0    0   5  4K  74   9 91   0
4 101 32 339K 389K  56K 61773   0  62K     0   0    0   1 751  58  11 89   0
3 102 32 400K 328K  56K 61077   0  59K     0   0    0   2  4K  69   8 92   0
6  99 32 460K 268K  56K 60117   0  60K     0   0    0   2  4K  68  10 90   0
4 101 32 521K 207K  56K 61254   0  60K     0   0    0   2  4K  63   9 91   0
4 101 32 582K 146K  57K 60584   0  59K     0   0    0   2  4K  63   9 91   0
3 102 32 644K  84K  57K 62250   0  60K    21   0    0   4  86  70   8 92   0
4 101 32 703K  24K  57K 59499   0  59K     3   0    0  11  5K  93   9 91   0
2 102 33 727K  767  57K 23962   0  37K  8640   0   44  29  4K 380   3 60  36
2 101 34 727K  767  57K     4   0    3     0   1    0  12  38 133   0  3  97
2  99 36 727K  767  57K     0   0    0     0   0    0   1 746 113   2  3  95
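To make the vmstat numbers easier to read, the page counts can be converted to megabytes using the 8192-byte page size from the header and compared with vm_page_free_reserved (768 pages) from our sysconfigtab settings. The Python sketch below does this for the last three samples. The column names (with the two "sy" columns renamed) and the assumption that vmstat's "K" suffix means x1000 are illustrative assumptions, not taken from the vmstat documentation.

```python
PAGESIZE = 8192               # bytes per page, from the vmstat header
VM_PAGE_FREE_RESERVED = 768   # pages, from the vm: section of sysconfigtab

# Column names in vmstat order; the two "sy" columns are renamed to keep them apart.
COLUMNS = ["r", "w", "u", "act", "free", "wire", "fault", "cow", "zero",
           "react", "pin", "pout", "in", "sy_calls", "cs", "us", "sy_cpu", "id"]

def parse_count(field: str) -> int:
    # vmstat abbreviates large counts as e.g. "675K"; treating K as x1000 is an assumption.
    if field.endswith("K"):
        return int(field[:-1]) * 1000
    return int(field)

def parse_row(line: str) -> dict:
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, (parse_count(f) for f in fields)))

def free_mb(pages: int) -> float:
    # Convert a page count to megabytes (2**20 bytes).
    return pages * PAGESIZE / 2**20

rows = [
    "4 101 32 703K 24K 57K 59499 0 59K 3 0 0 11 5K 93 9 91 0",
    "2 102 33 727K 767 57K 23962 0 37K 8640 0 44 29 4K 380 3 60 36",
    "2 101 34 727K 767 57K 4 0 3 0 1 0 12 38 133 0 3 97",
]

for line in rows:
    row = parse_row(line)
    at_floor = row["free"] <= VM_PAGE_FREE_RESERVED
    print(f"free={row['free']:>6} pages ({free_mb(row['free']):6.1f} MB) "
          f"pout={row['pout']:>3} at_reserved_floor={at_floor}")
```

In the 767-page rows, free memory (about 6 MB) sits one page below vm_page_free_reserved, and the only non-zero pout value in the whole listing (44) occurs in the same interval in which activity collapses.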
top shows:
57 processes: 2 running, 20 sleeping, 34 idle, 1 zombie
CPU states: 0.8% user, 7.3% nice, 91.7% system, 0.0% idle
Memory: Real: 2464M/6014M act/tot Virtual: 15308M use/tot Free: 466M
  PID USERNAME PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 4930 xxxxxxx   63   19 5177M 5041M run     1:51 94.50% something
  528 root      43   -1 5296K 1630K sleep   8:47  0.60% advfsd
-------------------------------------------------------------------
The logs show the following message several times:
vmunix: vm_swap I/O error during pageout
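To see how often this message recurs and whether its timestamps line up with the hang, a small scan of the log files can help. The Python sketch below simply collects every line containing the message text quoted above. The file names are passed on the command line because the actual log location depends on the local syslog configuration; nothing here assumes a particular Tru64 log path.

```python
import sys
from typing import Iterable, List

# The kernel message quoted above, matched as a plain substring.
SWAP_ERROR = "vm_swap I/O error during pageout"

def find_swap_errors(lines: Iterable[str]) -> List[str]:
    """Return the log lines containing the pageout error, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if SWAP_ERROR in line]

if __name__ == "__main__":
    # Usage (hypothetical): python scan.py <logfile> [<logfile> ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            hits = find_swap_errors(f)
        print(f"{path}: {len(hits)} pageout error(s)")
        for hit in hits:
            print("  " + hit)
```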
-------------------------------------------------------------------
Our swap space is spread over four partitions on two disks and runs in lazy
swap mode (output of swapon -s, shortened):
Swap partition /dev/disk/dsk1b:
Allocated space: 262144 pages (2.00GB)
Swap partition /dev/disk/dsk0b:
Allocated space: 262144 pages (2.00GB)
Swap partition /dev/disk/dsk1h:
Allocated space: 848701 pages (6.48GB)
Swap partition /dev/disk/dsk0h:
Allocated space: 586557 pages (4.48GB)
Total swap allocation:
Allocated space: 1959546 pages (14.95GB)
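As a quick consistency check, the per-partition page counts can be summed and converted with the 8192-byte page size. The Python sketch below assumes that swapon -s reports "GB" as 2**30 bytes, which is what reproduces the figures shown.

```python
PAGESIZE = 8192  # bytes per page on this system (from the vmstat header)

# Allocated pages per swap partition, copied from the swapon -s output above.
swap_pages = {
    "/dev/disk/dsk1b": 262144,
    "/dev/disk/dsk0b": 262144,
    "/dev/disk/dsk1h": 848701,
    "/dev/disk/dsk0h": 586557,
}

def pages_to_gb(pages: int) -> float:
    # Assumes swapon's "GB" means 2**30 bytes.
    return pages * PAGESIZE / 2**30

total = sum(swap_pages.values())
for dev, pages in swap_pages.items():
    print(f"{dev}: {pages} pages ({pages_to_gb(pages):.2f}GB)")
print(f"total: {total} pages ({pages_to_gb(total):.2f}GB)")
```

The computed values match the swapon output, so the partitions themselves are all configured and accounted for.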
-------------------------------------------------------------------
Our vm section settings in sysconfigtab are as follows:
vm:
swapdevice = /dev/disk/dsk1b,/dev/disk/dsk0b,/dev/disk/dsk1h,/dev/disk/dsk0h
vm_segmentation = 1
vm_page_free_target=2048
vm_page_free_swap=1664
vm_page_free_hardswap=32768
vm_page_free_min=1024
vm_page_free_reserved=768
vm_page_free_optimal=1536
vm_swap_eager=0
vm_page_prewrite_target=4096
vm_rss_block_target=1536
vm_rss_wakeup_target=1536
vm_aggressive_swap=0
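Because all of these thresholds are given in pages, it can help to see them in megabytes. The Python sketch below converts the values listed above using the 8192-byte page size and checks that they are ordered reserved < min < optimal < target. That ordering is an assumption about how the free-list thresholds are meant to relate; it should be verified against the Tru64 tuning documentation.

```python
PAGESIZE = 8192  # bytes per page

# Values copied from the vm: section above (all in pages).
vm = {
    "vm_page_free_reserved": 768,
    "vm_page_free_min": 1024,
    "vm_page_free_optimal": 1536,
    "vm_page_free_swap": 1664,
    "vm_page_free_target": 2048,
    "vm_page_prewrite_target": 4096,
    "vm_page_free_hardswap": 32768,
}

def to_mb(pages: int) -> float:
    # Convert a page count to megabytes (2**20 bytes).
    return pages * PAGESIZE / 2**20

for name, pages in sorted(vm.items(), key=lambda kv: kv[1]):
    print(f"{name:<24} {pages:>6} pages = {to_mb(pages):7.1f} MB")

# Assumed ordering of the free-list thresholds (an assumption, see lead-in).
chain = ["vm_page_free_reserved", "vm_page_free_min",
         "vm_page_free_optimal", "vm_page_free_target"]
ordered = all(vm[a] < vm[b] for a, b in zip(chain, chain[1:]))
print("reserved < min < optimal < target:", ordered)
```

With these settings the page-free floor is only 6 MB and the target 16 MB on a 6 GB machine.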
The event viewer shows no severe incidents, there are no messages at boot
time, and rebooting does not help.
As the top output shows, no disk swap space is in use at the time the
machine stops working. So is this a physical memory problem? Or is the
machine unable to allocate pages on the swap disks for some reason?
We have exactly the same configuration on several other ES40s, and they showed
no problems when used up to the virtual memory limit.
Thanks for your help! I'll post a summary.
--
Dr. Udo Grabowski email: udo.grabowski_at_imk.fzk.de
Institut f. Meteorologie und Klimaforschung II, Forschungszentrum Karlsruhe
Postfach 3640, D-76021 Karlsruhe, Germany Tel: (+49) 7247 82-6026
http://www.fzk.de/imk/imk2/ame/grabowski/ Fax: " -6141
Received on Tue Aug 15 2000 - 08:16:52 NZST