Hi Managers,
Paul Grant complained about my "poor summary"; my apologies.
I'll try to repair it as far as I can, but I didn't save all
the mails. Most of them said:
- switch swap mode from eager to lazy by removing /sbin/swapdefault
- use ps auxw or ps vax and look at the VSZ and RSS columns
- my swap space should be 3 or 4 times the memory size
George Guethlein wrote :
-----------------------
Jean, I had a somewhat similar problem that ended up being
I/O related. I did the following to see what was going on, and to
fix it :
1.) "uerf -o brief -t s: 00:00:00" to look for any recent system
errors. Hopefully, there will be none.
2.) "dbx /vmunix", looking for (what you want to see is in parens) :
vpf-allocated pages (=0)
vpf-wiredpages (=0)
vpf-borrowedpages (=0)
vpf-freedpages (=0)
3.) "swapon -s" to see if something is swapped out. I also used
"ps -je". Column 7 of the output indicates the process state, and
allows for 3 characters. If the second character is a "W", then the
process is swapped out. This could be the offending program.
4.) "vmstat 5 10" to see the paging info (and memory usage ??).
The "wired" column indicates in-core memory used by vmunix.
5.) "sysconfig -q vm", looking for (what you DON'T want to see):
ubc-maxpercent (=100)
ubc-borrowpercent (=20)
The ubc-maxpercent value indicates the percentage of memory that
can be used by the I/O subsystem (100 = ALL OF IT). If one
process uses all of the memory, it becomes "wired" and forces
the other processes to swap out. Eventually, no process can run
and the system hangs.
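
Step 3 above can be sketched as a small filter. This is only an illustration: the function name swapped_out and the STATE_FIELD variable are assumptions made for the example, and it takes the message's word that the process state is column 7 of the "ps -je" output.

```shell
#!/bin/sh
# Sketch: print the lines of "ps -je"-style output whose state field has
# "W" as its second character (swapped out, according to step 3).
# Assumption: the state is whitespace-separated field 7; set STATE_FIELD
# if your ps prints a different layout.
swapped_out() {
    # NR > 1 skips the header line.
    awk -v col="${STATE_FIELD:-7}" 'NR > 1 && substr($col, 2, 1) == "W"'
}

# Usage (not run here):
#   ps -je | swapped_out
```

Reading from standard input keeps the filter testable on saved ps output as well as on a live system.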
MY RESOLUTION:
I modified /etc/sysconfigtab to contain the following subsystem
override :
vm:
ubc-maxpercent = 50
This limited the I/O subsystem to 50% of the physical memory
(NOT ALL OF IT), so that it cycles through its buffers faster.
I then killed the process and cycled the system. Later, I
even decreased the override to 10%.
I hope I didn't provide too much useless info. Best of luck.
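
The check in step 5 and the override described above could be verified with something like the sketch below. It assumes that "sysconfig -q vm" prints each attribute as "name = value" on its own line; the function name check_ubc_maxpercent and the default limit of 50 are choices made for this example only.

```shell
#!/bin/sh
# Sketch: read "sysconfig -q vm"-style output on stdin and report whether
# ubc-maxpercent is above a chosen limit.
# Assumption: attributes are printed as "name = value", one per line.
check_ubc_maxpercent() {
    limit=${1:-50}    # 50 is the first override value used in the message
    awk -v limit="$limit" '
        $1 == "ubc-maxpercent" && $2 == "=" {
            found = 1
            if ($3 + 0 > limit + 0) {
                print "ubc-maxpercent = " $3 " (above " limit ")"
                bad = 1
            } else {
                print "ubc-maxpercent = " $3 " (ok)"
            }
        }
        END {
            if (!found) { print "ubc-maxpercent not found"; exit 2 }
            exit (bad ? 1 : 0)
        }
    '
}

# Usage (not run here):
#   sysconfig -q vm | check_ubc_maxpercent 50
```

The exit status is 0 when the value is within the limit, 1 when it is above, and 2 when the attribute is missing, so the check can be used in a cron job or a startup script.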
Alan Rollow - Dr. File System's Home for Wayward Inodes wrote :
---------------------------------------------------------------
re: finding the program.
Maybe not easily. If there happens to be enough page/swap
space for the program's virtual memory requirements, then
it will probably still be running when the message is
printed. In this case you just need to look at the ps(1)
listing for large virtual memory processes (or lots of
processes that are using lots of virtual memory when
combined).
If the process died as the result of the failure, you'd
have to ask your users if they had a process die as the
result. If it handled the failure gracefully, there might
not be anything more than an error message. If the process
wasn't started interactively, the message may be hidden
in a log file. The worst case is that the process did handle
the failure and found another way to get the virtual memory
it needed, or found a way to do without.
One potentially useful failure is that processes will sometimes
get a segmentation violation after running out of memory. These
will leave "core" files. You could make a pass of the system
to locate all the core files (find / -name core -print), see if
the dates of any match the time of the message, clean them up
and when the next message happens, see if one showed up.
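
One possible way to do the "see if one showed up" part is to compare core files against a marker file touched at clean-up time. This is a sketch only: the function name cores_newer_than and the marker-file idea are assumptions for illustration, not something suggested in the original message.

```shell
#!/bin/sh
# Sketch: list "core" files under a directory that are newer than a marker
# file, so that cores created after the last clean-up stand out.
# The function name and the marker file are illustrative assumptions.
cores_newer_than() {
    root=$1      # directory to scan, e.g. /
    marker=$2    # file whose modification time marks the last clean-up
    find "$root" -name core -type f -newer "$marker" -print
}

# Usage (not run here):
#   touch /var/tmp/core.marker                 # right after removing old cores
#   cores_newer_than / /var/tmp/core.marker    # after the next message
```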
Theis Jean-Marie wrote :
------------------------
Your swap space is too small relative to your memory; I think it should
be at least 3 times bigger.
If you know "top", the public-domain tool that shows the biggest
CPU eaters, I have written a script on the same principle that
continuously shows a sorted list of the biggest swap eaters, sorted by
virtual size (VSZ).
If possible, you should have "swapstat" installed so that the overall
summary of swap usage appears at the top of the list. If not, just
comment out the line containing "swapstat".
I called it "pot".
I attach pot and also its man page: pot.1
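
The sizing rule of thumb quoted above (swap at least 3 times the memory size) amounts to simple arithmetic. The sketch below is an illustration only; the function name swap_ratio_ok is made up, and the memory and swap sizes have to be supplied by hand in megabytes.

```shell
#!/bin/sh
# Sketch: check a swap size against the "N times physical memory" rule.
# Arguments: memory in MB, swap in MB, optional factor (default 3).
swap_ratio_ok() {
    mem_mb=$1
    swap_mb=$2
    factor=${3:-3}
    needed=$((mem_mb * factor))
    if [ "$swap_mb" -ge "$needed" ]; then
        echo "ok: ${swap_mb} MB swap >= ${needed} MB"
    else
        echo "short: ${swap_mb} MB swap < ${needed} MB (need $((needed - swap_mb)) MB more)"
        return 1
    fi
}

# Usage: a machine with 128 MB of memory and 256 MB of swap
#   swap_ratio_ok 128 256
```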
---------------------------------------------------------------------------
jean.schuller_at_ires.in2p3.fr _/ _/_/_/ _/_/_/ _/_/_/_/
_/ _/ -/ _/ _/ _/
_/ _/_/_/-/ _/_/_/ _/_/_/_/
_/ _/ -/ _/ _/
_/ _/ _/ _/_/_/ _/_/_/_/
local call: 0388106630 Institut de Recherches Subatomiques
foreign call: (33)388106630 Boîte Postale 28
local fax : 0388106234 23, Rue du Loess
foreign fax : (33)388106234 F-67037 STRASBOURG CEDEX - France
---------------------------------------------------------------------------
.\" This manpage source uses rsml coding.
.so /usr/lib/tmac/sml
.so /usr/lib/tmac/rsml
.\"
.TH pot 1
.SH NAME
.PP
\*Lpot \*O \- Shows a sorted list of the biggest swap consumers
.SH SYNOPSIS
.PP
.sS
\*Lpot \*O
\*O[\*Lhelp\*O\&] \*O[\*Lall\*O\&] \*O[\*Ls\*Vseconds\*O] \*O[\*Ln\*O\&] \*O[\*L-n\*O\&]
.sE
.PP
The \*Lpot\*O command uses \*Lswapstat\*O and \*Lps\*O to show, at regular
intervals, the list of the biggest swap space consumers sorted in
decreasing order. It is a (pale) remake of \*Ltop\*O, but oriented
toward virtual memory.
.LE
.PP
.SH DESCRIPTION
.PP
.iX "top" "swap" "memory" "virtual" "process"
By default, \*Lpot\*O shows 10 processes at an interval of 10 seconds.
.SH FLAGS
.PP
.VL 6m
.LI "\*Lhelp\*O"
Shows the help.
.LI "\*Lall\*O"
Also shows the processes owned by root.
.LI "\*Ln\*O"
Shows n lines; the default is 10.
.LI "\*L0\*O"
Shows an unlimited number of lines.
.LI "\*L-n\*O"
Shows n lines once, then stops.
.LI "\*L-0\*O"
Shows an unlimited number of lines once, then stops.
.LI "\*Ls n\*O"
Waits n seconds between displays; the default is 10 seconds.
.SH EXAMPLES
.PP
.AL
.LI
To show the help:
.iS
pot help
\*O\c
.iE
.IP
.LI
To show 20 processes, excluding root, at an interval of 30 seconds:
.iS
pot 20 s30
\*O\c
.iE
.IP
.LI
To show 12 processes, including root, at the default interval of 10 seconds:
.iS
pot 12 all
\*O\c
.iE
.IP
.LI
To show all processes on an unlimited number of lines,
including those owned by root:
.iS
pot 0 all
\*O\c
.iE
.IP
.LI
To show all processes, including root, only once:
.iS
pot -0 all
\*O\c
.iE
.IP
.SH IMPORTANT NOTE
.PP
The value shown in the \*Lswap\*O column is what the \*Lps\*O command
(ps aux) calls \*LVSZ\*O.
This value is generally larger than the swap actually consumed,
for the following reasons:
-In some cases this size may include data areas that reside on
disk or are not yet loaded into swap.
-The size of the executable itself is counted, but it is not
necessarily fully loaded into swap while it runs.
.PP
However, the relative values shown by this utility can be
useful in optimizing system performance.
.SH RELATED INFORMATION
.PP
Commands: \*Lswapstat\*O, \*Lps\*O(1), \*Lsed\*O(1).
.SH Contacts for changes, bugs, etc.
Theis Jean-Marie e-Mail theis_at_drfc .
.EQ
.EN
#!/bin/sh
#
Commande="ps Aax -o pid,user,lstart,vsize,ucomm | egrep -v ' STARTED | root | - ' "
SleepCommand="sleep 10"
# Clear the screen and home the cursor
RefreshCommand='tput clear'
FinalCommande="head -10"
SleepTime="10"
SmallName=`basename $0`
#
while [ "$1" ]
do
case "$1" in
h|he|help|n|ai*)echo "$0 is a remake of top, but for swap consumption
Syntax : $SmallName [ help | all ] [ n | -n | 0 | s seconds ]
Arguments :
help : shows this help
s n : n numeric argument : waits n seconds between displays
default 10 seconds
all : shows all processes including root
if \"all\" is absent only non-root processes appear
n : numeric argument : shows n lines and loops, default = 10
if this argument is absent then 10 lines are shown
if n=0 shows all the lines found
-n : runs only once and shows n lines
if n=0 shows all the lines found
Examples:
$SmallName Decreasing sorted list of the 10 biggest swap eaters except root
refreshed every 10 seconds : stop with control/c.
$SmallName all Decreasing sorted list of the 10 biggest swap eaters including root
refreshed every 10 seconds : stop with control/c.
$SmallName 20 all Sorted list of the 20 biggest swap eaters including root
refreshed every 10 seconds : stop with control/c.
$SmallName s10 7 Sorted list of the 7 biggest swap eaters
refreshed every 10 seconds
$SmallName -20 Lists the 20 biggest swap eaters once only
"
exit ;;
a|-a|all|-all)
Commande="ps Aax -o pid,user,lstart,vsize,ucomm | egrep -v ' STARTED | - ' "
shift;;
[1-9]*)
RefreshCommand='tput clear'
FinalCommande="head -$1"
shift;;
0)RefreshCommand='tput clear'
FinalCommande="cat"
shift;;
-[1-9]*)
SleepCommand='exit'
FinalCommande=" head $1"
RefreshCommand='echo'
shift;;
-0)
SleepCommand='exit'
FinalCommande=" cat"
RefreshCommand='echo'
shift;;
s*)SleepTime=`echo $1 | sed 's/s//'`
shift
if [ "$SleepTime" = "" ]
then
SleepTime=$1
shift
fi
SleepCommand="sleep $SleepTime"
RefreshCommand='tput clear';;
*)
# Unknown argument: ignore it and keep the current settings
# (resetting Commande here would silently undo an earlier "all")
shift;;
esac
done
eval $RefreshCommand
while true
do
#Comment the following line if you do not have swapstat
#swapstat
echo ""
echo "PID USERNAME Starting date of process Swap Command"
eval $Commande |
sed -e '
/[0-9]K /{
/ [0-9]K /{
s/[0-9]K/0.00&/
s/K /M /
}
/ [0-9][0-9]K /{
s/[0-9][0-9]K/0.0&/
s/K /M /
}
/ [0-9][0-9][0-9]K /{
s/[0-9][0-9][0-9]K/0.&/
s/K /M /
}
}' | sort -n -r -k 8 | $FinalCommande
eval $SleepCommand
eval $RefreshCommand
done
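
The sed stage of the script above is what lets "sort -n" compare sizes that ps prints in K with sizes it prints in M. The sketch below isolates that stage so it can be tried on sample lines. The function name normalize_vsize and the sample lines are made up for the example; the sed program itself is copied from the script.

```shell
#!/bin/sh
# Sketch: rewrite 1- to 3-digit "K" sizes as fractions of "M"
# (7K -> 0.007M, 42K -> 0.042M, 512K -> 0.512M) so that a numeric sort
# on the size column orders K and M values together.
# Note: this treats 1M as 1000K, which is only an approximation.
normalize_vsize() {
    sed -e '
    /[0-9]K /{
    / [0-9]K /{
    s/[0-9]K/0.00&/
    s/K /M /
    }
    / [0-9][0-9]K /{
    s/[0-9][0-9]K/0.0&/
    s/K /M /
    }
    / [0-9][0-9][0-9]K /{
    s/[0-9][0-9][0-9]K/0.&/
    s/K /M /
    }
    }'
}

# Usage on hypothetical ps-style lines (size is field 8, as in the script):
#   printf '%s\n' \
#     '101 alice Tue Nov 18 15:00:00 1997 512K xterm' \
#     '102 bob Tue Nov 18 15:01:00 1997 12.5M emacs' \
#     | normalize_vsize | sort -n -r -k 8
```

Sizes of four or more digits followed by K are left untouched by this sed program, so the numeric sort would misplace them if ps ever printed one.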
Received on Tue Nov 18 1997 - 15:34:29 NZDT