Hi all,
we have run into a recurring problem on a DPW433au
running v5.0A + PK2. Has anyone experienced
something similar?
The problem is that after some days of running
(at very light load and with almost no users), none
of the monitoring tools (ps, top, collect) can
read certain per-process data any more, such as
CPU%, running state, priority, nice level,
wait channel and CPU time used
(the output of "ps axl" is attached below).
A few hours later the system always begins
to slow down progressively until it has to be
rebooted.
The "collect" data show that the slowdown
coincides with I/O activity on the 2 local disks
rising to 100% (the swap space spans
both disks), but the data on free memory and
swap space show nothing unusual.
Another oddity: starting 30 minutes before the
ps data become unreadable, collect reports the
average CPU load rising from 0 to 1 over that
half hour, with the run queue length stepping up
to 1 and never going back down (on average)
from then on.
The same collect data do not show any process
in the process list that could be responsible for that.
All the other host and network statistics, again
from collect, show no variation, and no messages
appear in the system logs.
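
To pin down the moment at which the per-process fields
become unreadable, the saved "ps axl" output could be
checked automatically. What follows is only a minimal
sketch in Python, not something we run today: the function
name unreadable_processes is made up for illustration, and
it assumes the header and column order of the attached
output, with COMMAND as the last column and no empty
columns in healthy rows. TTY is deliberately not checked,
since "??" is normal there for daemons.

```python
# Hypothetical helper, not part of Tru64, ps or collect.
# Assumes the header of the attached output:
# UID PID PPID CP PRI NI VSZ RSS WCHAN S TTY TIME COMMAND

# Columns that show only question marks once the problem has started.
CHECKED = ("PRI", "NI", "WCHAN", "TIME")


def unreadable_processes(ps_text):
    """Return (pid, command) for each row whose checked fields are all '?'."""
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    col = {name: i for i, name in enumerate(header)}
    if any(name not in col for name in CHECKED + ("PID", "COMMAND")):
        raise ValueError("unexpected ps header")
    bad = []
    for line in lines[1:]:
        # COMMAND is assumed to be the last column; it may contain spaces,
        # so everything after the preceding columns is kept as one field.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # short or malformed row, skip it
        if all(set(fields[col[name]]) == {"?"} for name in CHECKED):
            bad.append((int(fields[col["PID"]]), fields[col["COMMAND"]]))
    return bad
```

Called on each saved snapshot of the ps output, the first
snapshot for which the returned list is non-empty would give
the onset time to compare with the collect data.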
Thank you in advance for your help.
--
Stefano Cortese Virgo System Manager
INFN - Sezione di Pisa www.virgo.infn.it
VIRGO Project
Traversa H di via Macerata
Phone: +39-050-752.539 e-mail: cortese_at_virgo.infn.it
Fax : +39-050-752.550
Output of "ps axl"
UID PID PPID CP PRI NI VSZ RSS WCHAN S TTY TIME COMMAND
0 0 0 0 ? ? 162M 8.0M ? ??? ?? ?? [kernel idle]
0 1 0 0 ? ? 480K 32K ? ??? ?? ?? /sbin/init -a
0 3 1 0 ? ? 1.20M 0K ? ??? ?? ?? /sbin/kloadsrv
0 5 1 0 ? ? 2.31M 72K ? ??? ?? ?? /sbin/hotswapd
0 50 1 0 ? ? 1.69M 48K ? ??? ?? ?? /sbin/update
0 161 1 0 ? ? 2.75M 416K ? ??? ?? ?? /usr/sbin/evmd
0 194 161 0 ? ? 2.28M 64K ? ??? ?? ?? /usr/sbin/evmlogger -o /var/run/evmlogger.info -l /var/evm/adm/logfiles/evmlogger.log
0 195 161 0 ? ? 2.23M 72K ? ??? ?? ?? /usr/sbin/evmchmgr -l /var/evm/adm/logfiles/evmchmgr.log
0 281 1 0 ? ? 2.30M 88K ? ??? ?? ?? /usr/sbin/syslogd -e
0 285 1 0 ? ? 2.27M 160K ? ??? ?? ?? /usr/sbin/binlogd
0 355 1 0 ? ? 2.30M 0K ? ??? ?? ?? /usr/sbin/portmap
0 363 1 0 ? ? 1.11M 72K ? ??? ?? ?? /usr/sbin/ypbind -s -S [.. cut]
0 368 1 0 ? ? 2.73M 72K ? ??? ?? ?? /usr/sbin/mountd -i
0 370 1 0 ? ? 1.73M 0K ? ??? ?? ?? /usr/sbin/nfsd -t8 -u8
0 373 1 0 ? ? 1.69M 0K ? ??? ?? ?? /usr/sbin/nfsiod 7
0 376 1 0 ? ? 2.36M 0K ? ??? ?? ?? /usr/sbin/rpc.statd
0 379 1 0 ? ? 2.48M 0K ? ??? ?? ?? /usr/sbin/rpc.lockd
0 383 1 0 ? ? 2.54M 144K ? ??? ?? ?? /usr/sbin/automount
0 445 1 0 ? ? 2.22M 152K ? ??? ?? ?? /usr/sbin/xntpd -g -c /etc/ntp.conf
0 473 1 0 ? ? 2.55M 200K ? ??? ?? ?? sendmail: accept -bd -q15m -om
0 476 1 0 ? ? 2.02M 144K ? ??? ?? ?? /usr/sbin/snmpd
0 486 1 0 ? ? 2.55M 48K ? ??? ?? ?? /usr/sbin/svrSystem_mib
0 488 1 0 ? ? 2.54M 72K ? ??? ?? ?? /usr/sbin/svrMgt_mib
0 492 1 0 ? ? 3.55M 176K ? ??? ?? ?? /usr/sbin/os_mibs
0 503 1 0 ? ? 3.27M 112K ? ??? ?? ?? /var/opt/CPQIM222/bin/cpq_mibs
0 512 1 0 ? ? 2.48M 120K ? ??? ?? ?? /var/opt/CPQIM222/bin/cpqthresh_mib
0 528 1 0 ? ? 6.69M 368K ? ??? ?? ?? /var/opt/CPQIM222/bin/insightd
0 529 1 0 ? ? 3.73M 40K ? ??? ?? ?? bin/pmgrd
0 543 1 0 ? ? 3.56M 352K ? ??? ?? ?? /var/opt/CPQIM222/bin/config_hmmod
0 558 1 0 ? ? 5.02M 248K ? ??? ?? ?? /usr/sbin/advfsd
0 561 1 0 ? ? 3.56M 352K ? ??? ?? ?? /var/opt/CPQIM222/bin/sysman_hmmod
0 570 1 0 ? ? 1.80M 72K ? ??? ?? ?? /usr/sbin/inetd
0 581 1 0 ? ? 3.75M 160K ? ??? ?? ?? /usr/sbin/cron
0 617 1 0 ? ? 2.45M 0K ? ??? ?? ?? /usr/lbin/lpd
0 626 1 0 ? ? 4.50M 72K ? ??? ?? ?? /usr/bin/mmeserver -config /var/mme/system.ini
0 666 1 0 ? ? 4.46M 0K ? ??? ?? ?? /usr/dt/bin/dtlogin -daemon
0 709 1 0 ? ? 8.50M 880K ? ??? ?? ?? /usr/bin/../bin/alpha/native_threads/java -mx2m authentication/server/AuthenticationServer
0 742 1 0 ? ? 12.9M 184K ? ??? ?? ?? /usr/sbin/smsd -d
0 743 666 0 ? ? 8.58M 40K ? ??? ?? ?? /usr/bin/X11/X :0 -auth /var/dt/authdir/authfiles/A:0-aaakAa
0 749 1 0 ? ? 3.19M 0K ? ??? ?? ?? /usr/local/sbin/sshd
0 775 666 0 ? ? 4.46M 0K ? ??? ?? ?? dtlogin <:0> -daemon
228 843 1 0 ? ? 2.44M 0K ? ??? ?? ?? csh -fc source /virgoApp/Cm/v7r11/cmt/setup.csh ; $CMROOT/mgr/NameServer.start
228 1042 843 0 ? ? 2.17M 0K ? ??? ?? ?? /virgoApp/Cm/v7r11/mgr/NameServer.start
228 1088 1042 0 ? ? 3.12M 472K ? ??? ?? ?? NameServer.exe Cascina v7r11
228 1130 1 0 ? ? 2.44M 0K ? ??? ?? ?? csh -fc source /virgoApp/Db/v4r5p2/cmt/setup.csh ; $DBROOT/mgr/DbServer.start
0 1227 1 0 ? ? 6.27M 8K ? ??? ?? ?? /usr/bin/X11/dxconsole -geometry 1024x150-128-0 -daemon -nobuttons -verbose -notify -exitOnFail -nostdin -bg gray
0 1231 775 0 ? ? 8.88M 0K ? ??? ?? ?? dtgreet -display :0
228 1252 1 0 ? ? 2.45M 0K ? ??? ?? ?? csh -fc source /virgoApp/VMM/v1r4/cmt/setup.csh; $VMMROOT/mgr/VMMServer.start
228 1355 1130 0 ? ? 2.23M 0K ? ??? ?? ?? /virgoApp/Db/v4r5p2/mgr/DbServer.start
228 1398 1 0 ? ? 2.45M 0K ? ??? ?? ?? csh -fc source /virgoApp/VMC/v1/cmt/setup.csh; $VMCROOT/mgr/VMCServer.start
228 1475 1398 0 ? ? 2.23M 0K ? ??? ?? ?? /virgoApp/VMC/v1/mgr/VMCServer.start
228 1482 1252 0 ? ? 2.14M 0K ? ??? ?? ?? /virgoApp/VMM/v1r4/mgr/VMMServer.start
228 2111 1355 0 ? ? 4.84M 1.8M ? ??? ?? ?? DbServer.exe Cascina DbServerv4
228 2112 1475 0 ? ? 3.98M 176K ? ??? ?? ?? VMCServer.exe Cascina
228 3869 1 0 ? ? 2.45M 0K ? ??? ?? ?? csh -fc source /virgoApp/El/v4r4/cmt/setup.csh ; $ELROOT/mgr/ELServer.start
228 3938 3869 0 ? ? 2.19M 0K ? ??? ?? ?? /virgoApp/El/v4r4/mgr/ELServer.start
228 3941 3938 0 ? ? 3.11M 664K ? ??? ?? ?? ErrorLogger.exe Cascina /virgoData/El/logfiles
0 5236 570 0 ? ? 1.89M 0K ? ??? ?? ?? rlogind
228 269909 1482 0 ? ? 12.9M 0K ? ??? ?? ?? VMMServer.exe Cascina
0 314299 314305 0 ? ? 2.41M 792K ? ??? ?? ?? ps axl
251 314305 314306 0 ? ? 2.63M 368K ? ??? ?? ?? /users/[..cut]/bin/memo-ps
251 314306 581 0 ? ? 2.12M 192K ? ??? ?? ?? sh -c /users/[..cut]/bin/memo-ps
0 744 1 0 ? ? 464K 0K ? ???+ console ?? /usr/sbin/getty console console vt100
1047 5237 5236 0 ? ? 2.06M 0K ? ??? pts/1 ?? -ksh (ksh)
0 5262 5237 0 ? ? 2.17M 0K ? ??? pts/1 ?? sh
0 5268 5262 0 ? ? 2.06M 0K ? ???+ pts/1 ?? ksh
0 302043 1 0 ? ? 4.71M 632K ? ??? pts/1 ?? /usr/sbin/collect -i60,120 -f /var/adm/collect.dated/collect -H d0:5,1w -oz
Received on Wed Jul 25 2001 - 13:40:25 NZST