Sincere thanks to Dr Tom Blinn and Andy Cohen for their help with this issue.
The Problem:
All of a sudden the /usr file system on an ES40 running Tru64 4.0F with patch kit 7 reached 101% and kept rising. The server was crawling and apps slowed down.
An initial investigation revealed that a dt (desktop) process had gone haywire and was logging errors to its errorlog file in the .dt subdirectory. Nulling this file did not help; the logging stopped only when the CDE desktop was killed.
I reproduce Dr Blinn's suggestions below:
The "dt" subsystem is used extensively in CDE. The fact that the log was in a single user's directory (at least, that's what I *think* you wrote) says that something in that one user's CDE context was going wrong. "select" is a standard C library call, and is used in "polling" for I/O operations so that a program can manage multiple I/O streams without blocking, e.g., deal with multiple network links. Error code 22 is

#define EINVAL 22 /* Invalid argument */

which suggests that some part of the "dt" subsystem ("dt" stands for, if I remember correctly, "desktop", as in "Common Desktop Environment") got into trouble and wound up in an error loop where it kept trying to call "select" with a bad argument, and failed to see that it was getting an error code from which it never recovered.

This is an ugly bug, but it's a bug, and if you have a support contract, you should report it.
There were LOTS of processes running on the system; you just did not know about most of them. If there are users logged in, they can have MANY processes, most of which are sitting idle most of the time, and with CDE, the "dt" subsystem is always there.
There is probably no way to disable the logging. It's there so that you can find and fix such errors. Of course, when it goes wrong, it can be a problem in itself. If you managed to remove the log file and replace it with a symlink to /dev/null or with a directory instead of a simple file (as just two examples), it is likely that whatever CDE component is trying to append error messages to the log file would either append to /dev/null (with the symlink) or fail completely (with the directory). But this is not really a good idea. Another approach is to move the user directories off of /usr to a file system that's less likely to cause operational problems if it fills up completely. That's a good idea in general.
Tom
Andy Cohen said:
You might be able to use 'fuser' to determine which process is writing to that file.
Other associates suggested that this situation can arise when the system date/time is changed in multiuser mode on a CDE system.
The sysadmin informs me that he did indeed change the time on this system, and also on two more DS20Es which had CDE running, so it could be a random bug.
Sincere regards & thanks,
Dominic
Received on Sat Jun 07 2003 - 06:13:55 NZST