Hi all,
We are seeing the following problem on our 2100 servers
(2 CPUs, 512 MB RAM, default vm and proc parameters):
We have a fairly large process (> 140 MB) that uses a time-critical
third-party product (Isis). To detect when problems occur, we have added
a thread to the process that monitors time: it sleeps for 1 second and
then checks when a thread last executed any code in Isis (there is a
global we can check). If the difference between this time and
gettimeofday() is > 8 seconds we print a warning; if it is > 40 seconds
we core dump the application so we can find out what each thread is doing.
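For reference, a minimal sketch of a watchdog like the one described above, assuming POSIX threads. The real Isis global's name and type are not given, so `isis_last_activity`, `isis_note_activity()`, `classify_gap()` and `start_watchdog()` are all hypothetical names used only for illustration; abort() is used to get the core dump.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/* Thresholds from the description: warn after 8 s, dump core after 40 s. */
#define WARN_SECONDS  8.0
#define ABORT_SECONDS 40.0

enum watchdog_state { WATCHDOG_OK, WATCHDOG_WARN, WATCHDOG_ABORT };

/* HYPOTHETICAL stand-in for the Isis global recording when a thread last
   executed Isis code; the real variable name and type are not known. */
static struct timeval isis_last_activity;
static pthread_mutex_t isis_lock = PTHREAD_MUTEX_INITIALIZER;

/* HYPOTHETICAL hook: record "Isis code ran now". */
void isis_note_activity(void)
{
    struct timeval now;
    gettimeofday(&now, NULL);
    pthread_mutex_lock(&isis_lock);
    isis_last_activity = now;
    pthread_mutex_unlock(&isis_lock);
}

/* Difference between two timevals, in seconds. */
static double tv_diff(const struct timeval *later, const struct timeval *earlier)
{
    return (double)(later->tv_sec - earlier->tv_sec)
         + (double)(later->tv_usec - earlier->tv_usec) / 1e6;
}

/* Map the observed gap to an action using the 8 s / 40 s thresholds. */
enum watchdog_state classify_gap(double gap_seconds)
{
    if (gap_seconds > ABORT_SECONDS)
        return WATCHDOG_ABORT;
    if (gap_seconds > WARN_SECONDS)
        return WATCHDOG_WARN;
    return WATCHDOG_OK;
}

/* Loop: sleep 1 s, then compare gettimeofday() with the last Isis activity. */
static void *watchdog_thread(void *arg)
{
    (void)arg;
    for (;;) {
        struct timeval now, last;
        sleep(1);
        gettimeofday(&now, NULL);
        pthread_mutex_lock(&isis_lock);
        last = isis_last_activity;
        pthread_mutex_unlock(&isis_lock);

        double gap = tv_diff(&now, &last);
        switch (classify_gap(gap)) {
        case WATCHDOG_WARN:
            fprintf(stderr, "watchdog: no Isis activity for %.1f s\n", gap);
            break;
        case WATCHDOG_ABORT:
            fprintf(stderr, "watchdog: %.1f s without Isis activity, dumping core\n", gap);
            abort(); /* SIGABRT: default action dumps core if core limits allow */
        case WATCHDOG_OK:
            break;
        }
    }
    return NULL;
}

/* Start the watchdog as a detached thread; returns 0 on success. */
int start_watchdog(void)
{
    pthread_t tid;
    isis_note_activity(); /* avoid a spurious gap before Isis first runs */
    int rc = pthread_create(&tid, NULL, watchdog_thread, NULL);
    if (rc == 0)
        pthread_detach(tid);
    return rc;
}
```

In this sketch start_watchdog() would be called once at startup, with isis_note_activity() standing in for whatever updates the real Isis global. Note that the check only runs when the watchdog thread itself wakes up, so if that thread is not scheduled for a long time the gap shows up as one large jump on the next check rather than as a warning followed later by an abort.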
The problem is that on one machine we see neither the 8-second warning
nor the 40-second error, but the thread core dumps after 60 seconds.
Polycenter shows a great deal of paging at this time (prior to the core
dump). Before this activity, the process had been fairly idle for some
time, with only this thread and a couple of other housekeeping threads
active from time to time. We never observe this behaviour on Solaris 2.3, BTW.
One question regarding paging: if a thread begins to access pages
that need to be paged in, does the threading system suspend ALL
threads, irrespective of whether they require those pages?
A common attribute of all scenarios in which the time-warp problem occurs
is that the system as a whole (i.e. all processes) has been doing very little
for the preceding hour or so, i.e. CPU utilisation, paging, network I/O etc.
have been almost 0. As soon as the system is asynchronously required to
schedule and run a heavyweight process, the time-warp scenario occurs.
Any help appreciated
Thanks
Juerg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Juerg Staub
Working for Digital Equipment Corp. AG on the:
SCHWEIZER BOERSE SWX, EBS PROJECT, GUI-Development
Bleicherweg 10, 8021 Zurich, SWITZERLAND
email: jst_at_atb.ch
phone: +41 1 286 8281
fax: +41 1 286 8496
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Received on Thu Feb 15 1996 - 18:11:29 NZDT