SUMMARY: High number of threads in vmstat w column

From: <dave.campbell_at_vf.vodafone.co.uk>
Date: Mon, 25 Jun 2001 06:39:45 +0100

Nothing concrete here, but thanks to Dr. Tom and Alan Nabeth for their
replies. After digging around a bit more I did find, as Tom suggested, a lot
of processes in the I or S state, which did account for the high number. I
guess in this instance this was a bit of a red herring and there was no real
problem to start with. The question was related to a perceived "slow
system" problem from an application point of view. Although I couldn't
really find anything wrong with the system, this just came up as something
to follow up on. The developers who have "inherited" this application from
an external software house believe the problem may be due to the application
itself and they're looking for some evidence. All I am looking to supply is
evidence that there is no bottleneck down at the O/S.

Alan did suggest that the zfod faults and page-ins may point to a lot of
forking activity. I found another summary that mentioned this, where it was
suggested that the application itself was forking off more threads than it
required - possibly some more evidence that the problem lies with the
application.

Thanks,
Dave.

Original question and replies posted below......

I wrote:

This may or may not be a problem, but I have a GS140 system with 12 CPUs and
8 GB of memory running V4.0F that appears to have a very large number of
threads in the waiting state - see vmstat output below. I can find no
apparent problem or bottleneck, I/O waits etc., but I am somewhat unsure as
to why I should have so many threads waiting. Can anyone throw any light on
why this number might be so large and what I can potentially look at to
identify what all these threads are and what they are waiting for? I've
looked for summaries on this subject but have so far found nothing.

Thanks,
Dave Campbell
(dave.campbell_at_vf.vodafone.co.uk)

Virtual Memory Statistics: (pagesize = 8192)
   procs         memory                        pages                             intr          cpu
  r   w   u   act  free  wire  fault    cow   zero  react    pin  pout   in   sy   cs  us  sy  id
 14 398  70  355K  606K   74K  9750M  2273M  3407M    18K  2197M     0   2K  33K  12K  10   8  83
 13 399  70  355K  605K   74K   1832    239    958      0    751     0   2K  18K  10K   4   4  92
 14 398  70  356K  605K   74K    586     44    494      0     60     0   2K  34K  11K   5   4  91
 14 398  70  356K  605K   74K   1160    243    561      0    253     0   2K  24K  10K   4   3  93
 14 398  70  356K  604K   74K   1206     95   1000      0     97     0   2K  36K  11K   5   7  88
 13 399  70  356K  604K   74K    209      1    206      0      3     0   2K  22K  11K   5   4  91
 14 398  70  357K  604K   74K   2115    401    820      0    892     0   2K  26K  10K   4   3  92
 14 400  70  357K  604K   74K    464     44    367      0     70     0   2K  35K  10K   5   3  92
 14 398  70  357K  604K   74K    545     34    475      0     45     0   2K  32K  10K   4   3  92
 14 398  70  357K  603K   74K   1073    219    517      0    223     0   2K  28K  11K   5   3  92
 14 398  70  357K  603K   74K   1448    214    645      0    721     0   2K  28K  11K   5   4  92
 14 398  70  357K  603K   74K    419      1    409      0      3     0   2K  28K  10K   5   3  91
 14 398  70  357K  603K   74K    203      0    203      0      0     0   2K  14K  10K   4   3  92
 14 398  70  358K  602K   74K   1274    222    720      0    223     0   2K  24K  10K   5   3  92
 15 395  72  358K  602K   74K    203      0    203      0      0     0   2K  19K  11K   5   3  92

Dr. Tom wrote:

I like "ps aux" (others like "ps -ef" which shows much the same data).

If you look in the "S" column (process state) you can see which processes
are in a wait state; strictly speaking, what I'm looking at is not at the
threads level, but most processes have a single thread anyway. You can
read the "ps" man page to learn more about what it's reporting. I bet
you will find a LOT of processes in the "I" or "S" state; for instance,
processes waiting for human or network input are often in this state. I
also often see "U" for processes blocked uninterruptibly in kernel code.
Of course, the "R" processes are runnable (if not actually running). As
you have already figured out, it's perfectly normal to have LOTS of your
processes in a non-runnable state (all that means is they are waiting
for something to occur, usually I/O completion but sometimes a timed
wait).
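
As a rough illustration of the check described above, here is a minimal
Python sketch that tallies the process-state column of "ps" output. It
assumes the state column is headed "S" (as in the reply) or "STAT", and
that no column before it contains embedded spaces. The function name
count_states and the sample listing are made up for illustration; they are
not taken from the system discussed in this thread.

```python
from collections import Counter


def count_states(ps_text):
    """Tally the first letter of the process-state column in `ps` output."""
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    header = lines[0].split()
    # Assumption: the state column is headed "S" or "STAT".
    col = next(i for i, name in enumerate(header) if name in ("S", "STAT"))
    counts = Counter()
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) > col:
            # Keep only the first letter; some systems append flag characters.
            counts[fields[col][0]] += 1
    return counts


# Made-up sample listing, for illustration only.
SAMPLE = """\
USER   PID %CPU %MEM  VSZ  RSS TTY S STARTED  TIME COMMAND
root     1  0.0  0.0 440K  64K ??  I 08:00:00  0:01 /sbin/init
app    210  0.0  0.1 5.1M 1.2M ??  S 08:01:10  0:03 appserver
app    211  0.0  0.1 5.1M 1.2M ??  I 08:01:10  0:00 appserver
app    305  1.2  0.2 6.0M 2.0M ??  R 08:05:42  0:41 worker
dave   412  0.0  0.0 1.0M 300K p1  U 09:12:03  0:00 tar
"""

if __name__ == "__main__":
    for state, n in count_states(SAMPLE).most_common():
        print(state, n)
```

To use it on a live system, the captured text of "ps aux" could be passed
to count_states in place of SAMPLE. A large I and S count next to a small
R count would match the "perfectly normal" picture described above.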

Alan Nabeth wrote:

        The three choices that come to mind:

        o There isn't anything for them to do and their idle state
           appears as a "wait".

        o Insufficient resources to keep all the threads busy. You
           only have 12 CPUs and lots of threads that may be wanting
           to do something.

        o From just this vmstat(1) listing you can't tell whether
           the idle time is real idle time or idle time in an I/O
           wait (really a variant of the 2nd point). At least on
           V4.0D, idle time is where vmstat(1) counts I/O wait time.

        Your steady page faults, zero fills and page-ins suggest
        a lot of fork activity. The two could be related.
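
One rough way to put a number on the zero-fill observation above is to
parse the per-interval vmstat rows and compute what share of page faults
were zero-fill faults. The Python sketch below does this for three
interval rows copied from the listing in the original question. The
labels intr_sy and cpu_sy are made-up names to tell the two "sy" columns
apart, and the K/M multipliers are an assumption (they do not affect the
fault counters used here). A high share is only consistent with the fork
suggestion; it does not prove it.

```python
# Column labels follow the vmstat header; "intr_sy" and "cpu_sy" are
# hypothetical names used to distinguish the two "sy" columns.
COLUMNS = ("r w u act free wire fault cow zero react pin pout "
           "in intr_sy cs us cpu_sy id").split()

# Assumption: K and M are decimal multipliers.
SCALE = {"K": 1_000, "M": 1_000_000}


def parse_value(tok):
    """Convert a vmstat token such as '355K' or '1832' to an int."""
    if tok[-1] in SCALE:
        return int(tok[:-1]) * SCALE[tok[-1]]
    return int(tok)


def parse_rows(text):
    """Return one dict per data row; header and malformed lines are skipped."""
    rows = []
    for line in text.splitlines():
        toks = line.split()
        if len(toks) == len(COLUMNS) and toks[0].isdigit():
            rows.append(dict(zip(COLUMNS, map(parse_value, toks))))
    return rows


def zero_fill_share(rows):
    """Fraction of page faults that were zero-fill faults."""
    faults = sum(r["fault"] for r in rows)
    zeros = sum(r["zero"] for r in rows)
    return zeros / faults if faults else 0.0


# Three interval rows from the listing in the original question. The very
# first row of that listing is left out because its M-scaled counters look
# like cumulative totals rather than a single interval.
SAMPLE = """\
13 399 70 355K 605K 74K 1832 239 958 0 751 0 2K 18K 10K 4 4 92
14 398 70 356K 605K 74K 586 44 494 0 60 0 2K 34K 11K 5 4 91
14 398 70 356K 605K 74K 1160 243 561 0 253 0 2K 24K 10K 4 3 93
"""

if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    print("zero-fill share of faults: %.2f" % zero_fill_share(rows))
```

The same rows also expose the w and id columns, so the parser could be
reused to track waiting threads against CPU idle time across intervals.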
Received on Mon Jun 25 2001 - 05:44:59 NZST