-- "[kernel idle]" is a catch-all in the Tru64 UNIX kernel. It is charged for all of the kernel's internal overhead threads, including things like syncing the disks, environmental monitoring, some aspects of disk I/O, some memory management overhead, and so on. Basically, it collects the work the kernel does on behalf of the system as a whole that can't be blamed on any specific user "job" or process. (If the kernel could clearly identify a specific user process as "responsible" for the kernel work, it would charge the CPU time to that process.) Some of this CPU work may come from things like SCSI I/O interfaces. Things like interrupt handling are NOT usually attributable to any specific user process, so in most cases they are charged to the "[kernel idle]" bucket. In addition, after the kernel has done ALL the work that's been thrown at it by the hardware and the users, any CPU time left over (real CPU idle time) is also charged to the "[kernel idle]" bucket.

-- Now, why is this system performing poorly with high "[kernel idle]"? Most probably, it's due to application design or implementation, or a poor choice of things like I/O hardware. It sounds like you have a LOT of "general system overhead" in this system. Depending on the hardware configuration, some of this could be due to things like the choice of SCSI adapters (some do more of the work in the adapter itself and have a really simple kernel interface, some require the Alpha CPU to do a lot of work to service I/Os), the network interfaces, and the like, and some may be due to the way the applications use some of the interfaces in the system. Selecting the next thread to run when application threads block (scheduling) is charged to the "[kernel idle]" bucket, for example, so a system with a LOT of process context switching will report high "[kernel idle]".
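The accounting described above can be pictured with a toy model. This is only a sketch of the idea, not how the Tru64 kernel is implemented: the function name, the event list, and all the numbers are made up for illustration. Work that can be attributed to a process is charged to it; everything else, plus whatever CPU time is left over, lands in the "[kernel idle]" bucket.

```python
from collections import defaultdict

KERNEL_IDLE = "[kernel idle]"


def charge_cpu(events, total_cpu_seconds):
    """Toy CPU accounting (illustrative only, not the real kernel logic).

    events: list of (seconds, owner) pairs; owner is a process name, or
            None when the work cannot be blamed on any one process
            (interrupt handling, scheduling, disk syncing, ...).
    Returns a dict mapping bucket name to CPU seconds charged.
    """
    buckets = defaultdict(float)
    for seconds, owner in events:
        # Unattributable work goes to the catch-all bucket.
        buckets[owner if owner is not None else KERNEL_IDLE] += seconds

    busy = sum(seconds for seconds, _ in events)
    if busy > total_cpu_seconds:
        raise ValueError("more work charged than CPU time available")

    # Real idle time is charged to the same catch-all bucket.
    buckets[KERNEL_IDLE] += total_cpu_seconds - busy
    return dict(buckets)


# Hypothetical interval with 100 CPU-seconds available.
events = [
    (20.0, "oracle"),   # attributable to a user process
    (10.0, "app"),      # attributable to a user process
    (15.0, None),       # interrupt handling
    (10.0, None),       # scheduling / context switching
    (5.0, None),        # syncing disks
]
usage = charge_cpu(events, total_cpu_seconds=100.0)
print(usage)
```

In this made-up interval "[kernel idle]" ends up with 70 of the 100 CPU-seconds, but only 40 of those are really idle. The single number does not say how much is overhead and how much is idle, which is why it is better read as a symptom than as a cause.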
-- Bottom line: It sure sounds like you have a system performance problem, and the high "[kernel idle]" time reported is a symptom, but it's not likely to be the cause. You just have to trust me that if the kernel could assign the available CPUs to doing application work, it would do so. If the CPUs seem idle when there is application work to be done, some bottleneck in the system is causing this; the high "[kernel idle]" is just a symptom, not a cause.

Tom

====================================================================================================

USEFUL HINTS AFTER DISCUSSION ON THE SUBJECT.

(A) It is useful to look at the threads that are taking most of the CPU time. Running
    ps -Am -O THREAD -p <kernel_idle_PID>
lists the threads with the relevant information.

(B) Most of the managers had experienced this problem when an I/O bottleneck was occurring. On discovering I/O bottlenecks, I got the following reply and hints from Michael James Bradford:

"My guess is that the problem lies with your disk I/O. If a process is waiting for data from the disks and therefore is unable to do anything, then it will sleep and the CPUs will be idle. To analyze your machine's performance, run "collect". You can either run it "live" or with output to a file (the latter is probably best). For the disks, look for high AVS, AVW, ACTQ and WTQ (explanations of these can be found in the collect man page). Depending on the values of these, try to spread the load over more spindles (disks) or more controllers. Try analyzing what Oracle is doing as well, as you could have inefficient SQL calls."

(C) Also, to see the I/O wait of the CPUs, vmstat -w should be used as a first step.

Kind Regards,
Aristotle Sdrolias.
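As a rough illustration of hint (B), here is a small sketch of picking out the disks that deserve a closer look once the per-disk figures have been copied out of a collect report. Everything here is an assumption for the example: the disk names and values are invented, the thresholds are arbitrary starting points rather than recommended limits, and the code does not parse real collect output. The meaning and units of AVS, AVW, ACTQ and WTQ should be checked in the collect man page.

```python
# Assumed thresholds, chosen only for illustration; tune them for your
# own hardware after reading the collect man page.
THRESHOLDS = {"AVS": 20.0, "AVW": 10.0, "ACTQ": 2.0, "WTQ": 1.0}


def flag_busy_disks(samples, thresholds=THRESHOLDS):
    """Return {disk: [metrics above threshold]} for disks that look busy.

    samples: {disk_name: {metric_name: value}}, values taken by hand
             from a collect report (hypothetical layout).
    """
    flagged = {}
    for disk, metrics in samples.items():
        # A missing metric is treated as 0 and so never flags the disk.
        over = [name for name, limit in thresholds.items()
                if metrics.get(name, 0.0) > limit]
        if over:
            flagged[disk] = over
    return flagged


# Invented sample values for three disks.
samples = {
    "dsk0": {"AVS": 5.0, "AVW": 0.0, "ACTQ": 0.3, "WTQ": 0.0},
    "dsk3": {"AVS": 45.0, "AVW": 30.0, "ACTQ": 6.0, "WTQ": 4.0},
    "dsk5": {"AVS": 25.0, "AVW": 2.0, "ACTQ": 1.0, "WTQ": 0.0},
}
flagged = flag_busy_disks(samples)
for disk, metrics in sorted(flagged.items()):
    print(disk, "high:", ", ".join(metrics))
```

With these invented numbers dsk3 is over every threshold and dsk5 only on AVS, so dsk3 would be the first candidate for spreading its load over more spindles or controllers.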
> -----Original Message-----
> From: Sdrolias Aristotelis
> Sent: Wednesday, July 28, 2004 6:31 PM
> To: 'tru64-unix-managers_at_ornl.gov'
> Subject: kernel idle %CPU high
>
> Hi all,
>
> OS: tru64 5.1
>
> Main question is the following:
> Is it possible for high %CPU time of the [kernel idle] process to slow down the execution of other processes and generally result in poor performance of the system?
>
> Situation:
> We are experiencing poor performance here on some processes, which are sleeping more than executing on the CPUs even though there are enough CPUs (70% idle) and enough memory. The processes connect to Oracle, retrieve data, compute, and write back to Oracle and the filesystem.
> So, there is a possibility that Oracle or the filesystem might be the bottleneck.
> But apart from this, is it possible for high %CPU time of the [kernel idle] process to slow down the execution of processes?
> The [kernel idle] process has a %CPU time of 65% on average on our system.
> Is there a possibility that this is causing the problem? If yes, how can I find the exact cause of the problem?
> I have seen many tru64 systems and this is the only one with such a big CPU time on the [kernel idle] process.
> What might be causing this behaviour? Any ideas?
>
>
> Kind regards,
> Aristotle Sdrolias
> Software & system Engineer.
> Email: asdrolias_at_cosmote.gr
>
Received on Wed Sep 08 2004 - 13:51:40 NZST