Thanks to:
alan_at_nabeth
Allan E Johannesen
John P Speno
John J. Francini
Christopher K Davis
I got some good questions and suggestions from people on where to look
for the problem. They suggested that I use nfsstat, vmstat, and the
command "ps -p 0 -m -o wchan,state,time". I was also advised to check
swapping, paging, and forking.
We realized that the high load may actually have been caused by the
NFS-mounted directory.
I got an informational post from John Francini:
> Pid 0, the nominal "kernel idle" process, also is where things like
> NFS servers and clients live, along with all the other kernel
> threads. Consequently, soaking up idle time is just one piece of the
> puzzle.
>
> To see what all is going on under PID 0, do
>
> ps -p 0 -m -o wchan,state,time
And also from alan_at_nabeth:
> I believe the kernel idle "process" collects statistics
> for a whole bunch of kernel threads. It may also count
> all the system's idle time. A high load average is not
> indicative of high CPU utilization and serial ps(1)
> listings are not a good way to look at overall CPU
> usage.
[...]
Thanks for all the help,
Kevin
[summary post]
> I've gotten some good suggestions so far, but most people are asking me
> questions, so I realize I left out some important details. The machine
> is a single processor, with about 7 Gigs of memory, running 4.0F
> unpatched. It is not really an NFS server, but it does export one
> directory to another Alpha, read only, via NFS. However, I've checked
> the other machine, which has 10 people logged on, and it doesn't seem
> like anyone is doing anything.
>
> Here is a trimmed off copy of top with the known process called
> some_compile.
>
> > load averages: 3.11, 2.56, 2.51 14:26:01
> > 72 processes: 2 running, 24 sleeping, 41 idle
> > Cpu states: 14.0% user, 0.0% nice, 82.4% system, 3.9% idle
> > Memory: Real: 4352M/7201M act/tot Virtual: 48M/20573M use/tot Free: 1194M
> >
> > PID USERNAME PRI NICE SIZE RES STATE TIME CPU COMMAND
> > 22286 bob 42 0 2218M 2217M run 166:17 44.90% some_compile
> > 19011 root 44 0 2600K 393K sleep 1:01 0.30% top
>
> Notice that even though some_compile uses 44.9% CPU, the Cpu state is
> 3.9% idle. In fact, if you watch top, it is usually 0.0%. The load
> average is what is worrying me. I haven't seen this compile make the
> load go over 2.00, but it's into the 3's now.
>
> swapon -s shows it is not using any swap space. Here is the iostat
> output. Notice there are barely any disk transfers; most of the
> activity is system-mode CPU time.
>
> > tty fd0 rz0 rz9 rz16 cpu
> > tin tout bps tps bps tps bps tps bps tps us ni sy id
> > 0 97 0 0 12 1 0 0 0 0 36 0 27 37
> > 0 348 0 0 0 0 0 0 0 0 15 0 84 1
> > 0 384 0 0 40 4 0 0 0 0 15 0 84 1
> > 2 456 0 0 0 0 0 0 0 0 19 0 81 0
> > 3 478 0 0 0 0 0 0 0 0 21 0 77 2
>
> I may be barking up the wrong tree with the [kernel idle]; I just
> couldn't find anything else that made up for the CPU usage.
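For anyone who wants to put numbers on the iostat sample quoted above, here
is a minimal Python sketch that averages the four CPU columns (us, ni, sy,
id). The rows are copied from that output. The function name cpu_averages is
made up for illustration, and skipping the first row assumes that it is the
since-boot average rather than a live interval, which is the usual iostat
behavior but was not confirmed on this machine.

```python
# Rows copied from the iostat output quoted in the follow-up post.
# Column order: tin tout, then bps/tps for fd0, rz0, rz9, rz16, then us ni sy id.
IOSTAT_ROWS = """\
0 97 0 0 12 1 0 0 0 0 36 0 27 37
0 348 0 0 0 0 0 0 0 0 15 0 84 1
0 384 0 0 40 4 0 0 0 0 15 0 84 1
2 456 0 0 0 0 0 0 0 0 19 0 81 0
3 478 0 0 0 0 0 0 0 0 21 0 77 2
"""


def cpu_averages(text, skip_first=True):
    """Average the last four columns (us, ni, sy, id) of iostat rows.

    skip_first drops the first row, on the assumption that it reports
    the average since boot instead of a sampling interval.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if skip_first:
        rows = rows[1:]
    names = ("us", "ni", "sy", "id")
    averages = {}
    for offset, name in zip((-4, -3, -2, -1), names):
        values = [int(row[offset]) for row in rows]
        averages[name] = sum(values) / len(values)
    return averages


if __name__ == "__main__":
    print(cpu_averages(IOSTAT_ROWS))
    # prints {'us': 17.5, 'ni': 0.0, 'sy': 81.5, 'id': 1.0}
```

Over the four interval rows, system mode averages about 81% while the disks
are nearly idle, which is consistent with the time being spent in kernel
threads (NFS among them) and not in local disk I/O.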
[original post]
> > Today, I noticed that our ES40 is running at a load of ~3 when the
> > jobs it has shouldn't be pushing it much more than 1. I looked through
> > all the running processes and noticed that the [kernel idle] process is
> > taking up quite a bit of CPU resource. I checked the managers list
> > archive, and I only found questions when the %MEM is high, and nothing
> > for %CPU.
> >
> > # ps aux
> > USER PID %CPU %MEM VSZ RSS TTY S STARTED TIME COMMAND
> > root 0 51.7 3.6 3.56G 261M ?? R < Jul 10 4-15:20:31 [kernel idle]
> >
> > I've checked our other Alphas and none of them have CPU usage nearly
> > like the ES40, even similarly loaded ones. Can someone shed some light
> > on this?
--
Kevin Dea
System Administrator
Alpine Electronics Research of America
Received on Wed Aug 30 2000 - 00:44:57 NZST