Greetings -
Background:
Yesterday we added a 4GB memory board to our 8400. The 4GB board is
currently interleaved with two 2GB boards; a third 2GB board is not
interleaved. We now have 10GB of RAM to feed four 440 MHz processors
running DU v3.2G (fully patched), with 2700+ processes (>500 simultaneous
users). The primary CPU is 0% idle and 90+% in kernel mode; CPUs 1, 2,
and 3 are averaging 10-20% idle. I realize we're I/O bound at peaks
until we get to DU v4.0...
We are running AdvFS (HSZ50s, hardware RAID-5; 10, 20, and 45 GB RAIDsets).
Problem:
For the past couple of hours, ever since the system got busy with users
(all using Oracle), I have seen a CPU drain lasting about 10 seconds
every 30 seconds on the dot. The system is *not* running any cron, at,
batch, etc., or any user programs that could account for this. The
regularity of the 30-second interval leads me to believe it is some VM
(RAM, UBC, etc.) scan that the operating system is doing, or some device
flush.
The problem (of this magnitude) started today - the first day after
getting the system up to 10GB of RAM. We are using over 1GB of
swap space at this time, down from 5-7GB during peak times last week.
The only way I am able to identify this CPU drain is with some custom
code of mine, which does less than 30 CPU ticks of work on the 8400.
The code displays the number of ticks elapsed between successive
0.5-second select() timeouts. Every 30 seconds on the button, one
0.5-second select() timeout takes between 10 and 11 seconds to complete
(virtually the same amount of time on each occurrence).
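
For anyone who wants to try the same kind of measurement, here is a rough
sketch of such a probe. It is not the original code (which was not posted):
it is written in Python, it measures wall-clock seconds with
time.monotonic() rather than CPU ticks, and the function names and the
1-second slack are assumptions made for illustration only.

```python
import select
import time


def sample_wakeups(iterations, timeout=0.5):
    """Record a monotonic timestamp before and after each select() timeout."""
    stamps = [time.monotonic()]
    for _ in range(iterations):
        # No descriptors to watch: select() acts purely as a timer (Unix only).
        select.select([], [], [], timeout)
        stamps.append(time.monotonic())
    return stamps


def find_stalls(stamps, timeout=0.5, slack=1.0):
    """Return (start_time, elapsed) for intervals longer than timeout + slack.

    The slack value is an assumed tolerance for ordinary scheduling jitter.
    """
    stalls = []
    for start, end in zip(stamps, stamps[1:]):
        elapsed = end - start
        if elapsed > timeout + slack:
            stalls.append((start, elapsed))
    return stalls


def stall_periods(stalls):
    """Return the seconds between the starts of consecutive stalls."""
    starts = [start for start, _ in stalls]
    return [later - earlier for earlier, later in zip(starts, starts[1:])]
```

Calling find_stalls(sample_wakeups(240)) samples for about two minutes on
an idle system. If a stall like the one described above is present, each
reported stall should show an elapsed time of 10-11 seconds, and
stall_periods() should return values close to 30.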
Question:
Does anybody know of any 30-second kernel timer that might be
causing this? I haven't (yet) run across any 30-second timer in the source
code listings that might cause this. Anybody?! Ideas? Perhaps
poorly documented kernel parameters for VLM systems?
Summary pending responses.
Randy Hayman "who'da thunk we were building a mainframe?"
haymanr_at_icefog.alaska.edu
Received on Mon Sep 29 1997 - 23:16:44 NZST