Debugging performance problem

From: <jreed_at_appliedtheory.com>
Date: Mon, 26 Mar 2001 12:09:39 -0500

We have an in-house app that runs on 4 GS140s plus 2 ES40s - Tru64 v4.0F,
Oracle database on 2xGS140s, Apache servers and squid on 2xGS140s,
search engines on 2xES40s. Users are reporting slow performance daily
during its busy times, and the developers keep coming to me to
determine if hardware or kernel parameters could be causing it.

I've looked at hardware and performance, and the main place I see anything
that looks like an issue is on the webservers. One GS140 gets about
5 million hits/day. It processes them, serves static content, and
passes requests for dynamic content to the 2nd GS140, which then queries
other hosts for content.

The back-end webserver (serving dynamic content) typically has 330+
Apache processes running. It has 4 CPUs and 8GB memory. Memory is typically
about 2/3 free, but load frequently ranges from 17-35, undoubtedly
from all the Apache processes passing through. I've been watching vmstat
today, and my question is about its output. I'm seeing values like the
following:

Virtual Memory Statistics: (pagesize = 8192)
  procs        memory                  pages                 intr       cpu
  r    w  u  act free wire fault  cow zero react pin pout  in  sy  cs us sy id
 31 1201 40 249K 657K 122K  2052  151  942     0  88    0  1K  8K 13K 44 24 32
  9 1219 39 249K 657K 122K  5143  349 2700     0 318    0  1K 22K  8K 45 38 17
 18 1256 38 251K 655K 123K  4682  201 2469     0 178    0  1K 14K  7K 45 34 22
 24 1364 38 257K 647K 125K   11K 1253 3517     0 812    0  2K 19K  8K 52 41  7
 28 1525 37 266K 635K 127K  8341  279 1714     0 474    0  2K 18K  7K 47 44  8
 20 1682 38 275K 624K 129K   10K  349 3541     0 578    0  2K 24K  8K 48 46  5

Threads waiting interruptibly ("w") seem quite high. Context switches ("cs")
seem quite high, and system mode time ("sy") sits around 30-45%.
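
For anyone who wants to put numbers on the samples above, here is a rough
Python sketch that averages the six vmstat rows. It is only an illustration:
the column names "sy_calls" (for the first "sy", under intr) and the helper
names are my own labels, the "K" suffix is assumed to mean roughly x1000,
and the CPU count of 4 is taken from the description of the back-end
webserver.

```python
# Six vmstat samples from the post, with the wrapped "id" column rejoined.
SAMPLES = """\
31 1201 40 249K 657K 122K 2052 151 942 0 88 0 1K 8K 13K 44 24 32
9 1219 39 249K 657K 122K 5143 349 2700 0 318 0 1K 22K 8K 45 38 17
18 1256 38 251K 655K 123K 4682 201 2469 0 178 0 1K 14K 7K 45 34 22
24 1364 38 257K 647K 125K 11K 1253 3517 0 812 0 2K 19K 8K 52 41 7
28 1525 37 266K 635K 127K 8341 279 1714 0 474 0 2K 18K 7K 47 44 8
20 1682 38 275K 624K 129K 10K 349 3541 0 578 0 2K 24K 8K 48 46 5
"""

# vmstat prints "sy" twice (syscalls under intr, system time under cpu);
# "sy_calls" is a made-up label used here to tell the two apart.
COLUMNS = ["r", "w", "u", "act", "free", "wire", "fault", "cow", "zero",
           "react", "pin", "pout", "in", "sy_calls", "cs", "us", "sy", "id"]

NCPU = 4  # CPUs in the back-end webserver, per the post


def to_int(field):
    """Convert a vmstat field to int; 'K' is assumed to mean about x1000."""
    if field.endswith("K"):
        return int(field[:-1]) * 1000
    return int(field)


def parse(text):
    """Return one dict per data row, skipping lines with the wrong width."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue
        rows.append(dict(zip(COLUMNS, (to_int(f) for f in fields))))
    return rows


def mean(rows, key):
    """Arithmetic mean of one column across all rows."""
    return sum(row[key] for row in rows) / len(rows)


rows = parse(SAMPLES)
summary = {
    "runnable threads per CPU": mean(rows, "r") / NCPU,
    "context switches per interval": mean(rows, "cs"),
    "user %": mean(rows, "us"),
    "system %": mean(rows, "sy"),
    "idle %": mean(rows, "id"),
}

for label, value in summary.items():
    print(f"{label:32s} {value:10.1f}")
```

The point of dividing "r" by the CPU count is that a run queue several times
larger than the number of CPUs suggests threads are waiting for a processor,
which fits with the low idle figures in the later samples.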

Is there any good way (other than running the app in some kind of debug
mode, which isn't an option) to find out what the system is doing when
it spends so much time in system mode?

Do I seem to be looking at the right things here?

Can anyone point me to any good docs that might give me some insight
into ways to diagnose the situation further? I've been working extensively
with the system perf. manuals online, and the systems were tuned according
to the whitepaper about busy internet servers from Compaq, but I'm hoping
for other whitepaper-type resources or guidelines.

TIA...

Judith Reed
jreed_at_appliedtheory.com
Received on Mon Mar 26 2001 - 17:10:44 NZST
