We've been wrestling with a big performance problem for a
while now. The hardware is from an approved reseller, and
we're in the process of getting a maintenance contract, but
for various reasons this isn't happening fast.
The symptom: occasionally you type a keystroke and see no response for
1-4 seconds. At other times the box blazes right along. The problem shows
up even with a minimal number of users, with or without the database
daemons running.
The environment: Alpha 4000 5/400, 2 processors, 1 GB RAM, system disk on a
4 GB RAID 0+1 set, swap partition (alone) on a 4 GB disk on a regular SCSI
controller, database files (Unidata) on a RAID 5 set spread across a
3-channel controller, 60-70 users during the day. The RAID disks are
UltraWide SCSI; the regular disks are FastWide. Running 4.0D rev 878 with no
patches installed (though it arrived as is... how do I find out which
patches, if any, were installed?)
Comparison: we used to run an Alpha 2000 4/275 with 1 processor, 384 MB
RAM, the system disk on a single 4 GB disk on a regular controller, and the
database files on an external RAID (HSZ40, 4 disks, shadowed and mirrored).
All disks were FastWide.
Other performance: database queries, once they get going, run about twice
as fast as on the old machine, but in terms of response time we never had
anything remotely like these problems on the old machine. We had worse
versions of this problem before raising the virtual memory thresholds;
before that, whenever free pages hit rock bottom, the box would freeze for
10-20 seconds at a time. Even then the two problems were distinguishable:
really bad when out of physical memory, unpredictably and less severely bad
at other times. If I run iostat in one window and vmstat in another, I can
watch vmstat *stop* while iostat keeps scrolling up the screen with no
activity on the swap disk. Then vmstat starts up again, at which point I
sometimes see a small burst of activity on the swap disk. vmstat stopping
correlates with all interactive activity stopping.
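Rather than watching two windows by eye, the vmstat stalls could be logged
with timestamps. The following is a minimal sketch, assuming a Python 3
interpreter is available somewhere it can read the monitor's output (it is
not part of the setup described here); the script name `stallwatch.py`, the
`--watch` flag and the default thresholds are all hypothetical. It records
when each line of output arrives and flags any gap well beyond the sampling
interval.

```python
import sys
import time


def find_stalls(timestamps, interval=1.0, slack=0.5):
    """Return (time, gap) pairs where the next line arrived late.

    timestamps: arrival times in seconds, in increasing order.
    interval:   sampling interval the monitor was started with (e.g. vmstat 1).
    slack:      lateness tolerated before a gap counts as a stall (assumed value).
    """
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > interval + slack:
            stalls.append((prev, gap))
    return stalls


def watch(stream, interval=1.0, slack=0.5):
    """Echo each line from stream and report any late arrival as it happens."""
    last = None
    for line in stream:
        now = time.time()
        if last is not None and now - last > interval + slack:
            print(f"*** stall: {now - last:.1f}s since previous line", flush=True)
        sys.stdout.write(line)
        sys.stdout.flush()
        last = now


# Hypothetical entry point: only reads stdin when explicitly asked to.
if __name__ == "__main__" and sys.argv[1:2] == ["--watch"]:
    watch(sys.stdin)
```

Usage would be something like `vmstat 1 | python3 stallwatch.py --watch`.
Two caveats: if the monitor buffers its output when writing to a pipe, the
gaps will be blurred; and if the watcher runs on the affected box it freezes
too, but the length of the gap is still reported once it resumes.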
What we've tried: response is the same at the console as over the network,
but we upgraded the network connection anyway, first to a dedicated
10 Mb/s segment, then to a 100 Mb/s hub. Went from 512 MB to 1 GB of RAM.
Moved the swap partition off the root disk and onto its own disk. Adjusted
the VM kernel parameters to raise thresholds (this helped). Tried moving the
database to a 4 GB disk on a regular SCSI controller (no difference). Have
had people in to look closely at the disks and RAID controller (on the
suspicion that access to the swap partition was the problem), but they seem
fine.
What's on the queue to try: disable and/or remove the second processor;
pull the first memory board (we now have two 512 MB boards, the second added
when we upgraded from 512 MB to 1 GB) and put the second board in the first
slot, to test whether the original board is bad; install 3.2 on one of the
spare disks and boot from it for comparison.
I'd appreciate any suggestions. Thanks.
Jerry Marty
Systems Analyst
Zoom Telephonics, Inc.
jerrym_at_zoomtel.com
Received on Tue Nov 10 1998 - 18:08:14 NZDT