SUMMARY: Performance Problems DU 4.0b + Oracle 7.3.3 from Gunther Feuereisen on 1997-11-12 (tru64-unix-managers)

From: Gunther Feuereisen <gunther_at_ibm.net>
Date: Wed, 12 Nov 1997 09:50:55 +0800

Thanks to:

Robert Otterson <Robert.Otterson_at_digital.com>

Special thanks to:

alan_at_nabeth.cxo.dec.com (Alan Rollow)

Who noticed the high number of forks (something I didn't pick up straight
away), which got me thinking in a different (and ultimately correct)
direction.

In the end the problem looks to have nothing to do with DU, but rather
with SQLnet under Oracle.

We managed to track the problem to one userid which was being used for
testing, which seems to have some odd things in its environment, one
of which causes TNSlistener to spin out of control, attempting to
make connections at the rate of 3-4 per second. As the client logs into
the machine which is also the Oracle server, the requests are sent to the
same machine via loopback, which explains why there was the high number
of loopback packets. The high number of forks is due to the TNSlistener
forking all those requests for connections. The high CPU was a combination
of the forks, the loopback, and the fact it was writing to a log file
at the rate of 3k a second.
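
As a rough cross-check of the figures above, here is a small Python sketch
of the arithmetic. It assumes that "3k" means 3 KB with 1 KB = 1024 bytes,
and that the log grew at a steady rate; neither is stated in the original,
and the function names are purely illustrative.

```python
# Back-of-the-envelope figures from the numbers quoted in the summary.
# Assumptions (not from the original): "3k" = 3 KB, 1 KB = 1024 bytes,
# 1 MB = 1024 KB, and the log grows at a steady rate.

KB = 1024
MB = 1024 * KB


def bytes_per_attempt(log_rate_bps, attempts_per_sec):
    """Average number of log bytes written per connection attempt."""
    return log_rate_bps / attempts_per_sec


def hours_to_reach(size_bytes, log_rate_bps):
    """Hours of steady logging needed for the file to reach size_bytes."""
    return size_bytes / log_rate_bps / 3600.0


log_rate = 3 * KB  # bytes per second, as quoted
for attempts in (3, 4):  # connection attempts per second, as quoted
    per_attempt = bytes_per_attempt(log_rate, attempts)
    print(f"{attempts} attempts/s: about {per_attempt:.0f} bytes per attempt")

hours = hours_to_reach(600 * MB, log_rate)
print(f"600 MB at 3 KB/s: about {hours:.0f} hours of steady logging")
```

Under those assumptions each connection attempt accounts for somewhere
between 768 and 1024 bytes of log, and a file of that size implies the
listener had been logging at this rate for a good deal longer than the
two hours since the reboot.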

I found the problem by looking for network-related daemons, and started
with the SQLnet listener after I had discounted the UNIX ones. The tipoff
was when I tried to "tail -f" the logfile and my screen started scrolling;
I soon found that the logfile was 600 MB, and growing ..
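
The same check can be done without watching the screen scroll. Below is a
minimal Python sketch that samples a file's size twice and reports how fast
it is growing. It assumes Python is available on the machine (it was not
part of the setup described here); the function names and the 1 KB/s
threshold are hypothetical, chosen only for illustration.

```python
import os
import time


def growth_rate(path, interval=5.0, sleep=time.sleep):
    """Return the file's average growth in bytes per second.

    The size is sampled once, then again after `interval` seconds.
    `sleep` is a parameter only so the wait can be replaced in tests.
    """
    before = os.stat(path).st_size
    sleep(interval)
    after = os.stat(path).st_size
    return (after - before) / interval


def is_runaway(rate_bps, threshold_bps=1024.0):
    """True if the growth rate meets or exceeds the (hypothetical) threshold."""
    return rate_bps >= threshold_bps
```

Calling growth_rate() on the listener log and passing the result to
is_runaway() would have flagged the roughly 3 KB/s seen here; the
threshold would need tuning for a log that is legitimately busy.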

Just another day at the office *groan*

gunther

Previous Emails follow:

>Date: Tue, 11 Nov 1997 14:03:36 +0800
>To: alpha-osf-managers
>From: Gunther Feuereisen <gunther_at_ibm.net>
>Subject: ADDITIONAL 2: Performance Problems DU 4.0b + Oracle 7.3.3
>
>The saga continues ..
>
>The machine I rebooted earlier now has the same problem again 2 hours later:
>
>e.g. from monitor:
>
>pofreb Tue Nov 11 13:39:51 1997 1.88 1.93 2.02 13 users
>
>Mem:    act  inact  wired    free   Forks:  fork vfork   Char:    in    out
>     182352  47224  68368  209488          31.36  0.00          5.5  483.3
>
>Paging: re  pin pout flts  cow   zf hit%        Disk: kbps  tps queue
>         0  575    0 3201  354 1170    0        rz8      0    0     0
>                                                rz17    16    1     0
>Swap: Reserved Free  Cache: Namei Buffer        rz18     0    0     0
>           20% 100%           98%     0%        rz19     0    0     0
>                                                rz25    16    1     0
>CPU: user nice  sys idle wait swtch intr scall  rz26     0    0     0
>#0     48    0   50    2    0  1023   23 12561  rz27     0    0     0
>                                                re0      0    0     0
>Net: ipkts ierrs opkts oerrs collis             fd0      0    0     0
>tu0    0.0   0.0   0.0   0.0    0.0
>tu1   12.4   0.0   6.0   0.0    0.0
>sl0    0.0   0.0   0.0   0.0    0.0
>lo0  315.6   0.0 315.6   0.0    0.0
>ppp0   0.0   0.0   0.0   0.0    0.0
>
>
>
>pofreb #
>
>
>The lo0 in/out packet rate stays fairly constant, and the system CPU
>utilisation stays at around 50%.
>
>Anyone know what might be causing this? As I understand it, loopback is
>part of the implementation of the TCP/IP protocol stack, and as such an
>anomaly would be a software error?
>
>Oh, the release is DU 4.0b + Patch Kit 00004
>
>Thanks again,
>gunther
>--
>Previous messages:
>
>>Date: Tue, 11 Nov 1997 11:22:28 +0800
>>To: alpha-osf-managers
>>From: Gunther Feuereisen <gunther_at_ibm.net>
>>Subject: ADDITIONAL: Performance Problems DU 4.0b + Oracle 7.3.3
>>
>>I did some more investigating, and I found that the loopback interface
>>lo0 was generating all the network traffic (300 pkts each way), which
>>could be hogging the system CPU.
>>
>>I failed over to my backup machine, and found that the problem seems to
>>have gone away, with my system CPU utilisation down to about 8% ..
>>
>>I rebooted the original master machine, and the loopback problem seemed
>>to have stopped.
>>
>>However, my loopback packets still average 5-10 .. which still seems a
>>little high?
>>
>>Anyone have any suggestions as to why this could be happening?
>>
>>tia (again)
>>gunther
>>--
>>My original message follows;
>>
>>>Sender: alpha-osf-managers-relay_at_sws1.ctd.ornl.gov
>>>Followup-To: poster
>>>Date: Tue, 11 Nov 1997 12:40:19 +0800
>>>From: Gunther Feuereisen <gunther_at_ibm.net>
>>>Subject: Performance Problems DU 4.0b + Oracle 7.3.3
>>>X-Sender: gunther_at_pop03.ca.us.ibm.net
>>>To: alpha-osf-managers_at_ornl.gov
>>>X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.3 (32)
>>>
>>>Hi,
>>>
>>>We're currently working on development of a new system, and I've just
>>>noticed a performance problem; the CPU utilisation seems to be
>>>constantly around or over 50% for system related requests ..
>>>
>>>What I have:
>>>
>>>2 x Alpha 4000's 5/300 512 MB RAM in a DECsafe configuration
>>>
>>>Local storage on each machine, 4 GB RAID 1 (2x rz29b on SWXCR)
>>>Shared storage, 3 x rz29b, each mirrored onto another drive on a second bus
>>>using LSM.
>>>i.e. rz17 <-> rz25, rz18 <-> rz26 and rz19 <-> rz27
>>>Disks are on BA35X shelves going to KZPSA's using DWZZB's
>>>
>>>I'm using AdvFS for fast crash recovery etc.
>>>
>>>On my shared storage, each LSM mirror is a separate fileset, in its own
>>>filedomain.
>>>
>>>I am running Oracle 7.3.3
>>>
>>>I have looked at resource utilisation:
>>>
>>>approx 280 MB RAM is free
>>>swap has 80% free
>>>I/O across drives is sporadic, with bursts of up to 300 kbps (at the
>>>worst), but on average, bursts of about 30-60 kbps.
>>>volstat reports occasional I/O consistent with the above.
>>>
>>>I've checked uerf/dia, and there are no errors etc.
>>>
>>>Tools I've used; iostat, vmstat, swapon, monitor, volstat, uerf
>>>
>>>And yet, performance is sluggish, with the system utilising a constant
>>>50-60% CPU ..
>>>
>>>Anyone have ideas? I can't see it .. is it something obvious I am just not
>>>seeing?
>>>
>>>thanks in advance,
>>>gunther
>>>
>>>
Received on Wed Nov 12 1997 - 03:08:53 NZDT
