SUMMARY: Remote backup slowed down after upgrade to DU4.0B

From: <marty.cruchten_at_paulwurth.com>
Date: Mon, 18 May 1998 12:18:37 +0200

Hi managers,

Many thanks to Mitch Bertone, who gave me the idea that led to the
solution.

Mitch pointed out that the 4.0B version of cpio is much slower than the
3.2D version. In my case cpio was not at fault, but that gave me the
idea to look at dd, and I found that the 4.0B version of dd is about 50%
slower than the 3.2D version. This explains the poor throughput of my
backups after the upgrade to 4.0B. As a test, I put the 3.2D version of
dd back in place and, like a miracle, my old throughput returned.
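
For anyone who wants to repeat that kind of comparison, a rough timing
harness might look like the sketch below. It rests on several
assumptions that are not from my setup: the path to a saved 3.2D dd
binary is hypothetical, `date +%s` and /dev/zero must be available, and
the data goes to /dev/null instead of the tape, so only dd reading from
a pipe is exercised.

```sh
#!/bin/sh
# Time one dd binary while it reads 5120-byte blocks from a pipe,
# roughly as dd does at the receiving end of the backup pipeline.
# Usage: time_dd /path/to/dd [number_of_blocks]
time_dd() {
    dd_bin=$1
    count=${2:-20000}            # 20000 x 5120 bytes, about 100 MB
    start=$(date +%s)            # assumption: date supports +%s
    # The system dd only generates the data; "$dd_bin" is the one under test.
    dd if=/dev/zero bs=5120 count="$count" 2>/dev/null |
        "$dd_bin" bs=5120 of=/dev/null 2>/dev/null
    end=$(date +%s)
    echo "$dd_bin: $((end - start)) s for $count blocks of 5120 bytes"
}

# Time the dd found in PATH.
time_dd "$(command -v dd)"
# Hypothetical location of a dd binary saved from the older release:
# time_dd /usr/local/old/dd
```

With whole-second resolution the block count has to be large enough for
the run to last a while before a difference between two binaries
becomes visible.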

Marty Cruchten
system manager - SAP technical consultant
Paul Wurth S.A.
Luxemburg
marty.cruchten_at_paulwurth.com

---------------- Here comes my original question ----------------

I have had a strange problem for several years that shows up about once
a month on average and that could always be eliminated by power cycling
the server. Now I have a problem with the same effect, but I can no
longer get rid of it. Here are the details:

I have two 2100 servers, each running an Oracle database. Server1 (model
4/275) is connected to a TZ877 tape drive by means of two DWZZA-AAs. The
disks of server1 are accessed through a KZPAA, those of server2 (model
5/250) through a KZPSA. During the night, both servers back up the
database in sequence: server1 directly to tape, server2 through a pipe
to server1 (FDDI network). Until last week, I had the following
throughput: backup of server1 at about 3380 MB/hour, backup of server2
at about 4360 MB/hour. I suppose the throughput of server2 is greater
because of the faster server model and the faster KZPSA controllers. In
any case, the FDDI network does not seem to be a bottleneck. Server2 has
been on DU 4.0B for a long time, but server1 was upgraded from DU 3.2D
to DU 4.0B last week. Since that upgrade, I have the following problem:

Server1 (which has been upgraded) has the SAME throughput it always had,
but the backup of server2 (to which no modifications were applied) now
takes about 50% longer than usual (a backup of 4 hours now lasts 6
hours). There are no errors in any logs on server1 or server2, and the
backup terminates successfully, just some hours later than usual. I have
power cycled both servers without success. I thought of kernel
parameters that could have been reset during the upgrade of server1, and
I have reset all parameters in the /usr/sys/conf/SERVER1 file to the
values they had before, but the throughput stays the same. As the
throughput of server1 did not change, the problem probably lies in a
component on server1 responsible for network traffic (buffer sizes...),
but I am not a network specialist. The overall FDDI traffic load has not
changed since last weekend, so external influences can be ruled out, and
there are no special applications running on server1 during the backup
of server2 either. The backup of server2 is performed with the following
command line:

echo <file to backup> | cpio -ovB | rsh server1 -l backup_account
/bin/sh -c 'dd bs=5k conv=block of=/dev/nrmt0h 2>&1 || echo ERR_RC: $?'
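
The figures quoted above can be put side by side with a little
arithmetic. The sketch below is only illustrative: it assumes that the
amount of data is unchanged and that the rounded values (4360 MB/hour,
4 hours before, 6 hours now) are exact, and the function name `slowdown`
is made up for this example.

```sh
#!/bin/sh
# Usage: slowdown old_rate_MB_per_hour old_hours new_hours
# Derives the data volume from the old rate and duration, then the new rate.
slowdown() {
    awk -v r="$1" -v a="$2" -v b="$3" 'BEGIN {
        data = r * a                 # MB written per backup (assumed constant)
        new_rate = data / b          # MB/hour after the slowdown
        printf "data_MB=%.0f\n", data
        printf "new_rate_MB_per_hour=%.0f\n", new_rate
        printf "rate_change_percent=%.0f\n", (new_rate / r - 1) * 100
        printf "time_change_percent=%.0f\n", (b / a - 1) * 100
    }'
}

slowdown 4360 4 6
# data_MB=17440
# new_rate_MB_per_hour=2907
# rate_change_percent=-33
# time_change_percent=50
```

So a backup that takes 50% longer corresponds to a throughput that is
about one third lower, roughly 2900 MB/hour under these assumptions.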

I said I have had a problem for several years. Indeed, the backup has
suddenly slowed down periodically, but until now the problem went away
when I power cycled server1 (the one connected to the tape). Digital
tried to solve the problem, and so far we have replaced the tape, one of
the DWZZAs and the controller (PB2HA-SA) to which the tape is connected,
but the problem still shows up. I cannot say whether the problem I have
now is the same as before, but I suppose not: power cycling server1 does
not help this time, and when the former problem appeared the backups of
both servers slowed down, whereas now only the backup of the remote
server is affected.

If anyone has an idea of what I can do to track down the problem, please
let me know. Which network-related kernel parameters could be
responsible?
Received on Mon May 18 1998 - 12:21:51 NZST
