SUMMARY: remote vdump using rsh and dd not working with DU4.0D

From: Christophe DIARRA <diarra_at_ipno.in2p3.fr>
Date: Fri, 28 Aug 1998 18:30:03 +0200 (MET DST)

Hello.

Since my last post, I have received answers from three people.

Many thanks to:

"C.Ruhnke" <i769646_at_smrs013a.mdc.com>
soma_c_at_decus.fr (Claude SOMA - CNTS)
Tom Webster <webster_at_ssdpdc.lgb.cal.boeing.com>

The consensus was to check the vdump and dd block sizes and the network
configuration. The answers are appended at the end of this message.

Following is my original request:

> Hello.
>
> After upgrading from 4.0B to 4.0D, I am unable to vdump remotely root_domain
> and usr_domain. The "vdump" is aborted with the message:
> "dd read error: Connection reset by peer". I have tried several times and
> from different hosts but the result remains the same. All the hosts are in
> DU4.0D.
>
> Following is the backup script:
>
> #!/bin/csh
> vdump -0 -f - / | rsh <host_with_DLT> dd of=/dev/nrmt1h obs=64k
> vdump -0 -f - /usr | rsh <host_with_DLT> dd of=/dev/nrmt1h obs=64k
>
> The first command (vdump of root_domain) always terminates successfully.
> The vdump of usr_domain is always interrupted with a message like the one below:
>
> --------------------------------------------------------------------------
> ...
> vdump: Dumping regular files
>
> vdump: Status at Tue Aug 25 12:47:58 1998
> vdump: Dumped 171862616 of 611482159 bytes; 28.1% completed
> vdump: Dumped 177 of 878 directories; 20.2% completed
> vdump: Dumped 2599 of 28796 files; 9.0% completed
> dd read error: Connection reset by peer
> 446786+190347 records in
> 4732+1 records out
> -------------------------------------------------------------------------
>
> The percentage of data dumped before the error message varies.
> Maybe someone has already solved this kind of problem and can help me.
> Any suggestion is welcome.
>
> Christophe.
>

Changing the block size from 64k to 32k or 60k did nothing. Using 'bs'
instead of 'obs' had no effect either. I also tried different tapes and
tape drives.

Finally, it seems that the problem is due to our network: I was able to
run my backup script successfully from 7 hosts running DU4.0D.

Monitoring the network ports showed that the hosts with problems have an
'Error Frames' rate greater than 1%. On the other hosts, 'Error Frames'
is always 0%.
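The 1% threshold above is simple to check once you have the counters. A minimal sketch, assuming you can read an error-frame count and a total-frame count for each port (from the switch, or for example from the error and packet columns of `netstat -i` on the host); the function names `error_frame_pct` and `port_suspect` are hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: percentage of error frames, given two counters
# (errors, total frames) read from the switch port or the host interface.
error_frame_pct() {
    awk -v e="$1" -v f="$2" 'BEGIN {
        if (f == 0) { print "0.00"; exit }   # no traffic: report 0%
        printf "%.2f\n", 100 * e / f
    }'
}

# Hypothetical check: succeed (exit 0) when the error rate is above 1%,
# the level seen on the hosts where the remote dump failed.
port_suspect() {
    pct=$(error_frame_pct "$1" "$2")
    awk -v p="$pct" 'BEGIN { exit !(p > 1) }'
}
```

For instance, `port_suspect 3 200` succeeds (1.50%), while `port_suspect 0 1000` fails.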

IN CONCLUSION, THE REMOTE DUMP WITH vdump, rsh AND dd DOESN'T FAIL BECAUSE
OF DU4.0D.

Christophe.

-------------------------- Parts of the Answers ------------------------------

From: "C.Ruhnke" <i769646_at_smrs013a.mdc.com>

Christophe,

I can't necessarily offer any answers... I would offer a suggestion (or ask
for a favor?). I have not yet upgraded to V4.0D. In my case I also use
vdump/rsh dd to back up one of my system disks. I have one difference in my
commands though; I use "bs=64K" rather than just "obs". This makes both
the input and output block size 64K. Would you try that change and see if it
has any effect?

Thanks!

CHRis
-------------------------------------------------------------------------------
From: "C.Ruhnke" <i769646_at_smrs013a.mdc.com>

> On the host where the 'rvdump' works, it works for both 'bs' and
> 'obs'. When it fails, it fails in both cases.

Well, thanks for trying... Now I have something ELSE that I will need to
test in two weeks when I try to upgrade!

> Maybe I have a network problem. But I remember that my dump script worked
> for 3 years and from all hosts. Why does it stop working now after the
> upgrade to 4.0D?

I understand your frustration. I too am inclined to think the problem is
network related -- not necessarily a network error per se. Check your
buffer sizes and packet limits -- warning: I am not a net guru. This
really sounds like a time-out or an overrun/overflow.

Another idea, maybe... How "old" is the media you are trying to write?
If there is a "bad block", maybe dd is losing sync with the network.
Just a thought...

--CHRis
-------------------------------------------------------------------------------
From: soma_c_at_decus.fr (Claude SOMA - CNTS)
To: diarra_at_ipno.in2p3.fr
Subject: Re: remote vdump using rsh and dd not working with DU4.0D

With DU 3.2 the block size is 60k.
If you don't specify ibs, what does the dd command use?
Try bs=60k (to set both ibs and obs to 60k).
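Claude's question has a definite answer: when only `obs` is given, dd's input block size stays at its default of 512 bytes, and every short read from the pipe or socket is counted as a partial input record. That is why the failing run above reports `446786+190347 records in` but only `4732+1 records out`. A minimal sketch showing the blocking with a local 1 MiB file standing in for the rsh pipe (the helper name `show_dd_blocking` is hypothetical; a regular file gives full reads, so no partial records appear):

```sh
#!/bin/sh
# Sketch: with only obs=64k, dd reads 512-byte input records (the default
# ibs) and repacks them into 64k output records.
show_dd_blocking() {
    testfile=$(mktemp) || return 1
    # Create a 1 MiB test file (1024 blocks of 1024 bytes).
    dd if=/dev/zero of="$testfile" bs=1024 count=1024 2>/dev/null
    # dd reports its record counts on stderr; keep only those lines.
    dd if="$testfile" of=/dev/null obs=64k 2>&1 | grep 'records'
    rm -f "$testfile"
}
```

On a POSIX dd this should print `2048+0 records in` (1 MiB / 512 bytes) and `16+0 records out` (1 MiB / 64 KiB).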
-------------------------------------------------------------------------------

From: Tom Webster <webster_at_ssdpdc.lgb.cal.boeing.com>

Christophe,

Assuming that your network is clean and free of problems, you may be
running into a buffer underflow. The vdump utility will use a block
size of 60k, while you are trying to feed dd with 64k blocks. This
may not solve your peer reset problems, but I would suggest that you
change your blocking factors to eliminate the possibility of this
problem and try to get better performance from your tape drive.

1. Check the block size of your tape drive, our TZ88 uses a 32k block
   size.

2. Rewrite your command line to use the new blocking factor:
   vdump -0 -b 32 -f - / | rsh <host_with_DLT> dd of=/dev/nrmt1h bs=32k
   
3. We actually do it the other way around (since 3.2d->4.0b) and use the
   form:
   
   rsh <host_with_DLT> "/sbin/vdump -0 -b 32 -f - /" | dd of=/dev/nrmt1h bs=32k
   
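Tom's steps 2 and 3 both depend on vdump's record size (`-b`, in KB as in his example) and dd's `bs` agreeing. A minimal sketch of a Bourne-shell wrapper that derives both from a single value, assuming 32 KB suits the drive and reusing the tape device from the thread; `remote_vdump` and the `TAPE` variable are hypothetical names:

```sh
#!/bin/sh
# Sketch: one block size drives both vdump -b (KB) and dd bs.
# Assumptions: 32 KB suits the drive (Tom's TZ88 uses 32k); the tape
# device /dev/nrmt1h comes from the thread; host names are placeholders.
TAPE=${TAPE:-/dev/nrmt1h}

remote_vdump() {
    fs=$1      # file system to dump, e.g. / or /usr
    host=$2    # host with the tape drive
    blk=$3     # block size in KB, shared by vdump and dd
    # In sh the pipeline's exit status is that of rsh, not vdump or dd.
    vdump -0 -b "$blk" -f - "$fs" | rsh "$host" dd of="$TAPE" bs="${blk}k"
}
```

For example, `remote_vdump /usr dlthost 32`, where `dlthost` is a placeholder for the host with the DLT. Because rsh does not reliably pass back the remote command's exit status, the dd summary on stderr remains the thing to check.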
Just my $0.02(US),

Tom
--
-------------------------------------------------------------------------------
From: Tom Webster <webster_at_ssdpdc.lgb.cal.boeing.com>

Christophe,
> Since yesterday I have run a lot of tests, but with only one success
> (between two 4.0D hosts). Those hosts are located in the computer room
> and are on the same switch.
>
> Now I suspect our LAN, because the other hosts are located in
> different places in the lab. I will continue testing with other
> hosts in the computer room, but since tests take time I will give
> the final conclusion in the summary.
>
> All the other tests have failed. I tried between hosts with the same OS
> version (4.0B <--> 4.0B, 4.0D <--> 4.0D) and between hosts with different
> versions (4.0D <--> 4.0B). I tried with different DLT2000XT drives and
> different block sizes (32k, 60k and 64k) with the command you suggested:

Yes, it would seem to imply a network problem.  Something that you may
want to check, if you have the chance, is that the SRM settings for the
network cards match the OS settings, especially if you have 10/100 cards
in your systems.  Since the DEC cards don't generally auto-detect, they
are hardcoded to a mode in the SRM.  I've heard mention on this list of
misconfigured cards working OK until they were put under a heavy load.
Check with your network guys and make sure that you agree on speed and
duplex settings, and then check your card settings from the SRM.  If
they indicate that the switch should auto-detect, you may want to see if
they can force the mode on your ports (just in case).
Good luck,
Tom
--
***
Christophe DIARRA
Institut de Physique Nucleaire
Bat 100 - S2I
91406 ORSAY Cedex
Tel: (33) 1 69 15 65 60
Fax: (33) 1 69 15 64 70
E-mail: diarra_at_ipno.in2p3.fr
***
Received on Fri Aug 28 1998 - 16:31:14 NZST