This is an overdue summary. I got a lot of good responses to my question
about how to effectively move a 46 GB Oracle database across the net.
Responses were primarily in favor of shipping disks or tapes,
as most felt it was hard to get reliably fast speeds on the wires.
My favorite comments were:
"Oy. Never underestimate the bandwidth of a station wagon. :-)"
and
"The world now knows who to blame for their slow internet access!! :-)"
Suggestions shook out as:
ship disk copy - 5
ship tape copy - 6
gzip and transfer somehow - 4
nfs - 2
scp -C - 2
rsync - 2
check network for bottlenecks - 1
(with some crossover between categories)
We still do not know what we are going to do. We are going to experiment
some more with scp (encrypted) vs. unencrypted transfer, and with NFS. I'll
also look at the network connections on both ends. Problems
with creating tapes and shipping them were transient and
variable. We'd make a tape containing 8 backups of 8 partitions,
get it there, and one backup would be unreadable for no apparent
reason. Next time it'd be a different backup on the next tape.
Also, our DBA has had problems getting the database to come up once it
is restored out there from the tape. I think I like the disk copy option
best, if we can implement it. A disk is much easier to check for
readability than a tape.
Sites are on opposite coasts of the US, so the exercise is non-trivial.
Thanks again to all who responded. I'm appending responses below, as
there is a lot of good info here.
Regards,
Judith Reed
-------------------------------------------------------------------------
From: Steven Michael ROBBINS *** rsync ***
> Has anyone done anything like this and achieved decent speeds?
No, but the first thing I'd look at is "rsync".
Sites that mirror large FTP sites use it ...
-------------------------------------------------------------------------
From: "Sean O'Connell" *** ship disk copy ***
One silly thought came to mind... buy one of those Seagate 73 GB
HDDs (~ US$1000), mount it on the host (NFS if you have another
machine around), do a vdump to a file (or do a vdump/vrestore
combo), and overnight the HDD to the remote site and then restore.
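As a rough sketch, assuming the spare disk is mounted at /mnt/spare and the
datafiles all live on one mounted fileset at /oracle/data (both paths are
just examples):
# level-0 dump of the fileset to a saveset file on the spare disk
vdump -0 -f /mnt/spare/oradata.vdump /oracle/data
# at the remote site, with the disk installed and mounted:
cd /restore/oracle/data && vrestore -x -f /mnt/spare/oradata.vdump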
-------------------------------------------------------------------------
From: "Anthony A. D. Talltree" *** scp -C or ship disk ***
Use scp, which comes with SSH. You could do 'scp -C' on a file to
transfer it with encryption and compression. If the path between your
DS3's is fast, you should be able to get a couple of meg per second.
Another approach would be to dump the files on a hard drive and overnight that.
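For example, something along these lines (host and paths are placeholders):
# compressed, encrypted copy of a single archive over ssh
scp -C /backups/oradata.tar.gz oracle@remote-host:/restore/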
-------------------------------------------------------------------------
From: Rich Lafferty *** send tapes ***
Oy. Never underestimate the bandwidth of a station wagon. :-)
Really, the only advantage moving this over the network would have is
low latency, and you don't *care* about latency. Sending tapes with an
insured next-day courier will be higher-bandwidth than a T1 and easier
to manage than doing it over the network for the reasons you've
discovered. (I assume it's at a different location since you're not
just popping the tapes in a different machine.)
--------------------------------------------------------------------------
From: "David Hull" *** send tapes ***
We have a 300GB database that we usually ftp across a dedicated T3, but we
have sent tapes on many occasions. Our tapes are Networker backups, but
we've also done vdumps and tars from time to time to other boxes without a
problem.
-------------------------------------------------------------------------
From: Paul LaMadeleine *** ship disk copy ***
A solution on the expensive side would be to ship the disks to be used at
the remote site to you. Add them into the current system, copy the
database over, and ship them to the secondary location.
-------------------------------------------------------------------------
From: "Roetman, Paul" *** gzip and ftp ***
generally
gzip -v5 *
will compress each datafile; at around 85 to 90% compression you should end
up with about 4-5 GB of datafiles. Note: gzip will delete the original file if, and
only if, the compression was successful. If you do not want to delete the
originals, then use this
for x in *.dbf ; do
    gzip -v5 < "$x" > "$x.gz"    # compressed copy; the original .dbf is kept
done
If you have a very powerful machine, and exclusive use of the CPU, then
for x in *.dbf ; do
    gzip -v5 < "$x" > "$x.gz" &    # one gzip per datafile, all in parallel
done
wait                               # block until every background gzip finishes
but this will wipe out the machine for ALL other users, as it will compress
all the files simultaneously!
Then tar the files together, and ftp them across! Maybe tar them to 8-10
files, and use mput to send. That way, if you get a corrupted file during
the ftp process, you only have to repeat some of the work!
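One way to get pieces of a manageable size, assuming GNU split is available
(names and sizes are only illustrative; 2000m keeps each piece under any
2 GB file-size limits):
# bundle the compressed datafiles and cut the stream into pieces
tar cf - *.dbf.gz | split -b 2000m - oradata.tar.part.
# push the pieces across in one ftp session
ftp remote_host
ftp> binary
ftp> prompt
ftp> mput oradata.tar.part.*
# at the far end, reassemble and unpack
cat oradata.tar.part.* | tar xf -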
Also, an Oracle Export will reduce the size of the database considerably
(probably 95%+) - but it is a very large job to do the import at the other
end!
-------------------------------------------------------------------------
From: Christoph Klein *** ship pc w/disk ***
We had to move 16 million files (approx. 40 GB) from one of our systems to
another. We weighed time, cost, and usability, and in the end we used a
cheap standard PC running Linux (one large IDE disk with ReiserFS).
We just copied all the data from one system to the 'transfer PC', moved the
PC to the other site, and copied it back. I don't think there's any faster
way.
I think you could use something like that (perhaps with mirrored drives to
keep at least some security ;-)
-------------------------------------------------------------------------
From: Selden E Ball Jr *** ship disks ***
Have you considered doing a local disk-to-disk copy
and shipping the disks to your remote site?
That way you could send an actual image of the database.
73GB SCSI disks cost less than $1000 now.
Hot-swap models (SCA 80-pin) are hard to get due to parts shortages,
but the 68-pin models are somewhat available.
36GB disks are correspondingly cheaper.
Of course, this will require downtime on one of your systems
to install and later remove the disk drive(s), unless you
have one configured with hotswap bays and disks.
I'm thinking of a non-Raid configuration, of course.
Otherwise, I don't think I have anything to add that you don't
already know: clean tape drives more often than you think they need it, use
new tapes, lease a direct line (dark fiber) between the sites, etc.
--------------------------------------------------------------------------
From: "John J. Francini" *** ship disks ***
Here's a potential low-tech solution: grab a spare 36 GB
StorageWorks brick (or one of the newer high-density-packaging
removable disks), mount it on the source system, do either a vdump
with compression to saveset files on the 36 GB disk, or use GNU tar
piped into gzip outputting to the disk. Then overnight the disk, in
a well-padded package, to the secondary center.
Low-tech, but it should work much better than tape.
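The GNU tar route might look like this, assuming the brick is mounted at
/mnt/brick and the datafiles sit under /oracle/data (example paths only):
# archive and compress straight onto the removable disk
tar cf - /oracle/data | gzip -5 > /mnt/brick/oradata.tar.gz
# at the secondary center, after mounting the disk
gzip -dc /mnt/brick/oradata.tar.gz | tar xf - -C /restore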
--------------------------------------------------------------------------
From: Udo Grabowski *** scp -C ***
scp -C file remote_host:file (the ssh replacement for rcp) does a good
job of on-the-fly compression (if both ends have it installed; it's not on
the CDs ==> www.ssh.org). On sparse files it could save a factor of 10
or more (make a test). But that will still hog the net for approx. 1 hour...
--------------------------------------------------------------------------
From: Vangelis Haniotakis *** ship tapes or tar/gzip/ftp ***
Hi. You should identify your bottleneck, if possible. You've got a bunch
of huge database files which you feed to tar, dd or whatever - that's a
time-consuming operation by itself.
Measure tar's throughput with a random file and /dev/null as the output
file; that should give you an idea of how costly tar's I/O is without
actually writing anything, just doing read()s. If your files are not
cached in physical memory (and most of them won't be), you may find that
it's a significant factor: you need time just to access the files and read
through them. Your hard disks probably boast a 40+ MB/s transfer rate -
that figure is usually inaccurate as hell; you probably won't get much better
than 10 MB/s.
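For instance, to time just the read side (note that GNU tar quietly skips
reading the files when the archive is literally /dev/null, so count the
bytes through a pipe instead):
# raw read throughput: bytes read divided by elapsed time
time sh -c 'tar cf - /oracle/data | wc -c'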
You also need time to push these files onto the network - yet
another copy of your data into the network stack, then calculation of the
TCP checksum for each outgoing packet, and so on. Once your packets are all
assembled, you send them out and the receiving host does the reverse
procedure: calculate TCP checksum, send back ACK's and such, compensate
for dropped packets, etc. Then actually passing the data to write() and
actually writing to the disk - this is usually much slower than
reading. All this stuff takes CPU and I/O time, and most of these
procedures aren't what you'd call 'efficient'.
I'm not surprised at your actual sustained transfer rate - the truth is
that even if you optimize your systems by tweaking kernel network options
and such, I doubt you'll get much better throughput than 1 MBps for
on-the-fly network tar's or dd's, no matter what the network medium
is. This of course means that you need to have the database down for 3+
hours - well, that's the way it goes for this amount of data. I'd
recommend that you stick with tapes (they make good backups as well),
and find a way to make them more reliable.
> Has anyone done anything like this and achieved decent speeds?
> What techniques did you use? Did you use compression? It seems that
> with those sparsely populated database files, compression would help,
> but I don't know any good ways to do it???
Could work, depends on the ratio these files get compressed to, and
how long it takes to compress them. You might want to do this:
- Bring the db down.
- Tar the databases and files locally to bigfile.tar (or whatever). This
should be reasonably fast, but you need a lot of space.
- Bring the db up.
- Compress bigfile.tar, locally. Use bzip2 or gzip, not compress. This
should take a while.
- Transmit the compressed files via the network. Throughput for that
operation should be high (though nowhere near 100 Mbps :) - it's just a
file transfer, after all.
- Uncompress, untar. Enjoy.
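A compact sketch of that sequence (paths and names are placeholders):
# with the database shut down:
tar cf /scratch/bigfile.tar /oracle/data     # needs ~46 GB of scratch space
# bring the database back up, then compress at leisure:
bzip2 -9 /scratch/bigfile.tar                # leaves /scratch/bigfile.tar.bz2
# ship it (ftp shown; scp -C or rsync would do as well):
ftp remote_host
ftp> binary
ftp> lcd /scratch
ftp> put bigfile.tar.bz2
# on the receiving side:
bunzip2 bigfile.tar.bz2 && tar xf bigfile.tar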
--------------------------------------------------------------------------
From: "Donald P. Theune" *** DBA info - ship tape ***
I forwarded your question to my Oracle DBA and got these thoughts from him
so I thought I'd forward them to you for what they are worth.
> -----Original Message-----
> From: John Stricklen
> Sent: Wednesday, December 20, 2000 11:05 AM
> To: Donald P. Theune
> Subject: RE: Moving 46 GB of data across the web
>
> 46 Gig?
>
> The distribution is not very large (only 4-10 GB). The database files are
> most likely "sparsely populated" as the guy says. Here are some
> possibilities.
>
> 1) What are the problems that are encountered with the tapes? Assuming
> the data loads OK: If they are writing the files from Hot backup mode,
> there may be some tricks they've missed on the recovery. The errors
> generated can look like file corruption, but really relate to not having
> the current SCN available for *after* the source database was taken out of
> hot backup. Might try writing to DVD instead of tape.
>
> 2) If the filesystem configuration is not identical on the target machine,
> I can think of 2 or 3 other problems with moving the database to another
> machine that could look like files were corrupted but really are
> configuration issues. Basically, moving an oracle database to another
> machine (or another place on the same machine) can be difficult.
>
> 3) Try doing an export of the source files. This will write only the real
> data from the Oracle data files, then they can use unix compression. The
> compression won't help a lot because the export file will be compressed
> somewhat. This file can then be moved by whatever means works.
>
> 4) If there is a spare hard drive, write the files there, and then just
> send the drive and install it in the target machine.
>
> If the link speed is really as slow as they say (431 kbit), either the
> link is heavily used (maybe wait for off hours?) or there may be a problem
> with the link that needs to be addressed by the provider. This problem
> could be on either end.
>
-----------------------------------------------------------------------------
From: Tony McElhill *** nfs ***
Could you NFS mount sufficient space on your secondary server on the
primary server to do an EXPORT from Oracle, then IMPORT it on that box
into Oracle? We used to do something similar at one site I was at using
Sybase, though the database export only amounted to 200-300 MB.
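A rough sketch of that, assuming the remote space is NFS-mounted at
/mnt/remote and a full export is what's wanted (connect strings and paths
are placeholders; as noted elsewhere, the full import is a big job):
# on the primary, write the export straight onto the NFS-mounted space
exp system/manager full=y file=/mnt/remote/full.dmp log=/tmp/exp.log
# on the secondary, load it into a pre-created target database
imp system/manager full=y file=/export/full.dmp log=/tmp/imp.log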
If you use tape I guess you should use the "h" device (e.g. rmt0h) to use
high density compression (if you aren't already).
The only other thing I could think of would be to use a cross-over patch
lead from one dedicated NIC to another - but I suspect the 2 boxes aren't
physically close enough to do that.
----------------------------------------------------------------------------
From: Joe Fletcher *** ship tape ***
Any chance you can add a local tape to the server?
----------------------------------------------------------------------------
From: alan_at_nabeth *** ship tape ***
I'd:
a. Put a tape drive directly on the host with the data
and make the backup.
b. Send the tape to the target site.
c. Put a drive directly on the target host.
d. Restore.
You can probably expect the backup to tape to take anywhere
from 3 to 16 hours (assuming tape speeds in the 5 MB/sec to
800 KB/sec range). You don't say what flavor of DLT, but
these are good guesses. If the database is active, you can
expect it to take longer. Using the high density tape
special file, you'll get compression on the drive, which
should be sufficient.
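For reference, the arithmetic behind that estimate:
46 GB at 5 MB/sec   -> ~9,200 sec  -> ~2.6 hours
46 GB at 800 KB/sec -> ~57,500 sec -> ~16 hours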
---------------------------------------------------------------------------
From: "Matt.Wilkie" *** rsync ***
Have you looked at rsync? It won't help much with the first time
transfer, but after that it may. Rsync operates by doing a binary
comparison of the files on each host and only sending the
differences.
http://rsync.samba.org/
Works well for updating changed .iso CD-ROM images:
http://cdimage.debian.org/ch2211.html
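A typical invocation might be (host and paths hypothetical; -e ssh tunnels
the transfer over ssh instead of the default rsh):
# -a preserves permissions/times, -z compresses in transit
rsync -avz -e ssh /oracle/data/ remote-host:/oracle/data/
# running the same command again later only sends the changed blocks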
---------------------------------------------------------------------------
From: "Haesaerts, Corinne" *** gzip and ftp ***
Have you tried gzipping the files before sending them over?
---------------------------------------------------------------------------
From: John Tan *** gzip, check optimization ***
The world now knows who to blame for their slow internet access!! :-)
Seriously, though, the bottleneck in your data transfer would likely lie in
your data links, not in the operating system. Gzip, etc. can help you
compress the data, but you would still be left with a rather large chunk to
transfer. It is likely to be the T3 lines that slow you down, but how well
you can perform through them depends on everything else that is happening
on those lines and what restrictions are placed on your switches, hubs
and routers.
On the OS side, you might be able to check that you are utilising the 100
Mbps NIC to the fullest by seeing that it connects to a 100 Mbps switch.
'ifconfig' can help you check that the OS is optimised for your NIC. That's
all I can think of.
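A couple of quick checks along those lines (output is platform-specific):
ifconfig -a     # confirm the interface is up with the options you expect
netstat -i      # per-interface error counts; steadily climbing errors often
                # point to a speed/duplex mismatch with the switch port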
-------------------------------------------------------------------------
From: Hannes Visagie *** gzip, ftp , or send tape ***
Of course you should use compression. Not Oracle's, but gzip -9, and then
only move the compressed files. Your 46 GB will come down to about 20 GB.
I have also found in the past that on some servers ftp'ing multiple files
across at any one time is faster than ftp'ing file 1, then file 2. This may
vary.
Tune your TCP window size. You will have to search on that.
At the end of the day, tape is still the easiest to use.
Remember that if you lose an FTP session, you can resend from where the
session got lost, so you don't have to send the file from position 0 again.
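Many ftp clients expose that restart capability as 'reget' on the receiving
side, for example:
ftp remote_host
ftp> binary
ftp> reget oradata.tar.gz     # resumes where the failed transfer stopped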
----------------------------------------------------------------------------
From: "Lyndon Handy" *** nfs ***
I would suggest remote NFS over TCP (not UDP) to the remote system,
and spool this overnight.
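On most NFS implementations that would be a mount option along these lines
(option names vary by platform - check mount(8); paths are placeholders):
# mount the remote spool area over NFS, forcing TCP and large transfer sizes
mount -t nfs -o proto=tcp,vers=3,rsize=32768,wsize=32768 \
    remote-host:/dump /mnt/dump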
----------------------------------------------------------------------------
From: "O'Brien, Pat" *** check net ***
Any time I see kb/s in a transfer rate, I opt for sneakernet with floppies.
Seriously, we have seen similar issues where you need to check the host for
autonegotiate and full or half duplex. You state that you are using 100 Mb
cards; there are 3 versions of the DE500 100 Mb card. We too have been
going crazy trying to figure out which models run best in which modes, and
then there is the other end of the cable: that must be set the same way as
the card on the host. Our switches sporadically change modes on us and then
we have to correct with lan_config.