I have an 85 GB data set in 1400 (daily) files. The individual files are sorted
by the primary key. I need to create either a single sorted file or, more
likely, sorted individual files (there are about 4 million different keys).
The individual files are created by reading each daily file and appending each
record to the appropriate per-key file (in Perl). The number of different keys
per daily file runs in the range of 700,000 to 1.5 million, which of course
leads to a large number of open/close file operations.
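The per-record loop looks roughly like this (a simplified sketch; the key
position and the directory bucketing shown here are placeholders, not the
exact script):

    #!/usr/bin/perl -w
    use strict;

    my $daily = shift or die "usage: $0 daily_file\n";
    open(my $in, '<', $daily) or die "cannot read $daily: $!";
    while (my $line = <$in>) {
        # assume the primary key is the first whitespace-delimited field
        my ($key) = split ' ', $line;
        # spread the 4 million key files over 100 pre-created directories
        my $dir = sprintf "d%02d", unpack("%16C*", $key) % 100;
        open(my $out, '>>', "$dir/$key")
            or die "cannot append to $dir/$key: $!";
        print $out $line;
        close $out;    # one open/write/close cycle per record
    }
    close $in;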
In searching the archives I confirmed that it wouldn't be very effective to put
all 4 million files in a single directory. I tested breaking them up into 100
directories and into 1000 directories; using 1000 directories increased the run
time by about 30% over 100. However, even 100 directories isn't very effective:
the estimated run time is currently on the order of 5+ weeks. I suspect this is
due to the 700,000+ open/write/close file operations per daily file, since I
can read through the entire data set in a day or so.
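For scale, a rough back-of-envelope (using my own middle-of-the-range guess of
about a million keys per daily file):

    # implied open/append/close rate if the job really takes 5 weeks
    my $cycles  = 1_400 * 1_000_000;     # ~1.4e9 open/append/close cycles
    my $seconds = 5 * 7 * 24 * 3600;     # 5 weeks is about 3.0e6 seconds
    printf "~%d appends/second\n", $cycles / $seconds;   # roughly 460/s

A few hundred per-file operations a second would point at metadata overhead
rather than raw read/write speed, which fits my suspicion above.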
I would appreciate any suggestions on a better way to deal with the
creation/updating of 4 million files :-)
This is, luckily, a one-time operation.
The disk is a 500 GB RAID-5 array, using AdvFS.
thanks