We're developing software that takes large numbers of files containing
genetic sequence and compiles them into contiguous, good-quality genetic
sequence. We therefore routinely find ourselves reading directories that
contain 80,000 files or more, and we see a substantial performance hit in
file access compared with smaller sets of data.
So my question is pretty simple: is there some way to improve filesystem
performance in this kind of situation? Or would we be better off solving
the problem in userland, by breaking the dataset up into 80 directories of
1,000 files each and hashing each filename to determine which directory a
given file should live in?
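For illustration, something along these lines is roughly what I have in
mind for the userland approach. The directory count, the hash function,
and the paths ("/data/reads", "clone_12345.seq") are just placeholders:

#include <stdio.h>
#include <string.h>

#define NUM_BUCKETS 80   /* number of subdirectories; arbitrary choice */

/* Simple string hash (djb2) -- any reasonably uniform hash would do. */
static unsigned long hash_name(const char *name)
{
    unsigned long h = 5381;
    int c;
    while ((c = (unsigned char)*name++) != 0)
        h = ((h << 5) + h) + c;   /* h * 33 + c */
    return h;
}

/* Build "basedir/NN/filename" into buf, where NN is the hash bucket. */
static void bucket_path(char *buf, size_t len,
                        const char *basedir, const char *filename)
{
    unsigned long bucket = hash_name(filename) % NUM_BUCKETS;
    snprintf(buf, len, "%s/%02lu/%s", basedir, bucket, filename);
}

int main(void)
{
    char path[1024];
    bucket_path(path, sizeof path, "/data/reads", "clone_12345.seq");
    printf("%s\n", path);   /* e.g. /data/reads/NN/clone_12345.seq */
    return 0;
}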
We're using 4.0D with ADVFS partitions.
--
Lamont Granquist lamontg_at_genome.washington.edu
Dept. of Molecular Biotechnology (206)616-5735 fax: (206)685-7344
Box 352145 / University of Washington / Seattle, WA 98195
PGP pubkey: finger lamontg_at_raven.genome.washington.edu | pgp -fka