> A colleague of mine was told by a software supplier that:
>
> "Some forms of Unix suffer performance degradation if the number of
> directory entries approaches 2000."
>
> He is looking at a software solution that creates a lot of small files
> (probably > 5000) in one directory. He is new to Unix and wants to
> check
> the above statement.
>
> Has anyone heard similar? Does DEC Unix (or specifically ADVFS)
> suffer?
> Do any of the other unixes suffer from this? What about UFS, does it
> suffer? Are there kernel parameters that can help in such a situation?
> Any gotchas to watch for?
>
> Thanks,
> Mark.Schubert_at_faulding.com.au
>
I had a variety of responses ranging from NO PROBLEM to YES PROBLEM.
However, the response I give most credence to came from Alan at DEC and
follows:
Digital UNIX has a cache of name and inode/tag numbers. In
the course of opening a file, finding a name in the cache
avoids having to search the directory. Of course, if
the directory is used a lot, its data will be in the buffer
cache, again avoiding disk I/O. When you finally get down to
doing I/O for a name look-up, it depends on the file system.
On UFS filename lookups are done sequentially. For a really
large directory (when the namei cache and buffer cache can't
keep enough data), this can be a serious performance problem
for some applications. 2,000 files isn't that many, even
with large file names. The namei cache should do fine with
that many files in a given directory, unless you have lots
of processes scanning through different 2,000 file directories.
I've played with large directories, and up through 100,000 files it
isn't too bad. Somewhere over that and file name lookups
become I/O bound sequentially reading the directory.
I've never seen AdvFS become I/O bound even up to 1,000,000
files, but once it runs out of namei cache, the lookups can
be CPU bound in kernel mode, which is equally unfriendly.
I haven't looked closely at where the "knee" between friendly
and unfriendly is, but 2,000 files per directory shouldn't
be a problem.
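(As a rough way to test this on your own system, the following C sketch,
my own illustration rather than anything from the responses, creates a
test directory of NFILES empty files and then times one stat() per name.
The path and file count are arbitrary assumptions; vary NFILES to look
for the knee yourself.)

/*
 * Lookup-timing sketch (illustrative only): populate TESTDIR with
 * NFILES empty files, then time one stat() per name. With a warm
 * namei cache this should stay fast regardless of directory size.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>

#define NFILES  5000                /* matches the ~5000 files in the question */
#define TESTDIR "/tmp/bigdir"       /* arbitrary test path */

int main(void)
{
    char name[256];
    struct stat sb;
    struct timeval t0, t1;
    int i, fd;

    mkdir(TESTDIR, 0755);

    /* populate the directory with NFILES empty files */
    for (i = 0; i < NFILES; i++) {
        sprintf(name, TESTDIR "/f%05d", i);
        fd = open(name, O_CREAT | O_WRONLY, 0644);
        if (fd >= 0)
            close(fd);
    }

    /* time one name lookup per file */
    gettimeofday(&t0, NULL);
    for (i = 0; i < NFILES; i++) {
        sprintf(name, TESTDIR "/f%05d", i);
        stat(name, &sb);
    }
    gettimeofday(&t1, NULL);

    printf("%d lookups in %.3f s\n", NFILES,
           (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6);
    return 0;
}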
On a production 2100 we have a number of directories on ADVFS file
systems with over 7000 files in them. I noticed NO performance problems
when trying to access files in these directories. For example, an "ls
-ltr" took only 8 seconds before it started displaying results. A
standard "ls" was immediate. I also opened the directory from my WIN NT
PC via SAMBA and it took 30 seconds to load the display of ALL 7000
files on my PC. CATting the contents of a random file within the
directory was immediate.
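(To reproduce a timing like the "ls" test above without the sorting and
stat overhead, this readdir() sketch, again my own and not from the
thread, simply counts entries and reports the elapsed wall-clock time;
the directory path is an arbitrary placeholder.)

/*
 * Directory-scan sketch (illustrative only): time a plain readdir()
 * pass over a directory, which is roughly what "ls" must do before
 * it can sort and display anything.
 */
#include <stdio.h>
#include <dirent.h>
#include <sys/time.h>

#define SCAN_DIR "/tmp/bigdir"      /* arbitrary test path */

int main(void)
{
    DIR *d;
    struct dirent *de;
    struct timeval t0, t1;
    long n = 0;

    gettimeofday(&t0, NULL);
    d = opendir(SCAN_DIR);
    if (d == NULL) {
        perror("opendir");
        return 1;
    }
    while ((de = readdir(d)) != NULL)
        n++;                        /* count every entry, including . and .. */
    closedir(d);
    gettimeofday(&t1, NULL);

    printf("%ld entries scanned in %.3f s\n", n,
           (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6);
    return 0;
}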
Unfortunately, no responses came back about other Unixes, but the problem
is file system dependent anyway. Most major Unix vendors have their own
file systems (e.g. AdvFS) which I guess are unlikely to suffer.
Regards,
Mark.Schubert_at_faulding.com.au
Received on Thu Apr 16 1998 - 03:40:40 NZST