> Gidday,
> I have a question or two about setting up a disk for optimal
> performance.
> I have a system which has an RZ28 and an RZ29 as scratch disks.
> When they were newfs'ed the default options were set and so
> 10% of the space on the partitions was reserved by the system.
> On reading the man pages for tunefs, it was stated that if the
> amount of reserved space was decreased substantially (to zero),
> then system throughput could be decreased by a factor of 3.
> I assume the space is reserved so that fragmentation can be
> prevented from being a big problem. Now bear in mind that the
> disks will be scratch disks that will be wiped at the end
> of each calculation.
>>I received a number of excellent and informative replies. The consensus
>>was to crank down minfree to less than 10%. My concerns were for
>>scratch disks and not for data disks, but the impression I gained was
>>that minfree could also be reduced for large user disks as well.
>>People commented about disk striping, but I was not concerned about
>>performance per se, rather with having as much space as possible
>>without taking a performance hit. Just for the record, for those of a
>>similar level of knowledge (perhaps I should stress lack of knowledge),
>>the disks have to be unmounted before you can tune the parameters.
>>Jim Mitroy
From: alan_at_nabeth.cxo.dec.com
Background.
The Berkeley Fast File System is divided into a set of
cylinder groups, typically 32 cylinders per group. Each
cylinder group has its own inode table, cylinder group summary,
backup superblock and data space. When a new directory is
created the cylinder group with the most free space is selected.
When files are created in that directory, the allocation
algorithm prefers to use the cylinder group where the directory
was allocated. For time-sharing workloads this allows generally
related files to be close together.
When blocks are allocated to a file, the allocation code prefers
to use the same cylinder group as the file, then nearby cylinder
groups, then a quadratic hash search, and finally a linear search.
To help keep sufficient free space in cylinder groups for the
allocations, large files are split up over multiple cylinder
groups. To help the file system have free space, some amount
is reserved (the minfree of 10%). As cylinder groups fill up
and the file system fills up, the slower search algorithms are
used, reducing performance. More importantly for read performance,
the blocks of a poorly allocated file will be scattered all over
the disk.
The 10% minfree default.
The 10% value was selected over 10 years ago when the largest
available disks were around 512 MB. Given the geometries of
disks at the time and typical cylinder group arrangement, 1/2
MB to 1 MB was reserved per cylinder group (averaging across
the disk). Unfortunately, the 10% value has been virtually
enshrined as a fundamental law of the universe, without much
work to ensure that it is the right value for modern disks.
> (a) Can I decrease the reserved space to a smaller fraction than 10%
> without suffering a throughput penalty?
a. Probably. Some work has been done to find better values
for minfree. I've only heard the results 2nd or 3rd hand;
I think 2% was found suitable for 1-2 GB disks. One way to
find a reasonable value would be to work backwards from
wanting 1/2 - 1 MB free per cylinder group. For your
typical cylinder group size, figure out how much space
that is and what percentage of the capacity it
represents.
>>This was the consensus
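The work-backwards method in (a) can be sketched as a few lines of shell
arithmetic. All the figures here are assumptions for illustration (a 2 GB
disk with ~50 MB cylinder groups), not measured RZ28/RZ29 geometry:

```shell
#!/bin/sh
# Work backwards from a target reserve of ~1 MB per cylinder group.
# The disk and group sizes below are assumptions, not real geometry.
disk_mb=2048            # total capacity in MB (assumption)
group_mb=50             # typical cylinder group size in MB (assumption)
reserve_per_group_mb=1  # desired free space per group

groups=$(( disk_mb / group_mb ))
reserve_mb=$(( groups * reserve_per_group_mb ))
minfree=$(( 100 * reserve_mb / disk_mb ))

echo "groups=$groups reserve=${reserve_mb}MB minfree=${minfree}%"
```

With these made-up numbers the calculation lands at about 1-2%, consistent
with the 1% to 5% range mentioned in answer (b).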
> (b) Both the disks are large disks. One is 2 GB the other is 4 GB.
> Can I have a smaller reserved fraction for these bigger disks than
> for smaller disks without any penalty.
b. Since minfree uses integer percentages you'll be getting
down to the 1% to 5% range and there isn't much fine tuning
you can do.
>>This was the consensus
> (c) If the files only fill up 60% of the disk during any calculation,
> then could I assume there would be no performance penalty?
c. Probably none from having to look hard for free space.
>>This was the consensus
> In essence, the disks are used for a couple of big data files that
> are created during some 2 day calculations. The files are more
> or less written during the first hour of the calculation (one is
> random access, one is sequential) and then read repeatedly for the
> next couple of days (no other action on the disk during this period).
> In essence, I want to reduce the amount of reserved space on the
> disks from 10%, (lose 200 MB and 400 MB respectively) and want to
> know if there will be any throughput penalty given the potential
> application as scratch disks.
Other stuff.
Other file system parameters may affect performance, especially
for high-bandwidth workloads. The file system code won't allocate
more than 16 MB of contiguous disk space per cylinder group. If you
tend to have a small number of very large files, you may want to
set the cylinder group size to take advantage of this. Some extra
space is needed per group, but 18-20 MB per group should allow
using all the space in a single contiguous allocation and at the
same time reduce the seek distance to the next allocation.
Unfortunately, as the file gets very large the superblock progression
will split up the allocation.
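The sizing rule above can be turned into a quick calculation for the
newfs -c value. The cylinder size here is a made-up figure; on a real
disk you would take it from the disk's geometry information:

```shell
#!/bin/sh
# Pick a cylinders-per-group value (newfs -c) that yields an 18-20 MB
# cylinder group. The per-cylinder size below is an assumption.
cyl_kb=1024                  # KB per cylinder (assumption)
target_kb=$(( 19 * 1024 ))   # aim for ~19 MB per group

cpg=$(( target_kb / cyl_kb ))
echo "try: newfs -c $cpg ..."
```

A group in the 18-20 MB range leaves room for one full 16 MB contiguous
allocation plus the group's own metadata.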
> Jim Mitroy
replies from
From: Dan Riley <dsr_at_lns598.lns.cornell.edu>
Jim,
As background, realize that the default newfs settings were tuned
for a 330 MB disk on a VAX 11/750, with a file population typical
of a home disk in the early '80s. Not surprisingly, the defaults are
not ideal for your application--not only is minfree way too large, but
also the default number of inodes is way too large, and the default
layout is not optimal--you want a *much* smaller number of inodes,
and you want to encourage the file system to lay out your files in
larger sequential blocks.
A few years back, I did some tests on various tunings for file
systems with small numbers of large files. These were performed
on a DECstation 5000/200 with 1 and 2 GB disks, but a few casual
tests indicate that the results are still reasonably valid for
Alphas with disks up to 9 GB; with some tweaking, you may be able
to do better.
Here is the summary I wrote for internal use after the tests on
our early Alphas:
The parameters I use for DECstation filesystems with a small number of
large files are
newfs -f 8192 -c 32 -i 32768 -m 5 ${device} ${disktype}
tunefs -a 8 -e 4096 ${device}
On the alphas, '-a 8' is the default. 'tunefs -e 4096' could be moved
to the newfs command, but I recommend doing it as a separate step so
you can see the new and old values--the new should be approximately
twice the old value (allowing a single file to allocate about half the
blocks in a cylinder group, instead of a quarter). The switches to
newfs turn off fragments ('-f 8192'), enlarge the cylinders/group by a
factor of 2 ('-c 32'), increase the number of bytes/inode by a factor
of 16 ('-i 32768'), and cut minfree from 10% to 5%. On /tem or /stm
partitions where you may get a larger number of small files, '-i 8192'
might be a better choice, but 32768 should be dandy for roar or pass2
staging areas. On the DECstations, this gives a file system that uses
noticeably less space for formatting information, and is significantly
faster than the defaults.
For your application--a very small number of large files that are
wiped from the disk before the next round--I would go with a minfree
smaller than 5%, perhaps even 0%.
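As a rough illustration of what '-i 32768' buys on a 2 GB file system
(the sizes are assumptions; one inode per 2048 bytes is taken as the old
default, matching the factor-of-16 claim above):

```shell
#!/bin/sh
# Compare inode counts: assumed default density vs. '-i 32768'.
fs_kb=$(( 2 * 1024 * 1024 ))     # 2 GB file system, in KB
default_inodes=$(( fs_kb / 2 ))  # one inode per 2048 bytes (assumption)
sparse_inodes=$(( fs_kb / 32 ))  # one inode per 32768 bytes

echo "default=$default_inodes with -i 32768: $sparse_inodes"
```

Assuming the traditional 128-byte on-disk inode, the difference of
roughly a million inodes is on the order of 120 MB of metadata; treat
the exact figures as a sketch.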
From: Gyula Szokoly <szgyula_at_skysrv.Pha.Jhu.EDU>
Message-Id: <9507110502.AA01867_at_hoplite.pha.jhu.edu>
Well, I have 9 GB disks with very similar loads (big [~64 MB] files).
I set the minfree to 1%. I did not feel a performance hit (which does
not mean that I did not have one).
A similar discussion was about this question a month back on one of the
Sun newsgroups. I gathered that on the disks and load that you have, a
3% minfree is OK. The 10% is from the good old days (100 MB disks). Nobody
did any hard modelling/analysis as far as I know.
Recently I switched to ADVFS (heck, it's free), which is supposedly
better in this respect (but harder to back up to a remote tape -- no
rvdump).
Gyula
>>Replies also from
From: Menelaos Karamichalis <mnk_at_wuerl.wustl.edu>
Benoit Maillard maillard_at_fgt.dec.com
Chris Jankowski - Open Systems Cons.- chris_at_lagoon.meo.dec.com
Digital Equipment Corporation (Australia)
>>I had put minfree down to 4% before all the messages came in. I'll
>>probably reduce it to 2% and 3% for the 4 GB and 2 GB disks respectively.
Received on Wed Jul 12 1995 - 05:05:04 NZST