SUMMARY-Disk tuning + performance

From: JIM MITROY <J_MITROY_at_BLIGH.NTU.EDU.AU>
Date: Wed Jul 12 05:05:04 1995

> Gidday,

> I have a question or two about setting up a disk for optimal
> performance.

> I have a system which has an RZ28 and an RZ29 as scratch disks.
> When they were newfs'ed the default options were set and so
> 10% of the space on the partitions was reserved by the system.
> On reading the man pages for tunefs, it was stated that if the
> amount of reserved space was decreased substantially (to zero)
> then system throughput could be decreased by a factor of 3.

> I assume the space is reserved so that fragmentation can be
> prevented from being a big problem. Now bearing in mind that the
> disks will be scratch disks, which will be wiped at the end
> of each calculation

>>I received a number of excellent and informative replies. The consensus
>>was to crank down minfree to less than 10%. My concerns were for
>>scratch disks and not for data disks, but the impression I gained was
>>that minfree could also be reduced for large user disks as well.
>>People commented about disk striping, but I was not concerned about
>>performance per se, rather having as much space as possible without
>>taking a performance hit. Just for the record, for those of a similar
>>level of knowledge (perhaps I should stress lack of knowledge), the
>>disks have to be unmounted before you can tune the parameters.
 
>>Jim Mitroy

From: alan_at_nabeth.cxo.dec.com
Background.

    The Berkeley Fast File System is divided into a set of cylinder
    groups, typically 32 cylinders per group. Each
    cylinder group has its own inode table, cylinder group summary,
    backup superblock and data space. When a new directory is
    created the cylinder group with the most free space is selected.
    When files are created in that directory, the allocation
    algorithm prefers to use the cylinder group where the directory
    was allocated. For time-sharing workloads this allows generally
    related files to be close together.

    When blocks are allocated to a file, the allocation code prefers
    to use the same cylinder group as the file, then nearby cylinder
    groups, then a quadratic hash search, and finally a linear search.

    To help keep sufficient free space in cylinder groups for the
    allocations, large files are split up over multiple cylinder
    groups. To help the file system have free space, some amount
    is reserved (the minfree of 10%). As cylinder groups fill up
    and the file system fills up, the slower search algorithms are
    used, reducing performance. More importantly for read performance,
    the blocks of a poorly allocated file will be scattered all over
    the disk.
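
    If you want to see how a particular file system is laid out, dumpfs
    prints the superblock fields (block size, cylinders per group,
    minfree and so on) followed by a summary for each cylinder group.
    A minimal sketch, with a placeholder device name:

        dumpfs /dev/rrz10c | more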

The 10% minfree default.

    The 10% value was selected over 10 years ago when the largest
    available disks were around 512 MB. Given the geometries of
    disks at the time and typical cylinder group arrangement, 1/2
    MB to 1 MB was reserved per cylinder group (averaging across
    the disk). Unfortunately, the 10% value has been virtually
    enshrined as a fundamental law of the universe, without much
    work to ensure that it is the right value for modern disks.

> (a) Can I decrease the reserved space to a smaller fraction than 10%
> without suffering a throughput penalty?

a. Probably. Some work has been done to find better values
        for minfree. I've only heard the results 2nd or 3rd hand,
        but I think 2% was found suitable for 1-2 GB disks. One way
        to find a reasonable value would be to work backwards from
        wanting to have 1/2 - 1 MB per cylinder group. For your
        typical cylinder group size, figure out how much space
        that wants to be and what percentage of the capacity it
        is.
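
        As a worked example (the cylinder group size here is an
        assumption, not a measured value; dumpfs will show the real
        figure for your disks):

            cg_mb=32                    # assumed cylinder group size, MB
            reserve_mb=1                # desired reserve per group, MB
            echo "minfree ~ `expr $reserve_mb \* 100 / $cg_mb`%"
            # prints: minfree ~ 3%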

>>This was the consensus

> (b) Both the disks are large disks. One is 2 GB the other is 4 GB.
> Can I have a smaller reserved fraction for this bigger disks than
> for smaller disks without any penalty.

 b. Since minfree uses integer percentages, you'll be getting
        down to the 1% to 5% range, so there isn't much fine tuning
        you can do.

>>This was the consensus

> (c) If the files only fill up 60% of the disk during any calculation,
> then could I assume there would be no performance penalty?
 
c. Probably none from having to look hard for free space.
>>This was the consensus

> In essence, the disks are used for a couple of big data files that
> are created during some 2 day calculations. The files are more
> or less written during the first hour of the calculation (one is
> random access, one is sequential) and then read repeatedly for the
> next couple of days (no other action on the disk during this period).

> In essence, I want to reduce the amount of reserved space on the
> disks from 10%, (lose 200 MB and 400 MB respectively) and want to
> know if there will be any throughput penalty given the potential
> application as scratch disks.

Other stuff.

    Other file system parameters may affect the performance, especially
    for high bandwidth work-loads. The file system code won't allocate
    more than 16 MB of contiguous disk space per cylinder group. If you
    tend to have a small number of very large files, you may want to
    set the cylinder group size to take advantage of this. Some extra
    space is needed per group, but 18-20 MB per group should allow
    using all the space in a single contiguous allocation and at the same
    time reduce the seek distance to the next allocation.

    Unfortunately, as the file gets very large the superblock progression
    will split up the allocation.
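
    As a sketch of that sizing (the geometry figures here are assumptions,
    not the real RZ28/RZ29 numbers; disklabel will report the real ones):

        # 99 sectors/track * 16 tracks/cylinder * 512 bytes ~ 0.8 MB/cylinder,
        # so about 24 cylinders per group gives the 18-20 MB mentioned above.
        newfs -c 24 ${device} ${disktype}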

> Jim Mitroy
>>Replies from:

From: Dan Riley <dsr_at_lns598.lns.cornell.edu>

Jim,

As background, realize that the default newfs settings were tuned
for a 330 MB disk on a VAX 11/750, with a file population typical
of a home disk in the early '80s. Not surprisingly, the defaults are
not ideal for your application--not only is minfree way too large, but
also the default number of inodes is way too large, and the default
layout is not optimal--you want a *much* smaller number of inodes,
and you want to encourage the file system to lay out your files in
larger sequential blocks.

A few years back, I did some tests on various tunings for file
systems with small numbers of large files. These were performed
on a DECstation 5000/200 with 1 and 2 GB disks, but a few casual
tests indicate that the results are still reasonably valid for
Alphas with disks up to 9 GB; with some tweaking, you may be able
to do better.

Here is the summary I wrote for internal use after the tests on
our early Alphas:

  The parameters I use for DECstation filesystems with a small number of
  large files are

        newfs -f 8192 -c 32 -i 32768 -m 5 ${device} ${disktype}
        tunefs -a 8 -e 4096 ${device}

  On the Alphas, '-a 8' is the default. 'tunefs -e 4096' could be moved
  to the newfs command, but I recommend doing it as a separate step so
  you can see the new and old values--the new should be approximately
  twice the old value (allowing a single file to allocate about half the
  blocks in a cylinder group, instead of a quarter). The switches to
  newfs turn off fragments ('-f 8192'), enlarge the cylinders/group by a
  factor of 2 ('-c 32'), increase the number of bytes/inode by a factor
  of 16 ('-i 32768'), and cut minfree from 10% to 5%. On /tem or /stm
  partitions where you may get a larger number of small files, '-i 8192'
  might be a better choice, but 32768 should be dandy for roar or pass2
  staging areas. On the DECstations, this gives a file system that uses
  noticeably less space for formatting information, and is significantly
  faster than the defaults.

For your application--a very small number of large files that are
wiped from the disk before the next round--I would go with a minfree
smaller than 5%, perhaps even 0%.
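
Putting that together, one possible recipe for the scratch disks would be
something along these lines (device and disk type are placeholders, and
the 0% minfree is only the suggestion above, not something that was
benchmarked):

        newfs -f 8192 -c 32 -i 32768 -m 0 ${device} ${disktype}
        tunefs -e 4096 ${device}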

From: Gyula Szokoly <szgyula_at_skysrv.Pha.Jhu.EDU>

  Well, I have 9 GB disks, with very similar loads (big [~64 MB] files).
I set the minfree to 1%. I did not notice a performance hit (which does
not mean there wasn't one).

  There was a similar discussion about this question a month back on one
of the Sun newsgroups. I gathered that on the disks and load that you
have, a 3% minfree is OK. The 10% is from the good old days (100 MB
disks). Nobody did any hard modelling/analysis as far as I know.
  Recently I switched to ADVFS (heck, it's free), which is supposedly
better in this respect (but harder to back up to a remote tape -- no
rvdump).

Gyula

>>Replies also from

From: Menelaos Karamichalis <mnk_at_wuerl.wustl.edu>

Benoit Maillard maillard_at_fgt.dec.com

Chris Jankowski - Open Systems Cons.- chris_at_lagoon.meo.dec.com
 Digital Equipment Corporation (Australia)

>>I had put minfree down to 4% before all the messages came in. I'll
>>probably reduce it to 2% and 3% for the 4 GB and 2 GB disks respectively.