SUMMARY: Minimum free disk space <10% performance hit??

From: Richard Bemrose <rb237_at_phy.cam.ac.uk>
Date: Tue, 28 Apr 1998 13:08:55 +0100 (BST)

Hi all,

First of all, I must thank all who replied for their quick and
very detailed postings:
  Alan Rollow <alan_at_nabeth.cxo.dec.com>
  Kevin Reardon <kreardon_at_na.astro.it>
  Nick Batchelor <Nick.Batchelor_at_unilever.com>
  Ryan Niemes <NIEMES_at_opus.oca.udayton.edu>
  LBRO <lbro_at_dscc.dk>
  Anthony Talltree <aad_at_nwnet.net>
  
Secondly, I am sorry for the delayed summary owing to illness.

In my original posting I asked:
>On large capacity volumes (>9Gb and above) what sort of throughput
>performance hit should we expect for minimum free space threshold less
>than 10%. According to the man page [tunefs(8)]: "this value can be set
>to zero, however up to a factor of three in throughput will be lost over
>the performance obtained at a 10% threshold."
>
>Do I really need to reserve 10% on a 23Gb disk? This results in
>reducing the capacity by 2.3Gb! Or perhaps I should reserve a certain
>amount? Say,
> reserve 0.4Gb (or 10%) on a 4Gb disk capacity
> reserve 0.4Gb (or 2%) on a 23Gb disk capacity

The overwhelming consensus was that it is perfectly reasonable to reduce
the 'minimum free disk space' value to 2-5%, in preference to reserving
an average of # MB per volume. For a detailed explanation please refer to
Alan Rollow's <alan_at_nabeth.cxo.dec.com> postings (one posting plus one
forwarded by Kevin Reardon).

I've attached all responses as I feel they should go into the archive.

----
Kevin Reardon <kreardon_at_na.astro.it>
Check in the archives for two summaries on this subject /snip/:
<included by Richard Bemrose>
http://www.ornl.gov/its/archives/mailing-lists/alpha-osf-managers/1995/07/msg00087.html
http://www.ornl.gov/its/archives/mailing-lists/alpha-osf-managers/1996/04/msg00377.html
Most people claim going down to 1-3% doesn't cause any problems. You will
want to be sure to run tunefs -o time on your file system after lowering
the free space. Maybe the default switch from 'time' to 'space' allocation
strategies when free space is set to less than 10% is the real source of
the speed hit that tunefs is talking about?
I include below an explanation by Dr. Wayward Inode himself from one of
the above summaries that describes the only objective way I've seen to
determine an optimal free space value for a disk. 
From: alan_at_nabeth.cxo.dec.com
               Background.
    The Berkeley Fast File System is divided up into a set of
    cylinder groups, typically 32 cylinders per group.  Each
    cylinder group has its own inode table, cylinder group summary,
    backup superblock and data space.  When a new directory is
    created the cylinder group with the most free space is selected.
    When files are created in that directory, the allocation
    algorithm prefers to use the cylinder group where the directory
    was allocated.  For time-sharing workloads this allows generally
    related files to be close together.
    When blocks are allocated to a file, the allocation code prefers
    to use the same cylinder group as the file, then nearby cylinder
    groups, then a quadratic hash search, and finally a linear search.
    To help keep sufficient free space in cylinder groups for the
    allocations, large files are split up over multiple cylinder
    groups.  To help the file system have free space, some amount
    is reserved (the minfree of 10%).  As cylinder groups fill up
    and the file system fills up, the slower search algorithms are
    used, reducing performance.  More importantly for read performance,
    the blocks of a poorly allocated file will be scattered all over
    the disk.
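The allocation order described above (own cylinder group, then nearby groups, then a quadratic hash, then a linear scan) can be sketched in Python. This is a hypothetical simplification for illustration only, not the actual BSD allocator, and the "nearby" distance of two groups is an assumed value:

```python
# Hypothetical sketch of the UFS block-allocation fallback order
# described above -- not the real BSD kernel code.

def allocate_block(preferred_cg, groups):
    """Return the index of a cylinder group with free space.

    groups is a list of free-block counts, indexed by cylinder group.
    """
    ncg = len(groups)
    # 1. Prefer the file's own cylinder group.
    if groups[preferred_cg] > 0:
        return preferred_cg
    # 2. Then a couple of nearby groups on either side.
    for dist in range(1, 3):
        for cg in ((preferred_cg + dist) % ncg, (preferred_cg - dist) % ncg):
            if groups[cg] > 0:
                return cg
    # 3. Then a quadratic hash search over the groups.
    for i in range(1, ncg):
        cg = (preferred_cg + i * i) % ncg
        if groups[cg] > 0:
            return cg
    # 4. Finally a linear scan of every group.
    for cg in range(ncg):
        if groups[cg] > 0:
            return cg
    raise OSError("file system full")
```

As the disk fills, more allocations fall through to the slower searches at the bottom, which is where the performance loss comes from.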
The 10% minfree default.
    The 10% value was selected over 10 years ago when the largest
    available disks were around 512 MB.  Given the geometries of
    disks at the time and typical cylinder group arrangement, 1/2
    MB to 1 MB was reserved per cylinder group (averaging across
    the disk).  Unfortunately, the 10% value has been virtually
    enshrined as a fundamental law of the universe, without much
    work to ensure that it is the right value for modern disks.
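The arithmetic behind that historical default is easy to reproduce. The disk size comes from the text; the cylinder-group count below is an assumed figure for illustration:

```python
# Illustrative arithmetic: what 10% minfree meant on a late-1980s disk.
disk_mb = 512              # largest common disk of the era, per the text
reserved_mb = disk_mb * 0.10
cg_count = 80              # hypothetical number of cylinder groups
per_group_mb = reserved_mb / cg_count
print(per_group_mb)        # about 0.64 MB per group, i.e. the 1/2-1 MB range
```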
----
Nick Batchelor <Nick.Batchelor_at_unilever.com>
     As far as I understand it, the 10% limit only applies to traditional 
     UFS file systems.  I think it has to do with the algorithms UFS uses 
     to try to position related blocks adjacent to each other.  With less 
     than 10% free space, the system will spend a disproportionate amount 
     of time just trying to work out where to put new blocks into the file 
     system. 
     
     I don't think the same problem occurs with more advanced file systems 
     like JFS and Advfs which use extent based algorithms for allocating 
     space in the file system.
----
Alan Rollow <alan_at_nabeth.cxo.dec.com>
The 10% default and the comment about its effect on performance came
from a time when large disks on UNIX systems were 256 MB and
bigger than many mid-range AlphaServers.  Few vendors have
bothered to update the text inherited from Berkeley, or even to
test what values should be used for larger disks.  In
Digital's case it doesn't help that UFS is the poor 2nd cousin
to AdvFS these days.
Space on UFS is organized around the cylinder group.  Each group
of cylinders has a backup of the superblock, its own summary
block and its own inode table.  UFS allocates new data to a file
by trying to keep it close to the file's existing data.  As
files get large, the space is spread out over the disk so that 
one file doesn't use all the space in a cylinder group.  As
the disk becomes full, so do the cylinder groups.  If a group
is full, but the file system would have preferred to allocate
data in it, it has to find a nearby group for the space.
If it can't find a nearby group, then it will eventually take 
space in the first group it finds, spreading the file out
more than is desired.
By keeping some percentage of the space free, normal allocation
will spread this free space evenly among the cylinder groups.  As
the percentage reserved space is reduced, more groups will fill
up and you may get poor allocations for medium sized and large
files.  On those older disks, when the 10% number was made the
default, that 10% represented an average of between 256 KB and 
1 MB per cylinder group; with the typical group being 16 cylinders
at the claimed geometry.  As the capacity of disks has increased
so have the size and number of cylinders.  Keeping those same
capacities per group (on average), today's large disks can get
by with reserved space of 2-5%.  A really large disk could
probably get by with 1%.
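Applying Alan's per-group target to the 23 GB disk from the original question gives a concrete number. The cylinder-group count here is an assumed value for illustration, not taken from any real label geometry:

```python
# Hypothetical back-of-envelope: what minfree percentage preserves the
# same ~1 MB-per-cylinder-group average on a 23 GB disk?
disk_mb = 23 * 1024
cg_count = 700              # assumed group count for a disk this size
target_per_group_mb = 1.0
minfree_pct = 100.0 * cg_count * target_per_group_mb / disk_mb
print(round(minfree_pct, 1))   # about 3%, inside the suggested 2-5% range
```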
I've never tried to measure the effect of using these smaller
percentages of reserved space, and I'm not sure that providing
an average of .n MB per group is the right goal.  I think
someone did study this once.  It may have been a paper presented
in a USENIX Proceedings, or something from a university.  I
haven't read the paper, but I've read of it, and I recall that
it recommended smaller percentages for reserved space on large
disks.
Some groups are bound to fill up sooner than expected, especially
if large files dominate the file system, but that can be controlled
by tuning maxbpg so that it is smaller than the size of the cylinder
group (or by making the groups larger).
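The maxbpg constraint Alan mentions (one file should not be allowed to claim a whole cylinder group) amounts to simple arithmetic. All the numbers below are assumed values for illustration:

```python
# Hypothetical sanity check for maxbpg (max blocks per file per group).
cg_size_mb = 32             # assumed cylinder-group capacity
block_size_kb = 8           # classic UFS block size
blocks_per_group = cg_size_mb * 1024 // block_size_kb   # 4096 blocks
maxbpg = blocks_per_group // 4   # cap any one file at 1/4 of a group
print(maxbpg)
```

With a cap like this, a large file is forced to spill into other groups early, leaving room in each group for the smaller files that belong there.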
----
Ryan Niemes <NIEMES_at_opus.oca.udayton.edu>
This is a very good question that I would like answered as well.  I 
did read the man pages on one of our Solaris boxes, and it says the 
same thing (guess I never noticed it before).  I will try to look 
into it there, but if you get an answer please let me know.
----
LBRO <lbro_at_dscc.dk>
     This smells of RAID and UFS. The UFS divides the disk into 'cylinder 
     groups' that are adjacent cylinders. The performance of UFS depends on 
     its ability to find the next free block in the same cylinder group as 
     the one where the disk head is already located. The possibility 
     for that decreases as free disk space drops. At nearly zero free space 
     on a disk with a well distributed UFS filesystem, the available blocks 
     will be scattered evenly over the entire disk.
     
     The funny thing about UFS is that when space usage drops again, a 
     file that was scattered around can be 'defragmented' just by copying 
     it, because UFS then allocates nice optimal blocks for the new copy. 
     That is why there is no need for a defragment utility on UFS.
     BUT:
     
     On a RAID volume, how can you tell the 'disk geometry' ? If you want 
     to have any benefit of the 'cylinder group' stuff, you must be able to 
     tell exactly how the RAID controller works (you must express the 
     function of three or more disks in terms of cylinders and sectors of 
     one disk)
     
     So maybe it is time for you to turn over to AdvFS that does not 
     optimize based on disk geometry. (AdvFS has a defragment program 
     instead). /snip,snip due to LBRO's request/
     So, my opinion is: 
     
     Disk systems are so complex today (striping, parity...) that we will 
     never know if the UFS optimizer really helps us. I would say it 
     doesn't, though we may be sure that it harms performance when the 
     disk is nearly full. So, use AdvFS and go fill the disk.
----
Anthony Talltree <aad_at_nwnet.net>
More like 1.9G, since the 23G drives have about 19G of usable space.
Some OSes, e.g. BSDI's, default minfree to 5%.  The 10% figure was
informally picked in an age when filesystems and their usage were much
different.  I suggest using tunefs to set minfree to 5%, then forcing
optimization back to time.
Much of the minfree issue depends on the use of the filesystem.  If it's
being used for big preallocated files (Oracle, Cyclone, Diablo), then
minfree can happily be set to 0.
----
Regards,
Rich
 /_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/ _ \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\
/_/       Richard A Bemrose     /_\ Polymers and Colloids Group \_\
/_/ email: rb237_at_phy.cam.ac.uk  /_\    Cavendish  Laboratory    \_\  
/_/   Tel: +44 (0)1223 337 267  /_\   University of Cambridge   \_\   
/_/   Fax: +44 (0)1223 337 000  /_\       Madingley  Road       \_\   
/_/       (space for rent)      / \   Cambridge,  CB3 0HE, UK   \_\   
 /_/_/_/_/_/_/  http://www.poco.phy.cam.ac.uk/~rb237 \_\_\_\_\_\_\
             "Life is everything and nothing all at once"
              -- Billy Corgan, Smashing Pumpkins
Received on Tue Apr 28 1998 - 14:19:35 NZST
