SUMMARY: out of space error, metadata, extents, defragment problems

From: Fortugno <fortugno_at_cs.usask.ca>
Date: Mon, 08 Apr 1996 11:20:49 -0600

Hello,
        I appreciate the responses I received; the problem has been
solved. The original message is provided below, followed by a summary
of the responses.
        
********** Original Message **********
        
        A detailed description of the problem(s) follows.

Main Problem:
        Receiving out of space messages when there is
plenty of space available. Users are unable to save their
work and often lose files whenever this occurs.
  
--Error Message: (file: kern.log)
  Mar 26 13:21:46 xxx vmunix: /student: write failed, file system is full
  Mar 26 13:24:53 xxx last message repeated 10 times

--Space Available:
  Filesystem     1024-blocks     Used    Avail  Capacity  Mounted on
  users#student      5493744  2136894   456584       82%  /student

        /student is one of seven filesets in the users domain all within
        a single volume. Quotas are in effect for two of the filesets
        including /student.
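
        (For reference, the figures above and the domain/fileset layout
can be checked with df and the standard AdvFS utilities; 'users' and
'/student' are this site's domain and fileset names:)

   # fileset space in 1K blocks (the df output shown above)
   df -k /student

   # size, free space and volumes of the underlying AdvFS domain
   showfdmn users

   # filesets in the 'users' domain, with file counts and quota information
   showfsets users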

System:
        DEC Alpha 2100 AXP server running Digital Unix 3.2C

History:
        After scanning manuals and man pages, it was determined that
the problem was related to metadata table space (extents). Further
reading, and searching the osf_managers list, pointed to running the
defragment utility: first to reduce the number of extents, and second
to free enough contiguous space to permit the extent table to grow,
thus eliminating the erroneous out of space messages. Here is what
defragment reported initially:

 OUTPUT:
 defragment: Gathering data for domain 'users'

 /student: write failed, user disk quota exceeded too long
  Current domain data:
    Extents: 256174
    Files w/extents: 60785
    Avg exts per file w/exts: 4.21
    Aggregate I/O perf: 43%
    Free space fragments: 26382
                      <100K     <1M    <10M    >10M
      Free space:       97%      3%      0%      0%
      Fragments:      26318      64       0       0
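
        (Those statistics are what defragment reports when run
verbosely; the invocation was presumably along these lines, though the
exact flags used are not recorded here:)

   # verbose run against the 'users' domain; the statistics shown
   # above are printed before any defragmentation work begins
   /usr/sbin/defragment -v users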

        After running defragment over several successive evenings, the
results showed a significant improvement, and it appeared that the out
of space problem had stopped:

 OUTPUT:
 defragment: Gathering data for domain 'users'
  Current domain data:
    Extents: 86946
    Files w/extents: 61649
    Avg exts per file w/exts: 1.41
    Aggregate I/O perf: 87%
    Free space fragments: 11561
                      <100K     <1M    <10M    >10M
      Free space:       57%     34%      9%      0%
      Fragments:      10862     679      20       0
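
        Running it on successive evenings can also be scheduled from
root's crontab; a minimal sketch (the log file path is arbitrary):

   # crontab entry: defragment the 'users' domain nightly at 01:00,
   # appending the verbose report to a log so runs can be compared
   0 1 * * *  /usr/sbin/defragment -v users >> /var/adm/defragment.users.log 2>&1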

Additional Problem:
        Then, for no apparent reason, the out of space messages began
to recur, and it was then discovered that the defragment utility would
no longer run, failing with an error I have been unable to track down
any information on:

 OUTPUT:
 defragment: Defragmenting domain 'users'

 Pass 1; Clearing
   Volume 1: area at block 2357584 ( 34928 blocks): 64% full
 defragment: Can't move file /student/xxxxxx/filename.ext
 defragment: Error = ENO_MORE_MCELLS (-1055)
 defragment: Can't defragment domain 'users'

        If someone could help explain why the out of space messages
continue, why defragment keeps failing, whether the two are related,
and what the solutions to these problems are, it would be appreciated.

********** End of Original Message **********

        Below is a summary of the responses I received, with each one
addressed; this may help others who experience similar problems.

*****
From J. Henry Priebe Jr.

Have you checked inode availability with the df -i command? Running
out of inodes will cause a "disk full" error just as running out of
blocks will.

>>>>> This had been checked and there were plenty of free inodes.
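
        (For completeness, the check is simply:)

   # show inode (file) usage alongside block usage for the fileset
   df -i /student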

*****
From Pat Wilson

In a normal (UFS) file system, I'd suspect that there's a (huge) file open
from some process somewhere - have you tried poking around with "lsof" to
make sure that that's not the case?

>>>>> I poked around and no such file existed.
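
        (The usual lsof checks for this, assuming your lsof build
supports the +L option, are:)

   # processes with files open under /student; watch the SIZE column
   lsof /student

   # open-but-unlinked files still hold their space until closed;
   # +L1 selects open files whose link count has dropped to zero
   lsof +L1 /student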

*****
From Seela Balkissoon

Are you running AdvFS on the disk? If so, you have to build the domain
with the -x 1024 option:

   mkfdmn -x 1024 /dev/rz?? domain_fs
   
If the number of files exceeds 200,000 you will get this error even
though the partition is not full; see Chapter 8 in the System
Administration manual.
>>>>> The number of files did not exceed 200,000; however, the number
of extents making up those files did exceed 200,000, and the out of
space message was being received every time the extent table tried to
expand, because there was not enough contiguous free space available
for it to grow. Creating a larger extent table from the start would
have worked, but this file system has been in operation for a couple
of years, so that was not an option; instead, defragment would have to
release enough contiguous space for the table to grow.
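
        For a site where recreating the domain is an option, the
rebuild would look roughly like the sketch below. This is only an
outline; the tape device is a placeholder and any backup/restore
method you trust would do:

   # 1. back up the fileset(s) in the old domain
   vdump -0 -f /dev/rmt0h /student

   # 2. unmount the fileset(s), remove the old domain, and recreate it
   #    with a larger initial extent table (-x) as suggested above
   umount /student
   rmfdmn users
   mkfdmn -x 1024 /dev/rz?? users

   # 3. recreate and mount the fileset(s), then re-apply any quotas
   mkfset users student
   mount -t advfs users#student /student

   # 4. restore the data into the new fileset
   cd /student && vrestore -x -f /dev/rmt0h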

*****
From alan

The two are probably related. AdvFS dynamically allocates space
for new file metadata as needed. But rather than allocate the
needed space for each file individually, it allocates blocks of
space and then allocates from those. The space for one of these
extents must be contiguous, so if the free space is badly
fragmented, it can't allocate a new extent. Defragment may be able
to consolidate enough free space to allow one of these extents to
be allocated. Or it may not. Sometimes there just isn't enough free
space in big enough pieces to allow defragmentation of a file system.

The table that describes where the metadata extents are has a
limited size, so if you need too many extents, allocations may
fail for that reason as well.

Defragment may be failing because it can't find enough free space
to work with. The message looks like one related to the inability
to extend some data structure.

>>>>> This is perhaps why defragment was failing; our solution
involved removing a number of stale user accounts, which freed up
enough space for defragment to work correctly.
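
        As an aside, when defragment names a specific file it cannot
move (as in the ENO_MORE_MCELLS failure above), the AdvFS showfile
utility can show how badly that particular file is fragmented:

   # basic AdvFS attributes for the file
   showfile /student/xxxxxx/filename.ext

   # -x adds the full extent map, i.e. how many pieces the file is in
   showfile -x /student/xxxxxx/filename.ext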

*****
From Knut Helleboe

It may be that the number of files in that fileset is still too high.
From your posting I cannot see that you increased the metadata space
after running the defragmentation. Try shutting down to single-user
mode, running defragment, and then increasing the number of extents
on that fileset/domain.

>>>>> The table did not have to be increased; once defragment could be
run, the number of extents was significantly reduced, so the ratio of
files to extents was close to 1 (i.e. roughly 60,000 files and 60,000
extents).

*****
From Stephen LaBelle

One thought about defragment quitting on you: there is a -e switch
for defragment. It tells defragment to continue working if it
encounters an error on a specific file, which seems to be what you
have encountered:
                /usr/sbin/defragment -v -e filedomain
                                                
>>>>> In this case using -e may have helped; however, not knowing the
inner workings of defragment, I was concerned about possible data
loss.
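
        A cautious way to use -e is to capture the verbose output and
then review exactly which files defragment skipped (the log path here
is arbitrary):

   # continue past per-file errors (-e), keeping a verbose log
   /usr/sbin/defragment -v -e users 2>&1 | tee /var/adm/defragment.users.log

   # afterwards, list the files defragment could not move
   grep "Can't move" /var/adm/defragment.users.log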


Thank you to all who responded,
  Jim
---
Fortugno, V.M.         			fortugno_at_cs.usask.ca
University of Saskatchewan