bogus defragment statistics

From: Bill Bennett <BENNETT_at_MPGARS.DESY.DE>
Date: Fri, 29 Aug 1997 17:49:50 +0100

Greetings managers:

Today I found some rather suspect free space statistics in the output
from a defragment run on an AdvFS domain:

  Oxter> more data_defrag.log
  defragment: Defragmenting domain 'data_domain'
 
  Pass 1; Clearing
    Volume 2: area at block 7771040 ( 2610144 blocks): 41% full
    Domain data as of the start of this pass:
      Extents: 11912
      Files w/extents: 11831
      Avg exts per file w/exts: 1.01
      Aggregate I/O perf: 100%
      Free space fragments: 556
                       <100K    <1M   <10M   >10M
        Free space:     -67%     6%    38%   123%   <====
        Fragments:       308    146     72     30
  
  Filling
    Current domain data:
      Extents: 11832
      Files w/extents: 11831
      Avg exts per file w/exts: 1.00
      Aggregate I/O perf: 100%
      Free space fragments: 317
                       <100K    <1M   <10M   >10M
        Free space:     -68%     4%    23%   141%   <====
        Fragments:       168     94     39     16
  
  defragment: Defragmented domain 'data_domain'

So far we have seen no data corruption (only the odd free space
statistics), but I'm concerned that this might be the first sign of a
real problem. Any comments or pointers to documentation would be
welcome.
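For what it's worth, the odd rows can be checked with a little arithmetic (my own sketch; the numbers are copied from the log above). Each row still sums to 100%, so the totals are internally consistent, but a negative free-space percentage is impossible, which would at least be consistent with a counter wrapping somewhere inside defragment:

```python
# Free-space distribution rows from the defragment log above.
# Buckets: <100K, <1M, <10M, >10M
clearing = [-67, 6, 38, 123]   # "Pass 1; Clearing"
filling  = [-68, 4, 23, 141]   # "Filling"

for name, row in [("clearing", clearing), ("filling", filling)]:
    print(name, sum(row), any(p < 0 for p in row))
# clearing 100 True
# filling 100 True
# Both rows sum to 100, yet each contains an (impossible) negative bucket.
```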

OK, so that was the short description of the problem. Unfortunately, it
has been a rather eventful week for that domain, and I haven't a clue
which event might be relevant, so I'll go through them all for anyone
interested in the details:

  0) The machine is a DEC 3000/600 AXP running DU 3.0; all file systems
are AdvFS. Over the past several months, we have seen a few system panics
due to advfs inconsistency problems in one domain, the data_domain, which
consisted of a single volume, a 4-GB Seagate ST15230N, with two filesets.
I had searched the archives and found a summary suggesting that AdvFS
inconsistencies could result from fragmentation, so

  1) at the end of last week, I checked all the filesets on the system
with /usr/field/msfsck and /usr/field/vchkdir, and finding no problems,
I started using defragment; since the data_domain was fairly full (about
94%, according to showfdmn) and badly fragmented (initially 4.35 average
extents per file), I set up a cron job to do a 15-minute defragment run
daily in the early morning hours. That worked fine the first few days,
but then on Monday morning, the disk containing the data_domain suffered
a number of hardware errors (logged in the binary error log as Error
Type "Hard Error Detected", CAM string "Error, exception, or abnormal
condition", Error Code x0071 "Vendor Specific"), which caused a system
panic due again to an "advfs inconsistency". As on previous occasions,
the disk could be brought back to life by cycling the power prior to
rebooting.
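For reference, the nightly runs are driven by a cron entry along these lines (a sketch; the 03:30 start time, the log path, and the use of -t to cap the run at 15 minutes are my reconstruction, not verbatim from the crontab):

```shell
# root's crontab: nightly 15-minute defragment of data_domain
# (-t 15 assumed per defragment(8); adjust the start time and log path)
30 3 * * * /usr/sbin/defragment -v -t 15 data_domain >> /var/adm/data_defrag.log 2>&1
```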

  2) Finally convinced that the source of the AdvFS inconsistencies was
a hardware failure, on Monday I replaced the ST15230N with a 9-GB
Seagate ST410800N drive as follows:

    - label ST410800N: # disklabel -wr /dev/rrz8c RZxx
    - add to domain: # addvol /dev/rz8c data_domain
    - remove ST15230N: # rmvol /dev/rz9c data_domain

which migrated the data from /dev/rz9c to /dev/rz8c as expected but
ended with:

      can't find disk type for rz9c, disklabel not modified
      rmvol: Removed volume '/dev/rz9c' from domain 'data_domain'

The error probably arose because the ST15230N has a nonstandard label
(disk type "sgt15230", which _is_ in /etc/disktab), but in any case the
rmvol seems to have worked, since at this point I found:

  # showfdmn data_domain

                 Id           Date Created       LogPgs  Domain Name
  2f201524.00094b00  Fri Jan 20 20:55:16 1995       512  data_domain

    Vol   512-Blks     Free  % Used  Cmode  Rblks  Wblks  Vol Name
     2L   17755614  9945216     44%     on    128    128   /dev/rz8c
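As a sanity check (mine, not from the original output), the showfdmn figures for the new volume are at least self-consistent:

```python
# 512-byte-block totals reported by showfdmn for /dev/rz8c
total_blks = 17755614
free_blks = 9945216

used_pct = round(100 * (total_blks - free_blks) / total_blks)
print(used_pct)  # 44, matching the "% Used" column
```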

  3) I let the daily defragment jobs continue to run, and the first one
(Tuesday morning) worked fine, showing at the end:

    Current domain data:
      Extents: 11725
      Files w/extents: 11724
      Avg exts per file w/exts: 1.00
      Aggregate I/O perf: 100%
      Free space fragments: 255
                       <100K    <1M   <10M   >10M
        Free space:       1%     2%    15%    82%
        Fragments:       148     65     25     17
  
  defragment: Defragmented domain 'data_domain'

  4) Then early Wednesday morning (well before the defragment runs
started), our building lost power in a thunderstorm. The machine and
all peripherals were switched off before power was restored, and the
system booted with no apparent problems when it was switched back on
the next day.

  5) I happened not to look at (or save) the defragment logs on Thursday,
the first runs after the power loss, being too busy with problems caused
by the large quantity of water the thunderstorm left in our basement.
The defragment log shown at the start of this post is from the second
run after the power loss, which is the first I looked at. I tried
running msfsck and vchkdir on both filesets in the data_domain, but they
reported no problems, and a subsequent "defragment -n -v data_domain"
still showed the same odd statistics.

So at this point it is unclear to me whether the peculiar defragment
statistics are the result of some sort of file domain corruption caused
by the power loss or are related to the events earlier in the week; I
had suspected the former until I found a question (but no summary) about
a similar observation in the archives. It is also not yet clear to us
whether this is a real problem at all.

Again, any suggestions or pointers to documentation, etc., would be
appreciated.

Bill Bennett

------------- *** NEW FAX NUMBER as of Feb. 1, 1997 *** -----------------
Dr. William Bennett           within Germany          International
MPG AG Ribosomenstruktur      Tel: (040) 8998-2833    +49 40 8998-2833
c/o DESY                      FAX: (040) 897168-10    +49 40 897168-10
Notkestr. 85
D-22603 Hamburg               Internet: bennett_at_mpgars.desy.de
Germany
Received on Fri Aug 29 1997 - 18:13:30 NZST
