SUMMARY: defragcron causes corruption in ADVFS filesys?

From: Judith Reed <jreed_at_wukon.appliedtheory.com>
Date: Tue, 04 Aug 1998 04:52:14 -0400

The consensus is that defragcron is risky, and many folks disable
it. It is shipped enabled by default, at least in DU 4.0D. To disable,
comment it out of the crontab. You can run it by hand when desired,
presumably when filesystems have sufficient space to complete the defrag.
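For reference, a minimal sketch of how to disable it (assuming the entry is
in root's crontab, which is where it normally lives):

  # crontab -l | grep defrag        (locate the defragcron entry)
  # crontab -e                      (put a "#" in front of that line)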
I'll append specific info below. Thanks to all who replied!

Judith Reed - jreed_at_appliedtheory.com
---------------------------------------------------------------------------
My question:
> We are grasping at straws here. A production system crashed this a.m.,
> nothing in any log anywhere except the fact that at around 4:00 a.m.
> /usr/sbin/defragcron failed due to a filesystem being full. Has anyone *EVER*
> had an instance where defragcron failed and subsequent problems were traced
> back to this failure? How does defragcron do its work - can it leave a
> filesystem in an unstable state?
>
> This is on a DU 4.0D system with fairly recent patches.
---------------------------------------------------------------------------
kstran_at_acxiom.com (Keith Strange) described a specific problem with a
CLCMC311 jukebox driver after doing an "installupdate" to DU 4.0D,
which coincidentally corresponded with defragcron failures. He said there is
a patch available.
---------------------------------------------------------------------------
tpb_at_zk3.dec.com (Dr. Tom Blinn) said:
defragcron is trying to defrag one or more of your AdvFS file systems. I
can imagine that a file system full situation could later lead to a panic.
... The defrag is trying to compress your AdvFS file systems to
improve performance and free space. If the defrag gets into nasty trouble
(a full file system might be nasty), it might trip over some AdvFS bug
and panic the system. It should NOT leave a file system in an inconsistent
state.
---------------------------------------------------------------------------
iwm_at_uvo.dec.com (I. W. Morgan) reports:
This is an issue with V4.0D. What happens is that if a domain is getting full,
or has lots of files, or has large files in it, the log file will become
exhausted at times of heavy use - for example when running defragment, or
balance. This is an issue which does not have a patch available.
The recommendation is twofold:
1) Do NOT run defragcron - turn it off!
2) Recreate your AdvFS file domains using a larger value for the logfile
size. This can help to reduce the problem (see the quick check sketched below).
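As a quick check on which domains are still at the default log size, something
like this (a rough sketch, assuming your domains are registered under
/etc/fdmns as usual) prints the LogPgs value for each domain:

  # for d in /etc/fdmns/*; do showfdmn `basename $d` | head -2; done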
Incidentally, you probably saw a panic which says "log file half full".
I have seen occasions where this problem left the domain corrupt, since it was
unable to complete the log transfer, which is basically a metadata log.
Incidentally, the problem existed in earlier versions; it is just V4.0D which
exhibits the fault most often, because it was this release which incorporated
the automatic defragment from cron!
---------------------------------------------------------------------------
i769646_at_smrs013a.mdc.com (C. Ruhnke) goes into substantial detail:
I have had some problems with defragment (which is the AdvFS utility that
is run by defragcron) crashing a system when the AdvFS domain it is trying
to process is too heavily fragmented. By default, mkfdmn creates a domain
with a 512 page log file. This size is sufficient to handle files with up
to about 40,000 fragments. If a file has more extents than that, you
may get a panic crash with a status of "release_dirty_pg: log_half full".
If this is your problem, you will need to increase the log file size on the
hyper-fragmented domain. To locate the hyper-fragmented domain you can use
the "showfile -x" command. I set up a script to perform the following:

  find <domain_mount_point> -xdev -exec showfile -x {} \; > /temp/<domain>

for each of the AdvFS domains on the system. Then I executed:

  grep extentCnt /temp/<domain> | sort -n -r -k 2 | more

for each output to see what the largest fragmented files on each domain
looked like.
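To save retyping this for every domain, here is a rough sketch (not the
original script) that loops over all mounted AdvFS filesets instead; it
assumes the mount listing reports lines of the form
"domain#fileset on /mountpoint type advfs ...", and it keeps the per-fileset
listings under /temp as above:

  #!/bin/sh
  # For every mounted AdvFS fileset, record extent counts for all of its
  # files, then print the five worst offenders.
  mount | grep advfs | while read fs on mpt rest
  do
      out=/temp/`echo $fs | tr '#/' '__'`
      find $mpt -xdev -exec showfile -x {} \; > $out 2>/dev/null
      echo "==== $fs"
      grep extentCnt $out | sort -n -r -k 2 | head -5
  done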
If you find a domain with a file that has more than 40,000 extents, you
will need to increase the log file size, or else back up the fragmented file,
defragment the domain and then restore the file -- which will hopefully
result in less fragmentation.
To increase the log file size you will need a second partition assigned to
the domain. If the domain does not already have at least two partitions,
you will need to set up a spare partition and then use addvol to add it
to the domain. You can rmvol it after you have increased the log file
size. Use showfdmn to examine the domain:

  # showfdmn <domain>

                 Id              Date Created  LogPgs  Domain Name
  31b8a083.00049136  Fri Jun  7 17:34:59 1996     512  <domain>

    Vol   512-Blks   Free  % Used  Cmode  Rblks  Wblks  Vol Name
     1L     401408      0    100%     on    128    128  /dev/rz11b
     2      262144    192    100%     on    128    128  /dev/rz3b

This shows that your LogPgs (log file size) allocation is 512 (the default)
and that it is assigned to partition #1 (the "L" after the 1 above).
Move the log file to the other partition with switchlog and then move it
back with an increased size (add 128 pages for each extra 20,000 extents):

  # switchlog <domain> 2
  # switchlog -l 1024 <domain> 1

If you used addvol to add the second partition, you can rmvol it now.
Defragcron and/or defragment should now succeed without crashing your
system.
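Pulling those steps together, the whole sequence is roughly (a sketch only,
using /dev/rz3b as the stand-in spare partition from the example above and
assuming the log starts out on volume 1):

  # addvol /dev/rz3b <domain>          (skip if the domain already has 2 volumes)
  # switchlog <domain> 2               (park the log on volume 2)
  # switchlog -l 1024 <domain> 1       (move it back to volume 1, now 1024 pages)
  # rmvol /dev/rz3b <domain>           (only if you added the volume above)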
---------------------------------------------------------------------------
kellybe_at_llnl.gov (Bruce Kelly) says:
If you have large AdvFS file systems, things like defragment, balance, or
removing a large file can cause corruption of the AdvFS file system. If you
have DU 4.0D with patch kit 1, then you have fixed half the problem. DEC is
working on the other half. In the meantime, you need to expand the AdvFS
log file in each large AdvFS file system to try to prevent this from
happening. We have been setting our log sizes to 65500 on file systems over
20GB. DEC says that is not large enough, but it is about as big as you can
make a log file. You will need to use the routines addvol and switchlog to
do this.
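Following the same switchlog procedure shown above (a sketch, assuming the log
sits on volume 1 of a two-volume domain):

  # switchlog <domain> 2
  # switchlog -l 65500 <domain> 1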
If you have spare disk space you can use "salvage" to dump the file system
to the other disk. This works well with very little, if any, loss of data.
It just takes forever.
---------------------------------------------------------------------------
Bert.Deknuydt_at_esat.kuleuven.ac.be (Albert De Knuydt) says:
Yep. I've had it too. I've been running defragment and balance on all my
smaller AdvFS domains for a long time, with no trouble. Recently, I started a
nightly defragment/balance on a 24 GB domain. After about a week, the
domain started panicking. It happened when only defragment was running. In
the logs, there was nothing exceptional (no filesystem full), apart from
the panic. I now run defragment and balance only occasionally by hand.
---------------------------------------------------------------------------
Others indicated they had similar problems. I'm going to turn it off -
too big a risk.