SUMMARY: Defrag causes panic - why?

From: <Rod_BYRNS_at_paribas.com>
Date: Tue, 01 Sep 1998 11:00:08 +0100

Thanks to the following people for responding:

speno_at_isc.upenn.edu
i769646_at_smrs013a.mdc.com
alan_at_nabeth.cxo.dec.com
kellybe_at_llnl.gov
snkac_at_java.sois.alaska.edu
bevanb_at_ee.uwa.edu.au
crossmd_at_mh.uk.sbphrd.com
Knut.Hellebo_at_nho.hydro.com
piip_at_merl.com

Apparently the problem is well known and occurs when a lot of
metadata in a domain is changed (as when defragmenting). The
most common response was to move the log file somewhere else,
then move it back with a larger size, i.e.

> # addvol /dev/rz10b domain
> # showfdmn small
>
>                Id              Date Created  LogPgs  Domain Name
> 31b8a083.00049136  Fri Jun  7 17:34:59 1996     512  small
>
>   Vol   512-Blks        Free  % Used  Cmode  Rblks  Wblks  Vol Name
>    1      401408           0    100%     on    128    128  /dev/rz11b
>    2L     262144         192    100%     on    128    128  /dev/rz3b
>    3      393216           0    100%     on    128    128  /dev/rz10b
>      ----------  ----------  ------
>         1056768         192    100%
>
> # switchlog domain 3
> # switchlog -l 1024 domain 2
> # rmvol /dev/rz10b domain



Full replies follow.

Cheers,
Rod

********
Do you have patch kit 1 or 2? 2 has a fix for the 'log half full'
panic, or so it claims in the release notes.

********


Since you only have the two domains and "data1" completed defragging before
the crash, you can reasonably conclude your problem is with "data".
This type of panic is a known problem with AdvFS domains when they have a
file with too many fragments. The meta-data log file gets too full and
KABOOM!
Search these archives for the last couple of months; this issue was well
discussed and the workarounds explained. I'd rather not resurrect that
thread here...

********


What I've read is that AdvFS insists that the log not become more
than half full. Defragmenting a file system causes lots of metadata
changes, which get recorded to the log. Too many changes, the log
gets too full and the domain panics. The solution I've read is to
increase the size of the log. Of course, there may also be bugs
involved, so having MCS escalate the problem to the attention of
engineering may be wise.



********


There are several problems with advfs log file handling. All but one of
these problems will be fixed in 4.0e. In the meantime, the thing to do is
increase your log file size. This is done by adding a temporary volume to
your advfs domain, using addvol, and then moving the advfs log file to this
new volume and then moving it back to the original volume with a larger
size. This is done with the switchlog command, and the log file size should
be around 65500 pages.


********

Quick answer: buggy software
defrag caused panics back under 3.2g, recommendation from Digital
Support back then was not to run it.
One would hope it'd be better by 4.0d. I suggest you contact
Digital Support (they do need to get this fixed). You should
also analyze your advfs domain for problems.... either ones which
may have led to the panic or those left behind by the panic.

********



Some info I received from a Digital employee regarding this is repeated
below.
>The "release_dirty_pg log_half full" panic can be corrected by increasing
>the size of your current log file in the domain.
>
>PROBLEM:
>
> Under some circumstances, AdvFS can panic with a message "log half
> full". At this time, the following situations are known to cause it:
>
> 1) When a very large file truncate is performed (this can occur when
>    a file is overwritten by another file or by an explicit truncate
>    system call), and the fileset containing the file has a clone
>    fileset.
> 2) When very large, highly fragmented files are migrated (this occurs
>    when running the defragment, balance, rmvol, and migrate AdvFS
>    utilities).
>
> RESOLUTION/WORKAROUND:
>
> In both cases you can work around the problem by reducing file
> fragmentation or by increasing the size of the log. For the second
> case you can determine the required size of the log: Files with
> greater than 40000 extents are at risk. For 40000 extents use a
> logsize of 768, 60000 extents use 1024, 80000 extents 1280, etc.
>
> Determine how many extents a file has by performing the following:
>
>   showfile -x <filename> | grep extentCnt
>
> This will indicate how many extents a file is using.
>
> If you have files that fit the criterion mentioned above, you may be
> able to reduce their fragmentation by backing them up; deleting
> them; running defragment; then restoring them.
>
> You can also avoid overflowing the transaction log by increasing its
> size. You can set the log page size when you make the domain using
> the '-l' option ('mkfdmn -l pages', default is 512).
>
> If you have a spare partition, you can do the following before
> running the commands mentioned above:
>
> # addvol /dev/rz10b domain
> # showfdmn small
>
>                Id              Date Created  LogPgs  Domain Name
> 31b8a083.00049136  Fri Jun  7 17:34:59 1996     512  small
>
>   Vol   512-Blks        Free  % Used  Cmode  Rblks  Wblks  Vol Name
>    1      401408           0    100%     on    128    128  /dev/rz11b
>    2L     262144         192    100%     on    128    128  /dev/rz3b
>    3      393216           0    100%     on    128    128  /dev/rz10b
>      ----------  ----------  ------
>         1056768         192    100%
>
> # switchlog domain 3
> # switchlog -l 1024 domain 2
> # rmvol /dev/rz10b domain
>
> The above steps add a spare volume to the domain, move the log to
> that volume, move it back with an increase in size, and remove the
> spare volume. The larger logsize takes up more space on your volume
> so you may want to reduce it again after the command completes.
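
The extent-to-logsize figures in the note above follow a simple pattern,
so a small shell function can turn an extent count into a suggested LogPgs
value. This is only a sketch: the name suggest_logpgs is made up for
illustration, and it assumes the "etc." in the note means a further 256
pages for every additional 20000 extents, with in-between counts rounded
up to the next step. Counts below 40000 keep the default of 512.

```sh
# Suggest a log size (in pages) for a file with the given extent count.
# Figures from the note: 40000 extents -> 768, 60000 -> 1024, 80000 -> 1280.
# Assumption: each further 20000 extents adds 256 pages, counts between
# steps are rounded up, and anything below 40000 keeps the default 512.
suggest_logpgs() {
    extents=$1
    if [ "$extents" -lt 40000 ]; then
        echo 512
        return 0
    fi
    # Number of 20000-extent steps above 40000, rounded up.
    steps=$(( (extents - 40000 + 19999) / 20000 ))
    echo $(( 768 + steps * 256 ))
}
```

For example, "suggest_logpgs 50000" prints 1024, which could then be used
as the value for "switchlog -l".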

********


There is a bug in defragment which will cause a crash if the
file being defragmented contains a large number of fragments.
When this file is defragmented, the I/Os involved overflow a buffer
and the file system crashes.
Digital have released a patched version of defragment which
does not have this problem (I believe it is in patch set 2),
and it can also be obtained as a single patch.
If this is not an option, then use showfile -x <file> on
every file in the domain to see which file is causing the
problem, copy the file to another domain, remove the file,
defrag the disk and copy it back.
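
Checking every file by hand is tedious, so the search could be scripted
roughly as follows. This is a sketch, not a tested tool: extent_count and
list_fragmented are hypothetical helper names, the parsing assumes that
showfile -x prints one "extentCnt: N" line per extent map, and the 40000
default is the at-risk figure from the Digital note quoted earlier in this
summary.

```sh
# Sum the extentCnt values in `showfile -x` output read from stdin.
# Assumption: each extent map is followed by a line "extentCnt: N".
extent_count() {
    awk '/extentCnt/ { n += $NF } END { print n + 0 }'
}

# Print "count path" for every regular file under a mount point whose
# extent count exceeds the limit (default 40000). This part needs the
# AdvFS showfile utility, so it only does useful work on an AdvFS system.
list_fragmented() {
    mnt=$1
    limit=${2:-40000}
    find "$mnt" -xdev -type f -print | while IFS= read -r f; do
        n=$(showfile -x "$f" 2>/dev/null | extent_count)
        if [ "$n" -gt "$limit" ]; then
            echo "$n $f"
        fi
    done
    return 0
}
```

Running something like "list_fragmented /data" (with /data standing in for
the fileset's mount point) would list the candidates to copy out before
defragmenting.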



********

You have triggered a known bug in AdvFS. Contact your local DEC support and
get the latest beta of the Adv. Filesystems utilities (AFA435). This should
fix some of the problems you're seeing ...



********



The error looks like your log file has become full. If you have a look at
the mkfdmn command man page, you can specify the size of the log file
("-l mumble"). You might be able to enlarge the file by adding another disk
to the domain with addvol, then forcing the log to that disk using the
switchlog command, then removing the new disk (the one you just added),
which will push the new log back to the original disk...

From some mail I got from DEC:

    # addvol /dev/rz10b domain
    # showfdmn small

                   Id              Date Created  LogPgs  Domain Name
    31b8a083.00049136  Fri Jun  7 17:34:59 1996     512  small

      Vol   512-Blks        Free  % Used  Cmode  Rblks  Wblks  Vol Name
       1      401408           0    100%     on    128    128  /dev/rz11b
       2L     262144         192    100%     on    128    128  /dev/rz3b
       3      393216           0    100%     on    128    128  /dev/rz10b
         ----------  ----------  ------
            1056768         192    100%

    # switchlog domain 3
    # switchlog -l 1024 domain 2
    # rmvol /dev/rz10b domain

    The above steps add a spare volume to the domain, move the log to
    that volume, move it back with an increase in size, and remove the
    spare volume. The larger logsize takes up more space on your volume
    so you may want to reduce it again after the command completes.




Received on Tue Sep 01 1998 - 10:14:06 NZST
