Hi,
Some time ago I wrote to the list asking whether anyone else had seen
panics running defragment or cp on highly fragmented filesystems. (My
original message is included at the end.)
Firstly, yes, a couple of other sites have noticed this problem. Secondly,
it is a serious bug in ADVFS, and despite our having logged the bug with engineering
almost 2 months ago, we still don't have a patch, just inappropriate
suggestions for a workaround. As you may gather from the tone, I am less than
happy with the level of support the bank has been offered over this.
Although a patch may be on the way, it won't make the V4.0 distribution.
I have had some lengthy correspondence with the US ADVFS engineering group
over this and include a brief description of the problem here:
--------------------------------- o ---------------------------------
From Bob Harris at DEC:
The Problem:
The panics are occurring because the AdvFS transaction log is exceeding
its safe recovery point. That is to say, AdvFS will not use more than
one half of its transaction log file so that in the event of a crash, it
will have the other half of the log to use for recovery purposes.
In this case AdvFS is performing a single atomic transaction which needs
to be completed before the log can be flushed and the space reused, but
the transaction is huge with lots of sub-transactions such that it is
consuming over half the log file. At this time we don't have any more
details than that.
AdvFS Engineering has been working on this problem for the past month
based on a previous CLD and QAR (CLD MCPMB243A & QAR 41267).
....
--------------------------------- o ---------------------------------
Until a patch is made available, running defragment or cp on a large, highly
fragmented file in ADVFS exposes you to a system panic.
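To make the half-log rule Bob describes a bit more concrete, here is a toy model in Python. Everything in it (the TransactionLog class, run_transaction, the page counts) is a hypothetical illustration, not AdvFS code: records belonging to an unfinished atomic transaction can't be flushed, so a large enough transaction keeps growing until it hits the halfway mark of the log, which is where the real system panics.

```python
from dataclasses import dataclass


@dataclass
class TransactionLog:
    """Hypothetical model of a transaction log that keeps half its space
    in reserve for crash recovery. Sizes are in log pages and are
    illustrative only; this is not AdvFS's real data structure."""
    total_pages: int
    live_pages: int = 0  # pages pinned by the open (uncommitted) transaction

    @property
    def safe_limit(self) -> int:
        # Only half the log may hold live records; the other half is
        # reserved so recovery always has room after a crash.
        return self.total_pages // 2

    def append(self, pages: int) -> bool:
        """Log a sub-transaction record. Records of an unfinished atomic
        transaction cannot be flushed, so live_pages only grows. Returns
        False when the append would cross the safe limit (the point at
        which the real file system panics instead)."""
        if self.live_pages + pages > self.safe_limit:
            return False
        self.live_pages += pages
        return True

    def commit(self) -> None:
        # Once the whole transaction completes, its log space can be reused.
        self.live_pages = 0


def run_transaction(log: TransactionLog, n_sub: int, sub_pages: int) -> int:
    """Run one atomic transaction made of n_sub sub-transactions, each
    needing sub_pages of log. Returns how many were logged before the
    safe limit stopped it; commits only if all of them fit."""
    done = 0
    while done < n_sub and log.append(sub_pages):
        done += 1
    if done == n_sub:
        log.commit()
    return done


if __name__ == "__main__":
    log = TransactionLog(total_pages=1024)
    print(run_transaction(log, n_sub=10, sub_pages=4))    # small: all 10 fit
    print(run_transaction(log, n_sub=1000, sub_pages=4))  # huge: stops at 128
```

With a 1024-page log the safe limit is 512 pages, so a transaction of 1000 four-page sub-transactions stalls after 128 of them and can never commit, which is roughly the situation a defragment of a huge, badly fragmented file is reported to get into.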
Here is my original message:
> Hi,
>
> I'd like to ask the list if anyone else has been seeing panics on DU 3.2c
> while running defragment. We have been able to generate these
> panics twice here on two different machines. Each time, 'defragment' running
> on just one domain caused the system to go nuts with the message:
>
> ADVFS EXCEPTION
> Module = 28, Line = 1646
>
> panic (cpu 0):
> syncing disks... done
>
>
> (O/S = 3.2c, firmware = latest, disk revision = latest)
> Naturally I've registered the problem with DEC, but they seem unable to come
> up with a solution as yet and inform me that we are the only people
> experiencing this problem. Is this true? Is anyone else out there seeing
> crashes running defragment?
>
> Here is the head of the stack trace:
>
> 1 panic(...)
> 2 advfs_sad(...)
> 3 release_dirty_pg(...)
> 4 lgr_writev_ftx(...)
> 5 log_donerec_nunpin(...)
>
> The process in memory at the time was 'defragment'.
> Also, the domain that causes panics is built on LSM, and the LSM volume
> is built on one disk only (rz1), so we tried swapping this disk (even though
> uerf showed no SCSI errors), but the panics were still there when we
> defragmented the new disk. Our domains typically contain large files
> (approx 1Gb in size).
>
>
> Any ideas? Any me too's?
Keith S McCabe
Unix Manager
Banque Paribas Capital Markets
London
W1
Received on Tue Jan 16 1996 - 13:40:57 NZDT