Thanks to all who responded. Here's my original question, followed by a
list of the answers (abbreviated):
======================<<QUESTION>>===================================
Hi people,
We had a crash on one of our Alphas this morning, and it seems that some
failure in AdvFS caused the crash. I could see in the osf-mailing list
archives that it has also happened to others before (unfortunately without
any "SUMMARY" posted). I've heard of two different patches for AdvFS
(OSFV30-018 and OSFV30-076), but the question is whether this "bug" is one
of those that these patches fix.
A short dbx run on the crash dump follows. It seems that line 752 in a file
named "subr_prf.c" has caused the panic message.
Thanks in advance,
-Farhad.
# dbx -k vmunix.0 vmcore.0
dbx version 3.11.6
Type 'help' for help.
thread 0xffffffff8122af00 stopped at [boot:1499 ,0xfffffc00004d91e8] Source not available
(dbx) sh strings vmunix.0 | grep '(Rev'
DEC OSF/1 V3.0B (Rev. 358.78); Sat Apr 1 11:54:27 MET DST 1995
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(dbx) p panicstr
0xfffffc00005e6690 = "advfs inconsistency"
^^^^^^^^^^^^^^^^^^^^
(dbx) t
> 0 boot(0x0, 0x0, 0xfffffc00005e6690, 0xfffffc0000722000, 0xb55) ["../../../../src/kernel/arch/alpha/machdep.c":1499, 0xfffffc00004d91e8]
1 panic(s = 0xfffffc00005e6690 = "advfs inconsistency")["../../../../src/kernel/bsd/subr_prf.c":752, 0xfffffc000043dc44]
^^^^^^^^^^^^^^^
2 advfs_sad(0x29, 0x223, 0xfffffc0000639838, 0x0, 0x0) ["../../../../src/kernel/msfs/bs/bs_misc.c":365, 0xfffffc00003dd374]
3 bs_osf_complete(bp = 0xffffffff87bbc300) ["../../../../src/kernel/msfs/osf/msfs_io.c":547, 0xfffffc00004035c8]
4 msfs_async_iodone_lwc() ["../../../../src/kernel/msfs/osf/msfs_io.c":694, 0xfffffc00004039c4]
5 lwc_schedule(0xfffffc000063ac60, 0xfffffc0000534bf0, 0xfffffc00006d44c8, 0xfffffc0000614488, 0xfffffc000046db40) ["../../../../src/kernel/bsd/lwc.c":212, 0xfffffc000024aaf4]
6 thread_block() ["../../../../src/kernel/kern/sched_prim.c":1522, 0xfffffc000046d8d8]
7 xpt_callback_thread() ["../../../../src/kernel/io/cam/xpt.c":2228, 0xfffffc0000534d94]
(dbx) quit
#
============================================================================
=====================<<ANSWERS>>=======================
Jacy Canute suggests:
This and future crashes can be avoided by running the ADVFS defragmenter:
set up an entry in cron to defragment your domains during your system's idle
times. For example, I defragment my domains at 3am every Sunday morning. That
eliminates any/all ADVFS file inconsistencies...
------------------------------------------------
Ray Bellis suggests that we move the data off the volume on which the crash
originated:
..... If you've got the ADVFS
utilities (and a spare disk), you can take the system to single-user
mode and then use `rmvol', which will transfer the contents of that
volume to the other volumes in the domain.
Ray.
PS: you'll need to run `/sbin/bcheckrc' (to mount your disks) and
`lmf reset' (to install the ADVFS-UTILITIES PAK) once you're in
single-user mode.
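Ray's single-user steps can be written down as a small dry-run shell sketch. The device names (/dev/rz3c, /dev/rz4c) and the domain name usr_domain are hypothetical placeholders, and the `addvol' step for the spare disk is my assumption about how the spare disk gets into the domain before `rmvol' empties the suspect volume. With the default RUN=echo the function only prints the commands; set RUN to an empty string to really execute them.

```sh
#!/bin/sh
# Dry-run sketch of the single-user volume migration described above.
# All device and domain names are hypothetical; override via SPARE/SUSPECT/DOMAIN.
migrate_volume() {
    spare=${SPARE:-/dev/rz4c}       # hypothetical spare disk
    suspect=${SUSPECT:-/dev/rz3c}   # hypothetical volume the panic came from
    domain=${DOMAIN:-usr_domain}    # hypothetical AdvFS domain
    run=${RUN-echo}                 # unset -> echo (dry run); RUN= -> execute

    $run /sbin/bcheckrc                # mount the local disks
    $run lmf reset                     # load the ADVFS-UTILITIES PAK
    $run addvol "$spare" "$domain"     # assumption: add the spare disk first
    $run rmvol "$suspect" "$domain"    # move data off the suspect volume
}

migrate_volume
```

Printing the commands first makes it easy to check the device names before anything touches the domain.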
---------------------------------------------------
Saul Tannenbaum writes:
> A short dbx run on the crash dump follows. It seems that line 752 in a file
> named "subr_prf.c" has caused the panic message.
Actually, no. subr_prf.c is part of the panic process, not the cause
of the error. The "advfs_sad" routine is called when AdvFS detects an
inconsistency, so the problem was actually detected in "bs_osf_complete".
...
We've suffered through many, many advfs panics but this stack trace
doesn't resemble any we've seen.
One thing I would check first is for disk errors. Advfs does not recover
gracefully from disk problems and is known to be very fragile when run
on bad hardware.
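Saul's reading of the trace (skip the panic machinery, look at the first frame below it) can be automated for saved dbx output. The helper below is hypothetical, not a dbx feature, and its skip list (boot, panic, advfs_sad) is taken from this one trace; other panics may go through different helper routines.

```sh
#!/bin/sh
# detection_frame: read a dbx "t" listing on stdin and print the first
# function that is not part of the panic path (boot -> panic -> advfs_sad).
detection_frame() {
    awk '
        # Frame lines look like "> 0 boot(...)" or "  1 panic(...)".
        /^[> ]*[0-9]+[ \t]+[A-Za-z_][A-Za-z_0-9]*\(/ {
            name = $0
            sub(/^[> ]*[0-9]+[ \t]+/, "", name)   # drop the frame number
            sub(/\(.*/, "", name)                 # drop the argument list
            if (name == "boot" || name == "panic" || name == "advfs_sad")
                next
            print name
            exit
        }'
}
```

Fed the `t' output from the question (for example `detection_frame < trace.txt', where trace.txt is a hypothetical file holding that output), it should print bs_osf_complete, matching Saul's analysis.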
-------------------------------------------------------
Special thanks to Torbjörn Lindgren for his very fine answer:
...
Basically there are two known ways of cleaning an AdvFS filesystem
(there may be more, but these are the ones that are widely known):
1. vdump -> vrestore onto a *UFS* filesystem! Then you can move it back
again (after recreating the domain).
2. Use the tools in /usr/field (see technote below). They seem to be very
good at detecting inconsistencies, but only mediocre at correcting the
damage. Running msfsck and then vchkdir (running msfsck repeatedly
until the output doesn't change, then using vchkdir) might fix it.
At least in some cases the damage seems to concern the whole domain,
so you might need to move all filesets (if you have more than one in
the domain).
That machine used OSF/1 2.1, so that may not be needed any longer. The
same *may* apply to the need for a temporary UFS filesystem (the damage
seemed to follow vdump/vrestore if you dumped to an AdvFS fileset).
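The "run msfsck repeatedly until the output doesn't change" step is a fixed-point loop. The generic helper below (hypothetical, not part of /usr/field) reruns any command until two consecutive runs print identical output, with a cap on the number of runs. The summary doesn't give msfsck's arguments, so a call would look like `run_until_stable 5 /usr/field/msfsck <args>' with <args> being whatever your release's msfsck expects, followed by vchkdir only if the loop converged.

```sh
#!/bin/sh
# run_until_stable MAX CMD [ARGS...]
# Run CMD until two consecutive runs produce identical output (stdout and
# stderr combined), giving up after MAX runs. On success print the number of
# runs and return 0; if the output never settles, return 1.
run_until_stable() {
    max=$1
    shift
    runs=0
    prev=
    while [ "$runs" -lt "$max" ]; do
        runs=$((runs + 1))
        out=$("$@" 2>&1)
        # The first run has nothing to compare against.
        if [ "$runs" -gt 1 ] && [ "$out" = "$prev" ]; then
            echo "$runs"
            return 0
        fi
        prev=$out
    done
    return 1
}
```

Comparing output rather than exit status is deliberate: the advice is about the report settling down, and I don't know what exit codes msfsck returns.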
--
ADVFS Utilities in /usr/field
msfsck
This is the ADVFS bitfile-subsystem metadata structure checker. It verifies
low-level meta-structures like the BMT, storage bitmap, and tag directories.
....
vchkdir
This is the ADVFS directory structure checker and fixer. It verifies that
the directory structure is correct and that all directory entries reference
a valid file (tag) and that all files (tags) have a directory entry.
....
shfragbf
This program displays information about a fileset's Fragment File. The
Fragment File contains file fragments less than 8K.
....
tag2name
This program will display the full pathname of a file when only the
file's tag (inode) number is known.
....
switchlog
This program provides the capability to resize the transaction log
or to move it to a specific volume in a domain.
.....
vods
Displays the BMT on-disk structure.
....
-----------------------------------------------------------
Some others have suggested installing patches and/or upgrading to 3.2A. Thanks
again to all who responded:
Ray Bellis <Ray.Bellis_at_psy.ox.ac.uk>
Hellebo Knut <bgk1142_at_bggfu2.nho.hydro.com>
"Richard L Jackson Jr" <rjackson_at_portal.gmu.edu>
Dave Cherkus <cherkus_at_UniMaster.COM>
Jacy Canute <jacy_at_fluid.mro.dec.com>
Saul Tannenbaum <stannenb_at_emerald.tufts.edu>
Torbjörn Lindgren <tl_at_ae.chalmers.se>
-Farhad Dehghani
Received on Thu May 04 1995 - 05:43:42 NZST