Thanks to all who responded:
sxkac_at_java.sois.alaska.edu (Kurt Carlson)
Jim Neeland <neeland_at_madmax.hrl.hac.com>
Original at end of this message.
In short: after one or two power outages, one of our OSF/1 v3.2 (214) boxes would not
boot all the way to multiuser mode, failing with the ADVFS error quoted in the original message.
Explanations and solutions: ADVFS was quite flaky at this version level, and problems
like this were not uncommon after events such as a power failure. All respondents recommended
either getting the appropriate ADVFS patches from Digital or, better yet, upgrading the OS as
far as possible (at least to 3.2g). I determined that the usr fileset was corrupted
and restored it from tape. I will upgrade at a more convenient time.
Kurt Carlson has done wonderful things with consolidated patches, has written tools
to work with them, and has written other ADVFS utilities. (They can't be paying this guy enough.)
I'm including his comments below, hoping that his server won't get nailed as a
consequence:
**************************************************************************************
from Kurt Carlson:
There should be a patch kit for 3.2; lots of advfs stuff was broken
way back then... I'd encourage you to get to 3.2g. If you can't do
that, look for a patch.
You can get the _complete_ 3.2 patch kit via anonymous ftp:
atlanta.service.digital.com:/pub/patches/osfv32/patches.tar.Z
However, that'll be a huge file (likely 20mb+) with *ALL* 3.2 patches
and no tools to deconsolidate the ones you want. It will probably be
easier for you to go to v3.2g than to deal with that. If you do choose
to deal with it, you can find tools to deconsolidate and apply
individual patches under anonymous ftp (Digital doesn't supply anything):
raven.alaska.edu:/pub/sois/UA_DUtools.tar.Z
The patches.tar.Z file is not documented by Digital; it is ultimately
being replaced by www.service.digital.com. Try that, since they may finally
have v3.2 out there. Some of the more important patches were there
last time I looked (early January). UA_DUtools has some other things
besides the deconsolidator, including the advfs.utilities document
attached at the end.
If you have a support contract, they will isolate the specific patch(es)
for you to ftp from atlanta. If you don't, you're stuck with one of
the above. kurt
* * *
sxkac_at_nugget> cat /usr/local/doc/unix/advfs.utilities
951017kc unix/advfs.utilities Page 1 of 2
______________________________________________________________________________
Date: Mon, 25 Sep 1995 08:42:01 +0400
From: Martin Moore <martin_at_jerry.alf.dec.com>
Subject: Unsupported AdvFS utilities
TITLE: [ADVFSOSF] Descriptions of Utilities in /usr/field
OP/SYS: DEC OSF/1 Version 3.0 onwards
SYMPTOMS: What are the utilities in /usr/field.
ANALYSIS: These are not documented anywhere else.
SOLUTION: This is a description of the ADVFS v3.0 versions of these programs.
Earlier versions may not support all these features (or the programs
may not even exist in earlier versions of ADVFS).
______________________________________________________________________________
msfsck
------
This is the ADVFS bitfile-subsystem metadata structure checker. It verifies
low-level meta-structures like the BMT, storage bitmap, and tag directories.
The file domain must be inactive to run msfsck. You also need at least
one mounted fileset (this is because msfsck uses the .tags directory in
the fileset to access the metadata).
To run it, first 'cd' to the mount point of a mounted fileset.
Then, run "/usr/field/msfsck -t <domain-name>".
vchkdir
-------
This is the ADVFS directory structure checker and fixer. It verifies that
the directory structure is correct and that all directory entries reference
a valid file (tag) and that all files (tags) have a directory entry. The
-f flag will create symlinks in "<mount-point>/lost+found/" to all files
(tags) that do not contain a directory entry; these are called lost files.
The -f flag also removes 'dead' directory entries (ones that do not point
to valid tags).
The -d option will delete lost files and corrupted
directories. Note that you may need to run vchkdir several times
to clean up a fileset.
The file domain must be inactive to run vchkdir. The fileset to be
checked/fixed must be mounted.
To run it do "/usr/field/vchkdir <mount-point>".
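The msfsck and vchkdir procedures above can be sketched in Python. This is an illustration only. It assumes that msfsck is run before vchkdir (low-level metadata first, then directories), which the note does not prescribe. It also assumes that vchkdir accepts -f placed before the mount point. The domain and mount-point names are hypothetical. The script cannot verify that the domain is inactive, and by default it only prints the commands.

```python
import subprocess
from typing import List

MSFSCK = "/usr/field/msfsck"
VCHKDIR = "/usr/field/vchkdir"


def check_commands(domain: str, mount_point: str, fix: bool = False) -> List[List[str]]:
    """Build argv lists for the msfsck/vchkdir checks described in the note.

    Assumption: metadata check (msfsck) first, then directory structure
    (vchkdir); placing -f before the mount point is also an assumption.
    """
    commands = [[MSFSCK, "-t", domain]]
    vchkdir = [VCHKDIR]
    if fix:
        vchkdir.append("-f")  # create lost+found symlinks, remove dead entries
    vchkdir.append(mount_point)
    commands.append(vchkdir)
    return commands


def run_checks(domain: str, mount_point: str, fix: bool = False, dry_run: bool = True) -> None:
    # The caller must ensure the domain is inactive and the fileset is mounted;
    # this sketch does not check either condition.
    for argv in check_commands(domain, mount_point, fix):
        if dry_run:
            print(" ".join(argv))
        else:
            # msfsck reaches the metadata through the fileset's .tags directory,
            # so the tools are started with the mount point as working directory.
            subprocess.run(argv, cwd=mount_point, check=False)


if __name__ == "__main__":
    run_checks("usr_domain", "/usr", fix=True)  # hypothetical domain and mount point
```

Because vchkdir may need several passes, the second command would in practice be rerun until it stops reporting problems.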
shfragbf
--------
This program displays information about a fileset's Fragment File. The
Fragment File contains file fragments smaller than 8K. These are used to
minimize wasted disk space due to internal file fragmentation (for example,
ADVFS will store a 1-byte file in a 1K fragment rather than in an 8K page).
The Fragment File is always tag (inode) 1 in a fileset and can be accessed
via the fileset's .tags directory.
To run it do "/usr/field/shfragbf <mount-point>/.tags/1".
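The space saved by fragments can be worked through with a small Python model. The model uses the 8K page and the 1-byte-in-1K example from the paragraph above. It assumes that fragments are allocated in 1K units and that only a file's final partial page can go into a fragment. The real ADVFS allocation policy may differ by version and fileset settings.

```python
PAGE = 8192       # ADVFS page size given in the note (8K)
FRAG_UNIT = 1024  # assumption: fragments are allocated in 1K units


def allocated_bytes(size: int, use_frags: bool) -> int:
    """Bytes of disk allocated for a file of `size` bytes under this model."""
    if size <= 0:
        return 0
    full_pages, tail = divmod(size, PAGE)
    if tail == 0:
        return full_pages * PAGE
    if use_frags:
        frag = -(-tail // FRAG_UNIT) * FRAG_UNIT  # round the tail up to 1K
        if frag < PAGE:  # a tail needing 8K gets a whole page anyway
            return full_pages * PAGE + frag
    return (full_pages + 1) * PAGE


def wasted_bytes(size: int, use_frags: bool) -> int:
    """Internal fragmentation: allocated space minus actual file size."""
    return allocated_bytes(size, use_frags) - size
```

Under this model, the note's 1-byte file wastes 8191 bytes in a full page but only 1023 bytes in a 1K fragment.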
tag2name
---------
This program will display the full pathname of a file when only the
file's tag (inode) number is known. This is mainly a debugging aid
when msfsck or vchkdir report errors for specific tags.
To run it do "/usr/field/tag2name <mount-point>/.tags/<tag-number>".
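To get from an error report to a tag2name call, the tag first has to be pulled out of the message. The sketch below does this in Python. It assumes, beyond what the note states, how to read a line such as the one in the original message at the end (tag=0x00003f33.8066,setTag=0x00000001.8001):

- the hex value before the dot is the tag number;
- the part after the dot is a sequence number;
- entries under .tags are named by decimal tag number.

The .tags/1 example above fits that reading but cannot confirm it, since 1 is the same in both bases. The mount point is passed in because the message does not name the fileset directly.

```python
import re
from typing import List, Optional, Tuple

# Assumed format: "tag=0x<hex tag>.<hex seq>" and "setTag=0x<hex tag>.<hex seq>"
TAG_RE = re.compile(r"(?<![A-Za-z])tag=0x([0-9a-fA-F]+)\.([0-9a-fA-F]+)")
SET_TAG_RE = re.compile(r"setTag=0x([0-9a-fA-F]+)\.([0-9a-fA-F]+)")


def parse_tags(line: str) -> Optional[Tuple[int, int]]:
    """Return (file tag, fileset tag) as integers, or None if absent."""
    tag = TAG_RE.search(line)
    set_tag = SET_TAG_RE.search(line)
    if not (tag and set_tag):
        return None
    return int(tag.group(1), 16), int(set_tag.group(1), 16)


def tag2name_command(mount_point: str, tag: int) -> List[str]:
    """Build the tag2name argv, assuming .tags entries use decimal numbers."""
    return ["/usr/field/tag2name", f"{mount_point.rstrip('/')}/.tags/{tag}"]


if __name__ == "__main__":
    msg = "ADVFS cont: alloc_mcell: tag=0x00003f33.8066,setTag=0x00000001.8001"
    parsed = parse_tags(msg)
    if parsed:
        # "/usr" is a hypothetical mount point for the affected fileset
        print(" ".join(tag2name_command("/usr", parsed[0])))
        # prints /usr/field/tag2name /usr/.tags/16179
```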
switchlog
---------
This program provides the capability to resize the transaction log
or to move it to a specific volume in a domain.
NOTE: To date there has been no reason to change the size of the
transaction log so we do not recommend doing this.
To move the transaction log to another disk do
"/usr/field/switchlog <domain-name> <new-volume-number>".
Use showfdmn to determine the current volume that contains the log
and to determine a suitable target volume.
switchlog can be used on an active system.
mssh
----
ADVFS testing shell. Not terribly useful anymore.
vods
----
Displays the BMT on-disk structure. It is beyond the scope of this note
to describe this utility as it requires intimate knowledge of the BMT
structure to use and interpret the output of 'vods'. It is mainly
a low-level debugging tool.
______________________________________________________________________________
showfile
--------
A documented utility (see 'man showfile') for checking file fragmentation:
sxkac_at_glacier> showfile /*vmunix*
Id Vol PgSz Pages XtntType Segs SegSz Log Perf File
ba5.8003 1 16 920 simple ** ** off 95% genvmunix
236.8008 1 16 989 simple ** ** off 73% vmunix
23f.8005 1 16 947 simple ** ** off 78% vmunix.bad
252.8007 1 16 989 simple ** ** off 49% vmunix.bad_2
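When showfile is run over many files, the Perf column can be scanned for defragmentation candidates. The Python sketch below parses captured showfile output like the listing above. It reads a higher Perf percentage as a layout closer to optimal. The 80% threshold is an arbitrary assumption. The sketch also assumes that Perf and File are the last two whitespace-separated columns, so file names containing spaces would be mis-parsed.

```python
import sys
from typing import List, Tuple


def low_perf_files(showfile_output: str, threshold: int = 80) -> List[Tuple[str, int]]:
    """Return (file, perf%) pairs whose Perf value is below `threshold`."""
    flagged = []
    for line in showfile_output.splitlines():
        fields = line.split()
        # Data rows have a percentage in the second-to-last column;
        # the header row and blank lines are skipped.
        if len(fields) < 2 or not fields[-2].endswith("%"):
            continue
        perf = int(fields[-2].rstrip("%"))
        if perf < threshold:
            flagged.append((fields[-1], perf))
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: save "showfile /*vmunix*" output to a file, pass its path
    with open(sys.argv[1]) as fh:
        for name, perf in low_perf_files(fh.read()):
            print(f"{perf:3d}%  {name}")
```

With the listing above and the default threshold, this would flag vmunix, vmunix.bad and vmunix.bad_2.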
_____________________________________________________________________
Kurt Carlson, University of Alaska SOIS/TS, (907)474-6266
sxkac_at_alaska.edu 910 Yukon Drive #105.63, Fairbanks, AK 99775-6200
**************************************************************************************
Original Message:
>
> Alpha 3000/600 _at_ OSF v3.2
> root, usr, and a data filesystem are ADVfs
>
> We had a power failure last Fri. evening and Sat. morning while everyone was
> out of the office. Later on Sat., one of the Alphas wouldn't complete a
> boot. After completing the ADVfs mounts, it fails while initializing to
> multiuser at script /rc2.d/K09snmpd, where the failure occurs during
> execution of /usr/sbin/rcmgr to get COMMON_AGENT_CONF. Further
> testing revealed this:
>
> 1. Boot to single user and mount the advfs filesets with "# bcheckrc".
> 2. /sbin/find on the root and data filesystems works fine.
> 3. /sbin/find on the usr filesystem fails and the system reboots, but I
> can't read the error messages before the screen is refreshed for the reboot.
>
> 4. From single user, mount the ADVfs filesets with /sbin/bcheckrc; doing a man on
> /usr/sbin/rcmgr, I get the following:
> ADVFS error: alloc_mcell: mcell (2.1394) not really free
> ADVFS cont: alloc_mcell: vol=1, page=1394
> ADVFS cont: alloc_mcell: tag=0x00003f33.8066,setTag=0x00000001.8001
>
> ADVFS EXCEPTION
> Module-3, Line=2648
>
> panic (cpu0)
> syncing disks .. 1 1 1 1 1 1 1
>
> What does this mean?
> a. Does this have anything to do with a hard disk failure on the usr
> partition? (same disk as root)
> b. Or is this related to an ADVfs problem requiring a patch, say Patch
> ID: OSF360-350163, mentioned by Drew Ibbotson <wwibbd_at_itwhy.bhp.com.au>
> last August? (I can't seem to locate the patch to read its README.)
>
> Thanks in advance.
>
> PS, where in the documentation is there help on ADVfs errors and
> troubleshooting? I really don't like to hear those black-box-like
> statements about ADVfs problems. They always seem to be, paraphrasing,
> "If ADVfs can't fix itself, you'll just have to write off that disk".
>
> -Neil
> --
> Neil R. Smith, Res. Assoc./Sys. Admin. neils_at_csrp.tamu.edu
> Dept. Meteorology, Texas A&M Univ. 409/862-4342
--
Neil R. Smith, Res. Assoc./Sys. Admin. neils_at_csrp.tamu.edu
Dept. Meteorology, Texas A&M Univ. 409/862-4342
Received on Tue Mar 18 1997 - 00:41:07 NZST