[SUMMARY] AdvFS problem

From: Bob Jackson <bobj_at_soc.duke.edu>
Date: Tue, 17 Nov 1998 11:03:24 -0500 (EST)

Many thanks to the following for timely suggestions and guidance:
        Dr. Tom Blinn <tpb_at_doctor.zk3.dec.com>
        Serguei Patchkovskii <patchkov_at_ucalgary.ca>
        alan_at_nabeth.cxo.dec.com (Alan Rollow)
        C.Ruhnke <i769646_at_smrs013a.mdc.com>
        Andrew Gallatin <gallatin_at_cs.duke.edu>

In response to my report of an AdvFS exception condition (see original
post below), several suggested that /sbin/advfs/verify might work or had
worked for them in similar situations. Tom Blinn suggested that the
easiest course of action (assuming there was no evidence of physical
damage) would be to remove the domain, recreate it and restore from
backup. The following course of action has resolved the problem:

1. The domain consisted of two 4.2GB drives divided into two filesets. By
dismounting the fileset that I suspected to be problematic, I was able to
keep the system from going into a panic state.

2. Went to single-user mode, fsck'ed root, made root writable and ran
/sbin/advfs/verify on the domain. This did not work and again there was a
system panic.

3. "uerf -R -o full | less" had revealed no evidence of device errors.
I ran an scu "verify media" on both drives and both came up clean, so I
was reasonably confident that the drives were not physically damaged.

4. Proceeded with re-creation and restoration of the domain. Cleaned out
all files from /etc/fdmns.
        Remade the domain:   mkfdmn /dev/rz2c buddha_dmn1
        Remade the filesets: mkfset buddha_dmn1 home_fs
                             mkfset buddha_dmn1 data_fs
        Added 2nd volume:    addvol /dev/rz3c buddha_dmn1
        Used dtadvfs to mount the filesets, then set hard and soft limits

5. Restored data from backups.
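
For anyone repeating step 4, the command sequence can be collected into a
small script. This is only a sketch: the run() and rebuild_domain() helpers
and the DRY_RUN switch are my own illustrative names, not part of AdvFS,
and the device, domain and fileset names are the ones from this system.
With DRY_RUN=1 (the default) it only prints the commands, so they can be
checked before anything is written to disk. Mounting, quota limits and the
restore itself are left out.

```shell
#!/bin/sh
# Sketch of step 4: recreate an AdvFS domain, its filesets and a second volume.
# DRY_RUN=1 (the default here) only prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

# run: hypothetical helper that echoes a command in dry-run mode,
# otherwise executes it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rebuild_domain DOMAIN FIRST_VOLUME SECOND_VOLUME FILESET...
# Issues the commands in the same order as in step 4 above.
rebuild_domain() {
    domain=$1
    first_vol=$2
    second_vol=$3
    shift 3
    run mkfdmn "$first_vol" "$domain"
    for fset in "$@"; do
        run mkfset "$domain" "$fset"
    done
    run addvol "$second_vol" "$domain"
}

rebuild_domain buddha_dmn1 /dev/rz2c /dev/rz3c home_fs data_fs
# In dry-run mode this prints:
#   mkfdmn /dev/rz2c buddha_dmn1
#   mkfset buddha_dmn1 home_fs
#   mkfset buddha_dmn1 data_fs
#   addvol /dev/rz3c buddha_dmn1
```

Running it with DRY_RUN=0 would execute the commands for real, which
destroys whatever is on the named devices, so that should only be done
once the backups are known to be good.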

   In the course of doing the restoration, I think I gained some insight
into why this situation might have occurred. Around the time the problem
arose, another workstation had been experiencing NFS problems and
required a reboot. The NFS problem had stalled the nightly backups, which
resumed after the reboot. A user had been working in the AdvFS fileset
when the system panicked. I believe that after the other machine was
rebooted, some deadly combination of backup and user activity hit the
fileset simultaneously, causing corruption. On an incremental backup tape
I found data sets created and backed up at about the time the problems
arose.

      

******* Original Post ********************
Hello Managers,
   The following AdvFS exception reared its ugly head on one of our
systems today. The system will come up in multiuser mode for a few
minutes then panic. I found in the archives a reference to an identical
problem described in Dec 1997 in which the poster indicates a resolution
that involved removing the fileset and restoring from backup to the same
location. He was seeking clarification on possible causes and alternative
ways of handling, but there was never a summary. Is the approach he took
the best one and am I correct in interpreting this error to mean that the
filesystem is corrupted but the media is likely okay?
   The system in question is an AlphaStation 500 5/333 running DU 4.0B at
Patch Level 6.
   Thanks much.

Nov 14 13:49:10 buddha vmunix: ADVFS: using 3491 buffers containing 27.27 megabytes of memory
Nov 14 14:26:47 buddha vmunix: ADVFS error: alloc_mcell: invalid free list
Nov 14 14:26:47 buddha vmunix: ADVFS cont : alloc_mcell: vol = 1, page = 78
Nov 14 14:26:48 buddha vmunix: ADVFS cont : alloc_mcell: freeMCellCnt = 4, nextFreeMCId == (0.0)^G^G^G^G^G^G^G^G^G^G
Nov 14 14:26:48 buddha vmunix:
Nov 14 14:26:48 buddha vmunix: ADVFS EXCEPTION
Nov 14 14:26:48 buddha vmunix: Module = bs_bmt_util.c, Line = 3038
Nov 14 14:26:48 buddha vmunix: alloc_mcell: bad mcell free list
Nov 14 14:26:48 buddha vmunix: panic (cpu 0): alloc_mcell: bad mcell free list
Nov 14 14:26:48 buddha vmunix: syncing disks... done
Nov 14 14:26:48 buddha vmunix: device string for dump = SCSI 0 9 0 1 100 0 0.
Nov 14 14:26:48 buddha vmunix: DUMP.prom: dev SCSI 0 9 0 1 100 0 0, block 1413388
Nov 14 14:26:48 buddha vmunix: device string for dump = SCSI 0 9 0 1 100 0 0.
Nov 14 14:26:48 buddha vmunix: DUMP.prom: dev SCSI 0 9 0 1 100 0 0, block 1413388
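
When comparing a panic like this against archived reports, it can help to
pull out just the module and line number of each AdvFS exception. The
following is a rough sketch: advfs_exceptions is a made-up function name,
and /var/adm/messages is only an assumed default for where these kernel
messages are logged, so adjust LOG for your system. It also assumes each
log entry is on a single, unwrapped line, as it would be in the log file
itself.

```shell
#!/bin/sh
# Print "module:line" for each AdvFS exception record read from standard input.
# Matches lines of the form "... vmunix: Module = <file>, Line = <number>".
advfs_exceptions() {
    sed -n 's/.*vmunix: Module = \([^,]*\), Line = \([0-9]*\).*/\1:\2/p'
}

# LOG is an assumed location; override it in the environment if needed.
LOG=${LOG:-/var/adm/messages}
if [ -r "$LOG" ]; then
    advfs_exceptions < "$LOG"
fi
```

Fed the excerpt above, this would print bs_bmt_util.c:3038, which is the
string to search the archives for.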


+-[Bob Jackson]------------[Email: bobj_at_soc.duke.edu] ---------------------+
| Phone: (919) 660-5601 Fax: (919) 660-5623 |
| Affiliations: Department of Sociology & |
| Office of Information Technology |
| Office: 141 Sociology/Psychology Building |
| Postal Address: Department of Sociology, Box 90088, |
| Duke University, Durham, NC 27708-0088 |
+--------------------------------------------------------------------------+
Received on Tue Nov 17 1998 - 16:04:05 NZDT
