AdvFS/ASE crashes

From: Knut Hellebø <Knut.Hellebo_at_nho.hydro.com>
Date: Tue, 09 Dec 1997 20:23:42 +0100

Greetings, managers,

We have had several crashes on our ASE members lately. In kern.log we
have something like

Dec 9 15:36:36 asemember vmunix: ADVFS EXCEPTION
Dec 9 15:36:36 asemember vmunix: Module = bs_bmt_util.c, Line = 3038
Dec 9 15:36:36 asemember vmunix: alloc_mcell: bad mcell free list
Dec 9 15:36:36 asemember vmunix: panic (cpu 0): alloc_mcell: bad mcell free list
Dec 9 15:36:36 asemember vmunix: syncing disks...

and in the crash-data file we have

    cpu_panicstr = 0xffffffff8d6170b0 = "alloc_mcell: bad mcell free list"

along with

> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094, 0xfffffc00002c348c]
   1 thread_preempt(thread = 0x26, processor = 0xfffffc0000200100) ["../../../../src/kernel/kern/sched_prim.c":3820, 0xfffffc00002c5cf4]
   2 boot(0x0, 0xfffffc0001fb7340, 0x2c0000002c, 0x2a, 0x300000001) ["../../../../src/kernel/arch/alpha/machdep.c":2572, 0xfffffc0000498718]
   3 panic(s = 0xffffffff8d6170b0 = "alloc_mcell: bad mcell free list") ["../../../../src/kernel/bsd/subr_prf.c":791, 0xfffffc00002945cc]
   4 advfs_sad(0xfffffc00005b0590, 0xffffffff8d6171c8, 0x0, 0x37330a3037203d20, 0xa0000000000) ["../../../../src/kernel/msfs/bs/bs_misc.c":504, 0xfffffc000032fd38]
   5 alloc_mcell(0xfffffc00061b6e08, 0xffffffff010c0004, 0xfffffc00024c6000, 0xffffffff8d617370, 0xffffffff8d617368) ["../../../../src/kernel/msfs/bs/bs_bmt_util.c":3038, 0xfffffc00003120e0]
.
.
.

The problem has been "fixed" by running '/sbin/advfs/verify', deleting
the bad fileset (both verify and the system crashed when scanning this
fileset), recreating it, and restoring the data.
Could this have been avoided? For example, could we have disabled this
fileset from being used by ASE before ASE was up and running, so that all
services/filesets except the bad one could be made available?
What caused the above crash?

The servers are DEC 3000s running Digital UNIX 4.0B + patch set 5 +
TruCluster 1.4.1 + TruCluster patches.

-- 
      ******************************************************************
      *         Knut Hellebø                     | DAMN GOOD COFFEE !! *
      *         Norsk Hydro a.s                  | (and hot too)       *
      * Phone: +47 55 996870, Fax: +47 55 996495 |                     *
      * Cellular Phone: +47 93092402             |                     *
      * E-mail: Knut.Hellebo_at_nho.hydro.com       | Dale Cooper, FBI    *
      ******************************************************************
Received on Tue Dec 09 1997 - 20:26:57 NZDT
