SUMMARY2: Q: what to do with "log half full" panic

From: Niels Kokholm <kokholm_at_math.ku.dk>
Date: Mon, 03 Apr 2000 09:20:17 +0200 (MET DST)

As to my supplementary questions in the SUMMARY:

I got replies from
simon.millard_at_barclays.co.uk
"Davis, Alan" <Davis_at_tessco.com>
"Fliguer, Miguel" <M_Fliguer_at_unifon.com.ar>

1) The recognition of the third RAID set without booting the new kernel is due
to DDR (Dynamic Device Recognition), which has been in DU since 4.0. (Only
about 4 years, I guess :-).

2) Nothing conclusive on the reasons for the speedup of vdump. Miguel
Fliguer considered the TZ87 to be very slow and recommended the TZ89. I am
going to try to find the money for that.
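
As a rough check on the tape-speed explanation, the figures from my earlier
message (quoted below) can be turned into average vdump rates. This is only a
sketch: it assumes the amount dumped equals total minus free space (about
15 GB both before and after) and that 1 GB = 1024 MB; neither is stated
exactly in the thread, and the function name is just for illustration.

```python
def dump_rate_mb_per_s(total_gb, free_gb, hours):
    """Average dump rate, assuming everything that is not free gets dumped."""
    used_mb = (total_gb - free_gb) * 1024  # assumption: 1 GB = 1024 MB
    return used_mb / (hours * 3600)


# Before: 1 GB free out of 16 GB, vdump took 7.5 hours.
before = dump_rate_mb_per_s(16, 1, 7.5)
# After: 3 GB free out of 18 GB, vdump took 6 hours.
after = dump_rate_mb_per_s(18, 3, 6.0)

print(f"before:  {before:.2f} MB/s")
print(f"after:   {after:.2f} MB/s")
print(f"speedup: {after / before:.2f}x")
```

Under these assumptions the amount of data is the same in both runs, so the
speedup is simply the ratio of the elapsed times. Either rate can then be
compared with the rated transfer speed of the tape drive in question.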

Yours

   Niels Joergen Kokholm | email: kokholm_at_math.ku.dk
   Institut for Matematiske Fag | phone: +45 3532 0759/+45 2128 6932
   Universitetsparken 5 | fax: +45 3532 0704
   DK-2100 Kobenhavn OE, Denmark | www: http://www.math.ku.dk/~kokholm

On Wed, 22 Mar 2000, Niels Kokholm wrote:

> I got replies from
>
> Bevan Broun <bevanb_at_ee.uwa.edu.au>
> "Fliguer, Miguel" <M_Fliguer_at_unifon.com.ar>
> "Aviles Aviles, Mario" <Mario.Aviles_at_sonda.com>
> Steve Hancock <shancock_at_zk3.dec.com>
> Claudio Tantignone <C_Tantignone_at_sondaarg.com.ar>
> Richard Jackson <rjackson_at_portal.gmu.edu>
>
> Most recommended (with detailed instructions) increasing the
> AdvFS log size with the undocumented -l argument to
> /sbin/advfs/switchlog. This requires adding an extra disk (perhaps only
> temporarily) to the AdvFS domain.
>
> My local Compaq support representative showed me how to find the name of the
> offending domain using kdbx on the crash dump. Then I managed to get
> hold of two unused SBB disks and room in the StorageWorks
> shelves. Using the StorageWorks Command Console I built a RAID 1 set (re3)
> on those disks (while the system was up in multiuser mode). I added
> re3c to the problematic AdvFS domain and switched logs to re3c:
> /sbin/advfs/switchlog -l 2048 export 2
> Then I crossed my fingers and ran defragment on the domain without
> problems.
>
> Two supplementary questions:
>
> 1) After having built the new RAID set, and having built and installed
> - but not rebooted - a new kernel with the additional line
>
> device disk re3 at xcr0 drive 3
>
> I added the re3 devices with MAKEDEV, and tried to add the volume re3c
> to the advfs domain. I expected that I would have to reboot the new
> kernel before I could use re3, but the volume was added and used
> without complaints. What am I missing?
>
> 2) Now a complete vdump of the domain (to a TZ87) takes 6 hours as
> opposed to 7.5 hours before adding the disk. Before, I had 1 GB out of
> 16 GB free; now I have 3 GB out of 18 GB free. In the meantime I also
> doubled RAM from 256 MB to 512 MB.
>
> Should I attribute this speedup to adding the disk or to the extra memory?
> Does 6 hours sound reasonable to you?
>
> Yours,
>
>
> On Tue, 21 Mar 2000, Niels Kokholm wrote:
>
> >
> > Yesterday we had the experience of a panic of our Alphaserver 1000A
> > running DU4.0d with jumbo patch 3 (the latest Advfs patch installed seems
> > to be Patch 0392.01 - AdvFS Consolidated Patch).
> >
> > /var/adm/messages says
> >
> > Mar 19 15:08:53 abel vmunix: ADVFS EXCEPTION
> > Mar 19 15:08:54 abel vmunix: Module = ms_logger.c, Line = 2005
> > Mar 19 15:08:54 abel vmunix: release_dirty_pg: log half full
> > Mar 19 15:08:54 abel vmunix: panic (cpu 0): release_dirty_pg: log half full
> >
> > dia also simply reports the "release_dirty_pg: log half full".
> >
> > According to /var/adm/crash/crash-data.0 defragment was running at the
> > time of the crash (started at 4:00). There is a core file in / from kdbx,
> > which seems to have had a segmentation fault while running at boot after
> > the crash.
> >
> > At least some of the advfs filesystems on the box were created January
> > 1996, probably under du 3.2c. Most filesystems have little free space. The
> > volumes are on a SWX RAID controller (230).
> >
> > I would like your advice on the following:
> > Due to the tight disk space we are going to install a RA3000 instead
> > of the swxcr230 within a few weeks. In the course of this all the advfs
> > domains will, of course, be recreated.
> > 1) Should I panic now myself, stop the server, recreate the filesystems
> > on the existing volumes and restore from tape? Or should I just stop
> > running defragment for the next few weeks and hope nothing else triggers
> > the "log half full" panic before the new disk systems are installed?
> > 2) Any newer patches I should install ASAP?
> > 3) Should it be possible from the crash data to find out which file
> > domain was the reason for the crash?
> > 4) (This has not much to do with the crash) On the RA3000 I plan to use
> > 2x4 GB disks in a RAID 1 set for the system disk and 5x18 GB in a RAID 5 set for
> > user files to be exported via NFS and Samba. At the outset the RAID 5
> > set would be one big advfs domain split into a couple of filesets.
> > Would it be beneficial for (write) performance on the user files to use
> > some of the space on the RAID 1 set for the Advfs log file?
> >
> > Yours
> >
> >
> >
> >
> >
> >
> >
>
>
Received on Mon Apr 03 2000 - 07:21:07 NZST