SUMMARY (Partial) : Power outage corruption on an AdvFs raid set

From: <POLLI_at_axjet.lnf.infn.it>
Date: Tue, 29 Aug 2000 15:52:37 +0000 (GMT)

First, thanks to (in arrival order):

Bryan.Lavelle_at_compaq.com
alan_at_nabeth.cxo.dec.com
BochnikWJ_at_bernstein.com
Nikola.Milutinovic_at_ev.co.yu

and to
szemgy_at_rkk.hu
that I contacted personally after another search in the archives.

Unfortunately I didn't succeed. I recreated the disk label, but the
fileset remained corrupt.

Bryan.Lavelle_at_compaq.com said:
-----------------
If I understand what you want to do, place the correct disklabel on rz1, and
hopefully have your data. I've been successful in doing this, providing you
know exactly what the disklabel was. Your mileage may vary, of course. If
you don't know what the disklabel was, and you have ever run sys_check, look
for a directory called /var/recovery and look for that disk's disklabel in
that tree. There are no guarantees that this will work. There are never
good reasons for not doing backups, btw.
----------------
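
The advice above depends on knowing exactly what the old disklabel was.
A minimal sketch of one way to keep that information at hand, assuming
the labels are saved while the disks are still healthy. The function
name save_labels and the variables DISKLABEL and LABEL_DIR (including
its default path) are hypothetical names introduced for illustration;
only "disklabel -r <disk>" comes from this thread.

```sh
#!/bin/sh
# Sketch: keep an ASCII copy of each disk's label while the disk is healthy.
# DISKLABEL and LABEL_DIR are hypothetical knobs, not part of any real tool.
DISKLABEL=${DISKLABEL:-disklabel}
LABEL_DIR=${LABEL_DIR:-/var/adm/disklabels}

save_labels() {
    mkdir -p "$LABEL_DIR" || return 1
    rc=0
    for disk in "$@"; do
        out="$LABEL_DIR/$disk.label"
        # "disklabel -r <disk>" prints the on-disk label in ASCII form.
        # Write to a temporary file first so a failed read never
        # overwrites a good saved copy.
        if "$DISKLABEL" -r "$disk" > "$out.tmp"; then
            mv "$out.tmp" "$out"
        else
            echo "could not read label of $disk" >&2
            rm -f "$out.tmp"
            rc=1
        fi
    done
    return $rc
}

# Example (disk names are placeholders): save_labels re0 re1 re2
```

A saved file of this kind could then be written back in the same way a
label is re-applied later in this message, with the saved file taking
the place of the label borrowed from the other raid set.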

I forgot to say that I still have a 3.2C on that computer. After the
"fact", I swapped computers and connected the whole bunch of disks to a
4.0E system. Two of the three raid sets came back to life. After this, I
asked for help here.

BTW, I didn't say I had *no* backup. It just dates from before the
summer. Because of personal problems, I couldn't do any backup in the
meantime...

alan_at_nabeth.cxo.dec.com said:
----------------
        It is unlikely that the disklabel was the only thing lost in
        the power failure of the array. Your best bet is use the
        salvage command and find a place to put the data that it
        can recover.
---------------

And I was well aware of this, but "hope is the last to die"...
More on this, later..

BochnikWJ_at_bernstein.com said:
----------------
boot into the swxcr management floppy and check the raid volume - it might
just be offline and need to be brought online (hopefully w/o any corruption)
---------------

That was the first thing I did. All the disks in the raid set (6 RZ29)
were OK: OPT, the program said.

Nikola.Milutinovic_at_ev.co.yu said:
-----------------
[skipping my text]
> # disklabel -r re1
> Disk is unlabeled or, /dev/rre1a is not in block 0 of the disk

If your disklabel is gone, what's with the rest of the disk? I mean,
even in the case of a power failure, there should be no damage to the
disklabel. Try to check the consistency of the drive. You should have a
AlphaBIOS utility on a diskette, SWXCRMGR.EXE or RA200RCU.EXE. Run it
from ARC console and check your drive(s).

> Now what can I do to save all the data in the disk?

Might be nothing. Might be just re-applying the disklabel.

> I have a feeling, after having read the answers in the archive and the
> man page, that writing the *right* label on the disk I could recover the
> fileset.

If the file domain is undamaged. If you know the old disklabel, try
re-applying it. It cannot hurt.

> I don't want to lose the data. Yes, I know! Do a periodic backup. But I
> couldn't, so don't flog me. ;-)

So, you have somebody else to flog? Great!
-----------------

This confirms the other mails I had.


Well. The first thing was to apply the label:

disklabel -rR -t advfs re1 re0-label SWXCR

where re0-label is the label of the other, working, raid set, built with
the same RZ29 disks.
Now I could try to mount the disk:

mount DataF2#dataf2 /dataf2
but...
DataF2#dataf2 on /dataf2: Device does not contain a valid ADVFS file system

Then I tried:

/sbin/advfs/verify DataF2
verify: can't get set info for domain 'DataF2'
verify: error = E_BAD_MAGIC (-1167)
+++ Domain verification +++

main: unable to get info for domain 'DataF2'
    error: -1167, E_BAD_MAGIC (-1167)

Then:

/sbin/advfs/advscan -r re1

Scanning devices /dev/rre1

Found domains:

and, as a last resort..

cd to some dir
/sbin/advfs/salvage -V /dev/re1c
salvage: Volume(s) to be used '/dev/re1c'
salvage: Files will be restored to '.'
salvage: Logfile will be placed in './salvage.log'
salvage: No valid volumes found
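
When asking a list for help with a sequence like the one above, it can
be useful to have every command, its full output and its exit status in
one file. A minimal sketch, assuming a plain Bourne shell; the function
name run_logged and the variable LOG (with its default file name) are
hypothetical names introduced for illustration, and the commands in the
comments are simply the ones tried in this message.

```sh
#!/bin/sh
# Sketch: run a command, appending the command line, its output
# (stdout and stderr) and its exit status to a log file.
# LOG is a hypothetical knob introduced for this example.
LOG=${LOG:-./recovery-attempt.log}

run_logged() {
    echo "### $*" >> "$LOG"
    status=0
    "$@" >> "$LOG" 2>&1 || status=$?
    echo "### exit status: $status" >> "$LOG"
    return $status
}

# The attempts from this message, in the order they were tried:
# run_logged /sbin/advfs/verify DataF2
# run_logged /sbin/advfs/advscan -r re1
# run_logged /sbin/advfs/salvage -V /dev/re1c
```

Note that salvage restores into the current directory and writes its
own salvage.log there, as its output above shows, so the directory the
script is started from matters.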

Am I at a dead end??

I looked in the archives, but found very few reports about
E_BAD_MAGIC. One gave me no hope, but it was on a 3.2C system, which has
few (or no) recovery programs...

I'll wait a few days, in case someone has any ideas.. In the meantime,
I'm backing up the working filesets..

Thanx to all.

        Ermanno Polli
        INFN-LNF
        ermanno.polli_at_lnf.infn.it
Received on Tue Aug 29 2000 - 13:54:58 NZST
