PARTIAL SUMMARY: Recovering partly destroyed advfs stripe-set.

From: Tobias Ernst <tobi_at_physcip.uni-stuttgart.de>
Date: Wed, 14 Feb 2001 13:12:02 +0100

Hello!

The problem was:
----------------

- Tru64 Unix 5.0 Revision 910.
- AdvFS domain consisting of three disks (mkfdmn, addvol, addvol).
- The very first 5 MB of the first disk (the one containing the transaction
   log) got zeroed out through inexperienced use of the scu (raw SCSI
   access) utility.
-> The machine crashed and on reboot, the domain was inaccessible.

We have recovered most of our data by now, but some points are still
unclear, so any experts please read on ;-).

Responders:
-----------

While we got quite a few mails from admins offering their condolences and
expressing the fear that all data were lost, we now know that this is
fortunately not true. We got helpful input especially from Steve Hancock
<shancock_at_zk3.dec.com> and from alan_at_nabeth.cxo.dec.com. They pointed us to
the very useful utility called "salvage" with its "-S" option, but this was
still not enough. We figured out quite a few other things on our own and
have now recovered nearly everything, but some questions are still open.
That's why I call this a "partial summary", and I still hope to get some
more input from you.

Basics:
-------

(If you know these, please skip this section and read on below, where the
"Remaining Questions" will follow ...)

When you have an AdvFS domain consisting of multiple disks, you have
neither RAID0, RAID1, nor any other form of RAID. If AdvFS were RAID0, we
would indeed have been out of luck. But AdvFS takes a different
approach:

On AdvFS, a single file is always stored on a single disk, unless the file
is larger than a single disk, or unless the user explicitly uses the
"stripe" command to distribute the file across multiple disks. Thus, AdvFS
does not offer a general performance gain like RAID0, but it still offers a
performance gain on a multi-user server (probably every Tru64 box is such a
server), because as soon as many users access many files, all the disks
again get approximately the same amount of I/O load.

The advantage of AdvFS's approach over RAID0 is its robustness against data
loss. A single AdvFS disk contains all the information required to extract
the part of the files that were (completely) stored on it, even if the
other disks have vanished. At the beginning of each disk, there is a) a
disklabel (obviously), and then inside each slice/partition in use by
AdvFS, there is an AdvFS signature and the AdvFS metadata, followed by the
raw blocks containing the file data.

When dark things happen to your disk, you will encounter the situation that
Tru64 refuses to mount the domain in order to avoid further logical damage
to it. There are quite a few recovery utilities that can fix smaller errors
on the domain (there are other good summaries about those on this list);
sometimes, however, the problems are worse.

For this case, there is the "salvage" utility. This utility is NOT able to
fix the domain; in fact, it does not even modify anything on a crashed
domain. But it can extract files from the crashed domain to anywhere else.
It even has advanced features like combining extraction with a (probably
out-of-date) backup: you can restore the backup to the place where salvage
extracts to, and then specify special salvage options so that it only
recovers those files from the crashed domain which are not in the backup or
which are newer than their counterparts in the backup.

So the first step when your disk has crashed that badly is to get temporary
disk space with as much free space as was previously occupied on your now
broken domain. In our case, we used a different machine and mounted the
disk space via NFS on the machine with the broken filesystem (be sure to
use a Tru64 box as the NFS server and to enable proplistd etc. when doing
this, so that salvage can also extract AdvFS specifics like ACLs).
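That NFS setup can be sketched roughly like this (a hedged sketch: the
hostname "rescue", the export path, and the exact daemon invocation are
assumptions, not from our notes; check the Tru64 proplistd(8) and mount(8)
man pages before relying on them):

```shell
# On the Tru64 NFS server (hostname and export path are hypothetical):
# start the property-list daemon so AdvFS extended attributes (ACLs
# etc.) survive the NFS transfer.
/usr/sbin/proplistd

# On the machine with the broken domain: mount with the proplist option.
mount -t nfs -o proplist rescue:/recover /mnt/recover
```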

Now, in most cases you will still have the disklabel, the advfs signature
and the advfs metadata on your disk. In this case, you can first try to run
salvage without any special options (by default, it extracts anything it
finds to the current working directory, so set that one appropriately).

When your metadata is broken, you can run salvage with the "-S" option, in
which case it will scan the complete raw data area of the disk in order to
find all files. Fortunately, AdvFS distributes a lot of redundant
information about the blocks across the disk, so even in this case salvage
can restore file names and the directory hierarchy appropriately.

Advanced Topics and Open Questions
----------------------------------

But in our case we managed to trash exactly the disklabel, the metadata and
the transaction log, so salvage would refuse to work because it did not see
any sort of AdvFS structures at all on this disk. So we somehow had to
prepare the beginning of the disk in order to bootstrap salvage into finding
something it can work on.

Steps we have taken:

1. Restore the disklabel

We used "disklabel -wr" to initialise a disklabel on the broken disk. Then
we used "disklabel -r" to read the disklabel of one of the intact disks
(which had the same geometry in our case) and "disklabel -R" to restore it
onto the broken disk.
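A minimal sketch of that sequence (the device names and the disk-type
argument are from our setup and are illustrative only; double-check against
disklabel(8) before writing anything to a disk):

```shell
# dsk7 is the damaged disk, dsk6 an intact disk with the same geometry.
disklabel -wr dsk7 DDRS-39130          # write a fresh default label
disklabel -r dsk6 > /tmp/dsk6.label    # dump the intact disk's label
disklabel -R dsk7 /tmp/dsk6.label      # write that label onto the broken disk
```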

2. Provide the AdvFS signature

Restoring the disklabel, unfortunately, was not enough. Salvage would still
not find anything on the disk. This is where our knowledge ended and pure
empirical testing began.

We found out that in our case, the first 16384 bytes of the two intact
disks were 100% identical. So we used dd like this:

dd if=/dev/disk/dsk6c of=/dev/disk/dsk7c bs=16384 count=1

in order to duplicate those 16384 bytes from one of the intact disks onto
the broken disk.
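Since this only makes sense when that region is truly generic, it may be
worth checking first that the intact disks really do agree there. A small
sketch (the helper name and the second intact disk's device name dsk5c are
our own inventions, not from any Tru64 tool):

```shell
# copy_header SRC REF DST:
#   copy the first 16384 bytes of SRC onto DST, but only if SRC and REF
#   (the two intact disks) agree byte-for-byte in that region.
copy_header() {
    src=$1; ref=$2; dst=$3
    dd if="$src" bs=16384 count=1 2>/dev/null > /tmp/head.src
    dd if="$ref" bs=16384 count=1 2>/dev/null > /tmp/head.ref
    if cmp -s /tmp/head.src /tmp/head.ref; then
        dd if="$src" of="$dst" bs=16384 count=1 conv=notrunc 2>/dev/null
    else
        echo "first 16384 bytes differ between intact disks; not copying" >&2
        return 1
    fi
}

# In our setup (dsk5c as the second intact disk is hypothetical):
# copy_header /dev/disk/dsk6c /dev/disk/dsk5c /dev/disk/dsk7c
```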

3. Run salvage

After that, we were able to run salvage on the complete domain like

salvage -D /mnt/recover -S -p -l -L /mnt/recover/recover.log domain_name

and it recovered almost all files from the domain, almost all of them with
correct names and directory structure; only three or so files were
recovered "partially", and only a few dozen ended up in lost+found or with
names like "tag_..." instead of their real names. This salvage run took
about 12 hours with three 9.1 GB IBM DDRS hard disks, a PWS 500au with
640 MB RAM where salvage ran, and a PWS 500au with 256 MB RAM as the NFS
server for what was mounted on the first PWS as /mnt/recover. It is
recommended to have LOTS of RAM on the machine where salvage runs, and
don't forget to set "ulimit -d unlimited" and "ulimit -m unlimited" before
you run salvage, because salvage seems to build a complete image of the
logical structure of the domain in memory before starting to extract
anything. This image building takes some time, so don't despair when
salvage does not start extracting right away.
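Putting the limits and the invocation together (the nohup/logging wrapping
is our own habit, not something salvage requires):

```shell
# Lift the per-process data and resident-memory limits first; salvage
# appears to hold an image of the whole domain structure in memory.
ulimit -d unlimited
ulimit -m unlimited

# Run salvage in the background and keep its output, since the scan
# can take many hours.
cd /mnt/recover
nohup salvage -D /mnt/recover -S -p -l \
      -L /mnt/recover/recover.log domain_name > salvage.out 2>&1 &
```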

4. Questions

Before doing the steps above on our critical file systems, we experimented
a lot with other test disks, filling domains and trashing them at will, in
order to find out how to prepare the trashed disk so that salvage will run
successfully. We found out that the amount of data we had to "dd" from an
intact disk to the broken one varied vastly from configuration to
configuration, and that "dd"ing either too little or too much will stop
salvage -S from working. In some cases, salvage -S would refuse to work,
and in others, it would run for hours or even days without finding
anything.

I wonder if there is a cleaner way to initialise a disk whose first few MB
were zeroed out than just dd'ing from another disk. Are there tools to do
this, or are the structures that must be placed there documented anywhere?


Final Thoughts
--------------

- Use a good backup device.
- Make daily backups. Make *complete* backups.
- Include all file systems in the backup. You'll only find out where you
  have placed important things once you have lost them.
- When you are not granted money for a backup device, either quit the job
  or request more pay to compensate for the frustration ;-).

Kind Regards,
Tobias Ernst.
Received on Wed Feb 14 2001 - 12:13:55 NZDT