Summary: Having a redundant ADVFS system disk

From: Steven Timm <timm_at_fnal.gov>
Date: Tue, 10 Apr 2001 16:31:06 -0500 (CDT)

Thanks to everyone for their quick answers on this one.

Here was my initial question:

>I currently have two 9 Gb disks available, one to use as a system
>disk and the second to use as a hot spare backup.
>There are six ADVFS domains on each one, plus a swap partition.
>I would like to, once a day via cron job, make copies of these
>partitions to the hot spare backup disk.
>
>My question is the following:
>
>Is there a way to copy the files from the partition on
>/dev/rz2a to /dev/rz3a ( and so forth with the other partitions)
>in such a way that the ADVFS domain and
>file set names are preserved? (preferably with dd).
>
>In other words, I would like to be able, if /dev/rz2 is lost, to swap
>/dev/rz3 into its location, and boot my system to see the same
>domains and file sets that were there before.
>
>Thanks
>
>Steve Timm

Three categories of answer:

One--use LSM to mirror the disks.

>From: "Macfarlane, Fraser" <Fraser.Macfarlane_at_compaq.com>

>Try using LSM to mirror the disks, so that if one fails you can still stay up.
>
>Using an AdvFS Utilities licence may provide you with the same capability, but
>I would suggest that you use an HSZ controller or LSM for mirroring.
(Elizabeth Harvey-Forsythe and alan_at_nabeth.cxo.dec.com also suggested this.)

I do have an LSM license, but it's not quite what I'm looking
for in this situation, because a daily disk backup (as opposed
to an instantaneous mirror) gives recourse against such things
as system manager stupidity, CD-ROMs failing in the middle
of upgrades, and so forth.


Alan added a caveat that applies no matter which way the backup
is done:
        The problem you'll have making the backup is making sure
        the file systems and domains are consistent when making
        the copy. Getting an inconsistent copy of the domains
        will likely prevent them from being mountable. Whether
        the consistency of the underlying files matters, depends
        on how you use the system.

        AdvFS will make it non-trivial to mount the filesets on
        the copied domains when the original is still mounted.
        I think there is an option that allows this, so check
        the mount(8) manual page if it becomes a problem.

        The only good way I know of to make the domains consistent
        is to dismount all the file systems using them.
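Alan's point can be turned into a guard at the top of any copy job: unmount every fileset on the domains being copied, and refuse to copy if any unmount fails. This is a minimal sketch, assuming the mount points are passed in explicitly (the ones in the usage comment are hypothetical) and that `UMOUNT` may be overridden for a dry run; it cannot help with root and /usr on a running system, which is exactly the limitation Alan describes.

```sh
#!/bin/sh
# Unmount each mount point named as an argument, stopping at the first
# failure so the caller never copies a domain that is still mounted (and
# may be changing underneath the copy). UMOUNT can be overridden for testing.
UMOUNT=${UMOUNT:-/sbin/umount}

quiesce_filesets() {
    for mp in "$@"
    do
        $UMOUNT "$mp" || {
            echo "cannot unmount $mp; copy would not be consistent" >&2
            return 1
        }
    done
    return 0
}

# Hypothetical usage, before copying the (non-system) domains:
#   quiesce_filesets /data1 /data2 || exit 1
```

Stopping at the first failure, rather than unmounting what it can, keeps the job from producing a copy in which some domains are consistent and others are not.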


The second solution, from John Francini, is what I am looking for,
with the caveat that I haven't tried it yet:

We do this every day on our systems here. We don't use dd; we use a
vdump/vrestore pipe to do the job, run as a cron job.

Before using it, you'll likely have to modify it to add sections for
the extra non-system mount points you have on your disk. Also,
you'll need to do the following steps once to prepare the clone disk:

1. Create directories of the form /clone_root, /clone_usr etc for the
mount points for each of the clone disk's file systems.

2. Prepare the clone disk as follows -- using your example of rz2 being the
original disk and rz3 being the clone:

        # disklabel -r rz2 > /tmp/rz2label
        # disklabel -z rz3
        # disklabel -R -t advfs rz3 /tmp/rz2label
        # mkfdmn -r /dev/rz3a clone_root <-- note the "-r" switch
        # mkfset clone_root root
        # mkfdmn /dev/rz3g clone_usr (assuming /usr is on g)
        # mkfset clone_usr usr

repeating the last two commands for each of the other 4 mountable
partitions (excluding swap, of course) on the disk.
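Because the same `mkfdmn`/`mkfset` pair is repeated for every partition, it can help to generate the whole one-time preparation sequence from a small table and review it before running anything. The sketch below only prints the commands from the steps above; the `partition:fileset` table format and any entries beyond `a:root` and `g:usr` are hypothetical placeholders for the other four domains.

```sh
#!/bin/sh
# Print (do not run) the one-time clone-disk preparation commands.
# Args: source disk, target disk, then partition:fileset pairs.
# Each clone domain is named clone_<fileset> and mounted on /clone_<fileset>;
# only the root fileset gets "mkfdmn -r" (root domain).
gen_clone_prep() {
    src=$1; dst=$2; shift 2
    echo "disklabel -r $src > /tmp/${src}label"
    echo "disklabel -z $dst"
    echo "disklabel -R -t advfs $dst /tmp/${src}label"
    for entry in "$@"
    do
        part=`echo "$entry" | cut -d: -f1`
        fset=`echo "$entry" | cut -d: -f2`
        echo "mkdir -p /clone_$fset"
        if [ "$fset" = root ]
        then
            echo "mkfdmn -r /dev/$dst$part clone_$fset"
        else
            echo "mkfdmn /dev/$dst$part clone_$fset"
        fi
        echo "mkfset clone_$fset $fset"
    done
}
```

Running `gen_clone_prep rz2 rz3 a:root g:usr h:var` (the `h:var` entry is hypothetical) prints the full sequence; once it looks right, the output can be piped to `sh`.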

Since the _fileset_ names are the same (root, usr, etc), when you
move the disk to the rz2 slot, it will boot and run normally with no
changes required on your part. I'm assuming that you're running
Tru64 UNIX V4.0x, and not V5.x. In V5.x there's more to be done
because of the new device database that matches drive serial numbers
against an internal table to ensure that a drive's name doesn't
change no matter what slot it's plugged into.

The script is included below. Note well: this is provided AS IS and
without warranty, and is not a supported product from Compaq Computer.

Hope this helps!

John Francini



#!/bin/sh

#-------------------------------------------------------------#
# File: clone_disk.sh (non-lsm version) #
# Auth: rengsys (Release Engineering Systems) #
# Date: 7/28/98 #
# Desc: Script to create a cloned copy of the system disk. #
# Script copies the root and usr filesystems to a spare #
# disk for failure recovery. #
# 12/31/1999 - M. Heslin Added Y2K support. #
# 10/17/2000 - John Francini - remake filesets instead #
# of using rm. It's faster and cleaner #
#-------------------------------------------------------------#

#--------------------#
# Basic necessities: #
#--------------------#
NOTIFY=foo_at_bar.com
HST=`hostname -s`
LOGDIR=/tmp/clone_disk_log
MSG=${LOGDIR}/status
DMPSTS=${LOGDIR}/dump_status
ERRCHK=${LOGDIR}/error_list
ROOTMNT=/clone_root
USRMNT=/clone_usr
STATUS=0

#----------------------------------------------#
# Check to see if the logging directory exists #
# if not, create it #
#----------------------------------------------#
[ ! -d ${LOGDIR} ] && /bin/mkdir ${LOGDIR}

#-------------------------------------#
# Clean up logs, files from last run: #
#-------------------------------------#
[ -f $ERRCHK ] && /bin/rm $ERRCHK
[ -f $DMPSTS ] && /bin/rm $DMPSTS
[ -f $MSG ] && /bin/rm $MSG

#--------------------------------#
# Check to see if the clone disk #
# is mounted - if not mount it: #
#--------------------------------#
case `df -k|grep -i clone_root|awk '{print $6}'` in
   /clone_root)
         ;;
   *)
         /sbin/umount clone_root#root > /dev/null 2>&1
         if [ ! -d $ROOTMNT ]
         then
                 /usr/bin/mkdir $ROOTMNT
         fi
         # Rather than doing rm -rf later, simply remake the fileset
         /sbin/rmfset -f clone_root root
         /sbin/mkfset clone_root root
         /sbin/mount clone_root#root $ROOTMNT
esac

case `df -k|grep -i clone_usr|awk '{print $6}'` in
   /clone_usr)
         ;;
   *)
         /sbin/umount clone_usr#usr > /dev/null 2>&1
         if [ ! -d $USRMNT ]
         then
                 /usr/bin/mkdir $USRMNT
         fi
         # Rather than doing rm -rf later, simply remake the fileset
         /sbin/rmfset -f clone_usr usr
         /sbin/mkfset clone_usr usr
         /sbin/mount clone_usr#usr $USRMNT
esac

#-------------------------------------#
# Check again to make sure everything #
# mounted okay - if not then quit: #
#-------------------------------------#
# Mount points are strings, so compare with != (not the numeric -ne).
if [ "`df -k|grep -i clone_root|awk '{print $6}'`" != "$ROOTMNT" ] || \
   [ "`df -k|grep -i clone_usr|awk '{print $6}'`" != "$USRMNT" ]
then
         echo "Couldn't mount clone disk - please investigate" >> $ERRCHK
         STATUS=1
fi

#------------------#
# Start the dumps: #
#------------------#
if [ $STATUS -eq 0 ]
then
         echo "******************************************************************" >> $DMPSTS
         echo "*** Beginning dump of root filesystem on `date +%m/%d/%Y` at `date +%T` ***" >> $DMPSTS
         echo "******************************************************************\n" >> $DMPSTS
#Don't do the rm -rf, since we re-made the fileset freshly earlier
# /bin/rm -rf /clone_root/*
         # "cd ... &&" keeps vrestore from unpacking into the wrong directory
         # if the clone is not mounted. Test $? right away: it is the status
         # of the last command in the pipeline (the vrestore subshell), and an
         # intervening "wait" would replace it with its own status of 0.
         /sbin/vdump 0f - / 2>> $DMPSTS | (cd $ROOTMNT && /sbin/vrestore -xf -) >> $DMPSTS
         if [ $? -ne 0 ]
         then
                 echo "Failure during root filesystem dump - please investigate" >> $ERRCHK
                 STATUS=1
         fi
         echo "\n" >> $DMPSTS
         echo "******************************************************************" >> $DMPSTS
         echo "*** Completed dump of root filesystem on `date +%m/%d/%Y` at `date +%T` ***" >> $DMPSTS
         echo "******************************************************************\n" >> $DMPSTS
         /sbin/umount $ROOTMNT
fi
if [ $STATUS -eq 0 ]
then
         echo "****************************************************************" >> $DMPSTS
         echo "*** Beginning dump of usr filesystem on `date +%D` at `date +%T` ***" >> $DMPSTS
         echo "****************************************************************\n" >> $DMPSTS
# /bin/rm -rf /clone_usr/*
         # "cd ... &&" keeps vrestore from unpacking into the wrong directory
         # if the clone is not mounted; $? is tested immediately after the
         # pipeline, before any other command can overwrite it.
         /sbin/vdump 0f - /usr 2>> $DMPSTS | (cd $USRMNT && /sbin/vrestore -xf -) >> $DMPSTS
         if [ $? -ne 0 ]
         then
                 echo "Failure during usr filesystem dump - please investigate" >> $ERRCHK
                 STATUS=1
         fi
         echo "\n" >> $DMPSTS
         echo "****************************************************************" >> $DMPSTS
         echo "*** Completed dump of usr filesystem on `date +%D` at `date +%T` ***" >> $DMPSTS
         echo "****************************************************************\n" >> $DMPSTS
         /sbin/umount $USRMNT
fi

#-------------------------------------------#
# Generate a status message and send it off #
#-------------------------------------------#

if [ $STATUS -ne 0 ]
then
         echo "\n\n\t <<< $HST system disk cloning has failed - please investigate >>>" >> $MSG
         echo "Possible reason: `cat $ERRCHK`\n\n">>$MSG
         cat $DMPSTS >> $MSG
         mailx -s "$HST's system disk cloning status: FAILED." < $MSG $NOTIFY
else
         echo "\n\n\t <<< $HST system disk cloning completed successfully >>>\n\n" >>$MSG
         cat $DMPSTS >> $MSG
         mailx -s "$HST's system disk cloning status: SUCCESS." < $MSG $NOTIFY
fi

exit $STATUS





John Francini <mailto:francini_at_zk3.dec.com>
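The script decides whether the clone filesets are mounted by grepping `df -k` output for a substring and reading the sixth column. A slightly stricter sketch of the same check compares the mount-point column exactly, so that, for example, `/clone_usr` would not be matched by a hypothetical `/clone_usr2`. It assumes, as the script itself does, that `df -k` prints the mount point in the sixth field; the function name `mounted_on` is illustrative.

```sh
#!/bin/sh
# Succeed (exit status 0) if the df -k listing on stdin shows a file
# system mounted exactly on the mount point given as $1.
mounted_on() {
    awk -v mp="$1" '$6 == mp { found = 1 } END { exit !found }'
}

# Hypothetical usage inside clone_disk.sh:
#   df -k | mounted_on /clone_root || STATUS=1
```

To run the clone nightly, a crontab entry along the lines of `30 2 * * * /usr/local/sbin/clone_disk.sh` would do; the install path there is hypothetical.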

2A) Greg Freemyer suggests a variant of this that uses an AdvFS
clone fileset to make sure that the image of the disk is consistent.

>From freemyer_at_NorcrossGroup.com Tue Apr 10 16:17:10 2001
Date: Tue, 10 Apr 2001 14:58:56 -0400
From: Greg Freemyer <freemyer_at_NorcrossGroup.com>
To: Steven Timm <timm_at_fnal.gov>
Subject: re: Having a redundant ADVFS system disk


Steven,

If you have the AdvFS utilities license, I would use a combination of 'clonefset' and vdump/vrestore.

The clonefset command allows you to get an instantaneous backup for short-term use. This is very useful if you have a short backup window.

Then, I believe vdump and vrestore can be piped together to backup the clone filesystem.

If so, using the example in 'man clonefset' as a basis you would do:

       # mkdir /mnt/credit_clone1
       # clonefset accounts_dmn credit_fs credit_clone1
       # mount -t advfs accounts_dmn#credit_clone1 /mnt/credit_clone1
       # vdump -f - /mnt/credit_clone1 | vrestore -f - -x ....
       # umount /mnt/credit_clone1
       # rmfset accounts_dmn credit_clone1

The '-f -' argument for vdump/vrestore should cause stdout/stdin to be used.

If you don't have or want to use clonefset, then you can just use the vdump/vrestore line above.
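The clonefset steps above can be wrapped so that the clone fileset is always unmounted and removed, even when the copy fails; a leftover clone would otherwise keep holding space in the domain as the original fileset changes. This is a minimal sketch, assuming the domain and fileset names from the example and an existing restore target directory passed as the last argument (hypothetical); the dump and restore flags follow John's script. Setting `RUN=echo` prints the commands instead of running them.

```sh
#!/bin/sh
# Back up one AdvFS fileset through a temporary clone, then always clean up.
# Args: domain fileset clone_name clone_mountpoint restore_target_dir
# Set RUN=echo for a dry run that only prints the commands.
RUN=${RUN:-}

backup_via_clone() {
    dmn=$1; fs=$2; clone=$3; mnt=$4; dest=$5
    $RUN mkdir -p "$mnt" || return 1
    $RUN clonefset "$dmn" "$fs" "$clone" || return 1
    status=0
    if $RUN mount -t advfs "$dmn#$clone" "$mnt"
    then
        # Stream a level-0 dump of the frozen clone straight into vrestore;
        # $? is the status of the vrestore subshell.
        $RUN vdump 0f - "$mnt" | (cd "$dest" && $RUN vrestore -xf -)
        status=$?
        $RUN umount "$mnt" || status=1
    else
        status=1
    fi
    # Remove the clone whether or not the copy succeeded.
    $RUN rmfset -f "$dmn" "$clone" || status=1
    return $status
}
```

For example, `backup_via_clone accounts_dmn credit_fs credit_clone1 /mnt/credit_clone1 /backup/credit` copies the fileset into a hypothetical `/backup/credit` directory and leaves no clone behind.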

I would be hesitant to try the 'dd' approach, but I don't know any specific reason it would not work.

Regardless of which backup technique you use, you need to ensure your system is in a quiescent state where backing it up makes sense.

For instance, if you have any applications which have checkpoint capability, they should be checkpointed, and then these applications should not write to disk until the backup is complete. (This is why clonefset can be so useful.)

Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com

3) Finally, a note from Dr. Tom Blinn explaining that it
can actually be done with dd under certain stringent conditions.




>From tpb_at_doctor.zk3.dec.com Tue Apr 10 16:17:16 2001
Date: Tue, 10 Apr 2001 15:00:44 -0400
From: "Dr. Thomas.Blinn_at_Compaq.com" <tpb_at_doctor.zk3.dec.com>
To: Steven Timm <timm_at_fnal.gov>
Subject: Re: Having a redundant ADVFS system disk

If you really want it to work, YOU MUST UNMOUNT ALL OF THE FILESETS IN
ALL OF THE DOMAINS.

Then you can do this, and it will work:

        disklabel -z rz3
        dd if=/dev/rrz2c of=/dev/rrz3c

and then you can re-mount the AdvFS filesets. If the rz2 disk fails,
just HALT THE SYSTEM (you won't be able to unmount the filesets if
the disk has really failed), move the rz3 disk in place of rz2 (that
is, make it be unit 2 on bus 0), and reboot. Bingo, all of your
symlinks point to the right places.

This assumes that the rz3 disk is as large as or larger than the
rz2 disk (so that the dd will not be truncated).
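Tom's size requirement can be checked before running dd by comparing the size of the `c` (whole-disk) partition in each disk's label. This is a sketch only: it assumes `disklabel -r` prints each partition on a line beginning with `c:` followed by the size in sectors, so check that against your own output first. The labels must be saved before the target's label is zeroed; file names in the usage comment are hypothetical.

```sh
#!/bin/sh
# Print the size field of the "c" partition from disklabel output on stdin.
# Assumes partition lines of the form "  c:  <size> <offset> <fstype> ...".
c_size() {
    awk '$1 == "c:" { print $2; exit }'
}

# Succeed if the target's c partition is at least as large as the source's.
# Args: file holding source disklabel output, file holding target output.
target_big_enough() {
    s=`c_size < "$1"`
    t=`c_size < "$2"`
    [ -n "$s" ] && [ -n "$t" ] && [ "$t" -ge "$s" ]
}

# Hypothetical usage:
#   disklabel -r rz2 > /tmp/rz2label
#   disklabel -r rz3 > /tmp/rz3label
#   target_big_enough /tmp/rz2label /tmp/rz3label || echo "rz3 too small" >&2
```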

But, maybe you should consider using LSM and shadowing the space,
and maybe you should just do backups.

Oh, yes, this assumes that you are NOT using the swap space on
the rz3 disk, that it's just reserved to be swap. If you are
using the swap, the "dd" of the entire disk will NOT work, you
would need to dd each partition one at a time (bypassing the
swap), and you would have the problem that since the label is
in the "a" partition, you can't dd the "a" partition unless you
zero the label on the target disk first; when you copy the "a"
to the target, it will inherit the label of the rz2 disk. Not
something you'd necessarily know.
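Tom's partition-by-partition variant might look roughly like the following. It follows his ordering: zero the target label, copy "a" first (the label lives there, so the target inherits the rz2 label), then copy the remaining partitions while skipping swap. Treat it as a sketch under stated assumptions: the partition letters are hypothetical, every fileset must already be unmounted, the `bs=64k` block size is an arbitrary choice, and it assumes the system sees the copied label before the other partitions are opened. `RUN=echo` prints the commands instead of running them.

```sh
#!/bin/sh
# Copy a disk partition by partition with dd, skipping the swap partition.
# Args: source disk, target disk, swap partition letter, then the partition
# letters to copy (e.g. rz2 rz3 b a g h). ALL filesets must be unmounted.
# Set RUN=echo to print the commands instead of running them.
RUN=${RUN:-}

dd_partitions() {
    src=$1; dst=$2; swap=$3
    shift 3
    # Zero the target label so that the "a" partition can be written.
    $RUN disklabel -z "$dst" || return 1
    # Copy "a" first: it carries the disk label, which the target inherits.
    $RUN dd if="/dev/r${src}a" of="/dev/r${dst}a" bs=64k || return 1
    for p in "$@"
    do
        [ "$p" = a ] && continue         # already copied above
        [ "$p" = "$swap" ] && continue   # leave the in-use swap alone
        $RUN dd if="/dev/r$src$p" of="/dev/r$dst$p" bs=64k || return 1
    done
    return 0
}
```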

This is in the "kids, don't try this at home" category. Surely
you can afford some more disks to get reliability and redundancy.

Tom

 Dr. Thomas P. Blinn + UNIX Software Group + Compaq Computer Corporation
  110 Spit Brook Road, MS ZKO3-2/W17 Nashua, New Hampshire 03062-2698
   Technology Partnership Engineering Phone: (603) 884-0646
    Internet: tpb_at_zk3.dec.com - or - thomas.blinn_at_compaq.com
     ACM Member: tpblinn_at_acm.org PC_at_Home: tom_at_felines.mv.net
Received on Tue Apr 10 2001 - 21:32:19 NZST

This archive was generated by hypermail 2.4.0 : Wed Nov 08 2023 - 11:53:42 NZDT