Summary: disk storage rebuild times

From: Scott Mutchler <smutchler_at_gfs.com>
Date: Wed, 29 Dec 1999 11:29:59 -0500

All,

I posted last week regarding disk storage rebuild times, as I need to replace some 4.3 GB drives with 18 GB drives that are managed by HSZ70 controllers. These in turn are part of LSM and a disk-based cluster service.

The original post follows, along with some very helpful responses, based on actual testing, from alan_at_nabeth.cxo.dec.com. The bottom line is that there apparently are no set formulas or "rules of thumb" for calculating storage construction and data restoration times without actually doing it. Also, there is a tool called VTDPY, run from the HSZ70 console, for viewing actual read and write data rates on the controller.

Thanks again, Alan.

Scott Mutchler
Gordon Food Service

==========
Original post:

All,

I need to replace several 4.3 GB drives in my ESA10000 cabinet. I intend to replace them with 18 GB drives. What I am wondering is whether anyone knows how much time I should allow for building the storage sets.

Part 1 - HSZ

At present, I have a RAID-5 storage set behind a dual-redundant HSZ70 pair (I have firmware V7.3 on order to support the 18 GB disks). It is a ten-disk set (long story). The disks in question are all 4.3 GB, and what I intend to do is replace them with 18 GB disks. I also want to divide the ten-disk set into two five-disk sets. Does anyone have a handy "rule of thumb" (or proven formula) for predicting how long it will take the HSZ70 to construct these five-disk RAID-5 sets?

Part 2 - LSM

The storage units on my HSZ70s are constructed identically on another bus (behind another pair of dual-redundant HSZ70s). This is in turn mirrored via LSM. The LSM volumes have AdvFS filesets on them that are part of a shared disk service between two 4100 nodes. The 4100s have 1 GB of RAM and two 533 MHz processors each. The HSZ70s connect to KZPSAs at the end of each bus in each host; the hosts run Tru64 UNIX V4.0D with patch kit 2 and TCR 1.5 with patch kit 2.

I expect I will have to destroy the existing LSM volumes and redefine the subdisks, plexes, and volumes with new sizes. So, again, are there any rules of thumb for a) how long it will take LSM to initially start the volume on the 5 x 18 GB storage set from Part 1, and b) how long it will take to sync the two halves of the mirror?

Any insights, formulas, rules of thumb, etc., are most welcome!

=====
Alan's first reply:


        I can think of too many "build times" involved to give you
        a useful answer. The HSZ INITIALIZE command on a RAID-5
        will take less than a minute. It does this so quickly by
        initializing the metadata to indicate that none of the
        parity is consistent. Once a logical unit is created, it
        starts to generate the correct parity. Writes to areas
        that haven't been regenerated yet will generate it as they
        go. Reads may get "forced errors".

        If the members are on separate busses, I think the HSZ
        can run the disks close to media speed while it makes
        the array consistent. If you have two disks on a bus
        or a bus is otherwise busy, then the rebuild will have
        to share according to the RECONSTRUCT option. I don't
        know whether the algorithm for making the parity consistent
        writes a consistent value on all members or reads the others
        to generate the consistent parity.

        Since I have an HSZ70 with some disks in a RAID that I'm
        not using, I'll reinitialize a unit and see what it does
        once the unit is added... The disks are 9 GB, 7200 RPM
        disks, each on a separate bus. The controller is otherwise
        idle. According to the VTDPY device display, I'm getting
        around 2 MB/sec reading from each member with 512 KB/sec
        writing (which answers the question about reading vs.
        writing). I'm pretty sure the disks are capable of more,
        so it is running slower than media speed.

        In a separate test, I used hszterm to show the state of the
        RAID, slept for 300 seconds, and showed the state again.
        It had completed ~6% in the 300 seconds. This is pretty
        consistent with the data rate. So, I can expect my RAID
        of 9 GB disks to complete in an hour to an hour and a half.
        However, I can still use the array while it is building,
        though the redundancy is suspect until the controller is
        done.
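
        (A rough back-of-the-envelope sketch in Python of the arithmetic
        above; the 9 GB member size, 2 MB/sec read rate, and 6%-per-300-
        second progress are the figures from this test, nothing else is
        assumed:)

        # Estimate the initial parity build time two ways: from the
        # per-member read rate seen in VTDPY, and from the progress
        # observed over a 300-second sample.
        member_size_mb = 9 * 1024      # ~9 GB member disk
        read_rate_mb_s = 2.0           # per-member read rate from VTDPY

        # Members are swept in parallel, so one member's sweep time
        # bounds the whole build.
        sweep_seconds = member_size_mb / read_rate_mb_s
        print(f"from read rate: {sweep_seconds / 3600:.1f} hours")

        pct_per_sample = 6.0           # progress seen in one sample
        sample_seconds = 300
        est_seconds = 100.0 / pct_per_sample * sample_seconds
        print(f"from progress sample: {est_seconds / 3600:.1f} hours")

        Both come out at roughly 1.3 to 1.4 hours, which matches the
        "hour to an hour and a half" estimate above.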

        As for how long it will take LSM to copy the data, I don't
        know. If it uses a respectable transfer size, it will only
        be limited by the CPU time required and the write speed to
        the other RAID-5. Using the write-back cache on the other
        unit will allow the controller to do a fair amount of write
        gathering to get RAID-3-like write performance instead of
        the classic (slow) RAID-5 write performance.
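
        (A hypothetical sketch in Python of what that copy might take
        if the write rate into the new plex is the limit; the 10 MB/sec
        figure is purely an assumed number for illustration, not
        something measured on an HSZ70:)

        # Hypothetical LSM mirror-resync estimate: time to copy one
        # plex's worth of data at an assumed aggregate write rate.
        usable_gb = 4 * 18                  # 5-disk RAID-5 of 18 GB disks
        assumed_write_rate_mb_s = 10.0      # assumption, not a measurement

        resync_seconds = usable_gb * 1024 / assumed_write_rate_mb_s
        print(f"estimated plex sync: {resync_seconds / 3600:.1f} hours")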

        If you also have to back up/restore the data, the restore time
        will probably be the biggest factor. Even relatively fast
        backup software is designed for fast backup without much
        attention to restore times. For large files, restoring will
        typically be limited by how fast you can read from tape and
        write the bare data. For small files, the time is dominated
        by the need to create the files and update modification times.
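
        (A hypothetical restore-time model in Python illustrating that
        trade-off; the tape rate and per-file overhead are made-up
        placeholder figures, not TZ885 or vrestore measurements:)

        # Large files are bounded by the tape read rate; small files
        # add per-file creation and timestamp-update overhead.
        def restore_hours(total_mb, n_files, tape_mb_s=1.5, per_file_s=0.02):
            streaming = total_mb / tape_mb_s    # reading the bare data
            metadata = n_files * per_file_s     # creating files, setting times
            return (streaming + metadata) / 3600

        # 34 GB in a few thousand big files vs. a million small ones.
        print(restore_hours(34 * 1024, 5000))
        print(restore_hours(34 * 1024, 1000000))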

        If you're using AdvFS for the file system, you may want to try
        to keep the old arrays around while you make the new ones. Once
        you have the LSM mirrored plex, you can use addvol to add the
        new space to the domain and then remove the old one. AdvFS
        will copy the data while keeping the filesets online.

=========
I sent these follow-up questions to Alan.


Do I understand from your analysis that once the unit is created and begins constructing parity, it is actually usable at that time (minus the redundancy)? And even if this is possible, perhaps it is not advisable, since I will be putting in 5 brand-new disks (and yes, each of the 5 disks will be on its own bus in the ESA10000). That is, I would hate to lose time and effort if, for example, while restoring data from tape, one of my new disks failed before parity was completely constructed on the set. Your thoughts?

How many disks did you assign to the RAID-5 set you built? I see they were 9 GB disks. Should I infer, for instance, that if you have a five-disk set of 9 GB disks and are getting about 6% constructed per 5 minutes, then I would see 3% in five minutes on disks twice the size (a five-disk RAID-5 set of 18 GB disks)? One difference, though, is that my old 4.3 GB disks are 7200 RPM and the new ones are supposed to be 10,000 RPM.

I am not familiar with the VTDPY display you referenced. If the write throughput to the RAID-5 set is 512 KB/sec, can I expect to restore data back onto the storage set at about 30 MB/min (or 1800 MB/hour)? Wow. I have to put 34 GB back into place, which looks like 19.3 hours. Ouch. Am I figuring that correctly? Oh, and I plan to use vrestore to put the data back from a TZ885, which is connected to a separate SCSI bus.
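
(Working through that arithmetic in Python; the only inputs are the 512 KB/sec per-disk rebuild write rate and the 34 GB of data quoted above, and as Alan notes below, host writes may well go faster, so treat this as a pessimistic bound:)

    # Restore-time estimate if the restore really is limited to the
    # 512 KB/sec per-disk rebuild write rate.
    write_rate_kb_s = 512
    data_to_restore_mb = 34 * 1024

    rate_mb_min = write_rate_kb_s * 60 / 1024    # ~30 MB/min
    rate_mb_hr = rate_mb_min * 60                # ~1800 MB/hour
    hours = data_to_restore_mb / rate_mb_hr
    print(f"{rate_mb_min:.0f} MB/min, {rate_mb_hr:.0f} MB/hour, {hours:.1f} hours")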

=========
To which Alan replied:

        re: unit usable during initial reconstruct.

        It has been my habit to wait until the array is reconstructed
        when I build RAID-5 and RAID-1 sets before using them. However,
        that's generally because I'm throwing together a unit to do an
        ad-hoc performance test, and the rebuild just gets in the way
        of seeing what the performance is really like. There are three
        I/O cases possible on an array that hasn't completed
        initialization:

        o Writing new data. It should generate the parity when
           it writes the data, making it redundant.

        o Reading previously written data. See first bullet.

        o Reading unwritten data. If it hasn't been written,
           you can't reasonably care whether it is "correct"
           or not.

        It has been so long since I looked closely at the RAID
        algorithms that I don't recall what the risk is in using a
        RAID while it is reconstructing. I think the risk is that
        if you lose a member, you can't write protected data
        in some cases. Reads of data written before the member was
        lost are protected, but later writes might not be.

        In the end, what really matters is how much performance
        you lose if your writes have to compete with the
        subsystem's initialization. It may seem to take forever
        to restore a backup if you overlap the initialization and
        your own work. But, in the end, doing them sequentially
        is likely to take longer.

        re: My RAID-5

        5 disks.

        re: Your rebuild time with 18 GB disks. If the media
        and rotational speed of the 18 GB disks are the same as
        those of my 9 GB disks, then your build will probably take
        twice as long. If your disks are faster than mine, the
        media data rate may be higher and therefore your build
        faster. If the rebuild is limited by media data rate, then
        faster media means a better rebuild time. If the rate is
        limited by the controller, then more data will take
        proportionally longer.
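
        (A small sketch in Python of that scaling; the 2 MB/sec rate is
        the one observed earlier in this thread, while the 3 MB/sec
        figure for the 10,000 RPM disks is purely an assumed value for
        illustration:)

        # Build time if the sweep runs at a given per-member rate:
        # controller-limited means the same rate on bigger disks,
        # media-limited means faster disks sweep proportionally faster.
        def build_time_hours(capacity_gb, rate_mb_s):
            return capacity_gb * 1024 / rate_mb_s / 3600

        print(build_time_hours(9, 2.0))    # observed case, ~1.3 hours
        print(build_time_hours(18, 2.0))   # controller-limited, ~2.6 hours
        print(build_time_hours(18, 3.0))   # assumed faster media, ~1.7 hours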

        Remember, my 512 KB/sec was the per-disk rate doing just
        the rebuild. You might get better with host I/O. On
        the other hand, backup programs are generally written
        with saving performance, not restoring performance, in mind.