All,
I posted last week regarding disk storage rebuild times, as I need to replace some 4.3 GB drives with 18 GB drives managed by HSZ70 controllers. These are in turn part of LSM volumes and a disk-based cluster service.
The original post follows, as well as some very helpful responses based on actual testing from alan_at_nabeth.cxo.dec.com. The bottom line is that there apparently are no set formulas or "rules of thumb" for estimating storage-set construction and data-restoration times short of actually doing it. Also, there is a tool called VTDPY, run from the HSZ70 console, for viewing actual read and write data rates at the controller.
Thanks again, Alan.
Scott Mutchler
Gordon Food Service
==========
Original post:
All,
I need to replace several 4.3 GB drives in my ESA10000 cabinet with 18 GB drives. What I am wondering is whether anyone knows how much time I should allow for building the new storage sets.
Part 1 - HSZ
At present, I have a RAID-5 storageset behind a dual-redundant HSZ70 pair (I have firmware V7.3 on order to support the 18 GB disks). It is a ten-disk set (long story). The disks in question are all 4.3 GB, and I intend to replace them with 18 GB disks and split the ten-disk set into two five-disk sets. Does anyone have a handy "rule of thumb" (or proven formula) for predicting how long it will take the HSZ70 to construct these five-disk RAID-5 sets?
Part 2 - LSM
The storage units on my HSZs are constructed identically on another bus (behind a second pair of dual-redundant HSZ70s), and the two sides are mirrored via LSM. The LSM volumes in turn hold AdvFS filesets that are part of a shared disk service between two 4100 nodes. The 4100s each have 1 GB of RAM and two 533 MHz processors. The HSZs connect to KZPSAs at the end of each bus in each host; the hosts run Tru64 V4.0D with PK2 and TCR 1.5 with PK2.
I expect I will have to destroy the existing LSM volumes and redefine the subdisks, plexes, and volumes with the new sizes. So, again, are there any rules of thumb for (a) how long it will take LSM to start the volume initially on the 5 x 18 GB storage set from part 1, and (b) how long it will take to sync the two halves of the mirror?
Any insights, formulas, rules of thumb, etc., are most welcome!
=====
Alan's first reply:
I can think of too many "build times" involved to give you
a useful answer. The HSZ INITIALIZE command for a RAID-5
will take less than a minute. It does this so quickly by
initializing the metadata to indicate that none of the
parity is consistent. Once a logical unit is created, the
controller starts to generate the correct parity in the
background. Writes to areas whose parity hasn't been
regenerated yet will make it consistent as part of the
write. Reads may get "forced errors".
If the members are on separate buses, I think the HSZ
can run the disks close to media speed while it makes
the array consistent. If you have two disks on a bus,
or a bus is otherwise busy, then the rebuild will have
to share the bus according to the RECONSTRUCT setting.
I don't know whether the algorithm for making the parity
consistent writes a consistent value on all members, or
reads the other members to generate the consistent parity.
Since I have an HSZ70 with some disks in a RAID that I'm
not using, I'll reinitialize a unit and see what it does
once the unit is added... The disks are 9 GB 7200 rpm
disks, each on a separate bus. The controller is otherwise
idle. According to the VTDPY device display, I'm getting
around 2 MB/sec reading from each member with 512 KB/sec
writing (which answers the question about reading vs.
writing). I'm pretty sure the disks are capable of more,
so it is running slower than media speed.
In a separate test, I used hszterm to show the state of the
RAID, slept for 300 seconds and showed the state again.
It had completed ~6% in the 300 seconds. This is pretty
consistent with the data rate. So, I can expect my RAID
of 9 GB disks to complete in an hour to an hour and a half.
However, I can still use the array while it is building,
though the redundancy is suspect until the controller is
done.
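A quick back-of-the-envelope version of that estimate (just the arithmetic behind the figures Alan quotes; the completion rate, member size and per-member read rate come from his test, nothing else is assumed):

    # Extrapolate the full rebuild time from the observed completion rate.
    member_gb = 9.0            # size of each RAID-5 member
    pct_per_interval = 6.0     # ~6% completed ...
    interval_sec = 300.0       # ... per 300 seconds

    rebuild_sec = 100.0 / pct_per_interval * interval_sec
    print("rebuild: ~%.0f min" % (rebuild_sec / 60))          # ~83 min

    # Cross-check against the ~2 MB/sec per-member read rate.
    read_mb_per_sec = 2.0
    scan_sec = member_gb * 1024 / read_mb_per_sec
    print("capacity/rate: ~%.0f min" % (scan_sec / 60))       # ~77 min
    # Both land in the "hour to an hour and a half" range above.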
As for how long it will take LSM to copy the data, I don't
know. If it uses a respectable transfer size, it will only
be limited by the CPU time required and the write speed to
the other RAID-5. Using the write-back cache on the other
unit will allow the controller to do a fair amount of write
gathering and get RAID-3-like write performance instead of
the classic (slow) RAID-5 write performance.
If you also have to backup/restore the data, the restore time
will probably be the biggest factor. Even relatively fast
backup software is designed for fast backup without much
attention to restore times. For large files, restoring will
typically be limited by how fast you can read from tape and
write the bare data. For small files, the time is dominated
by the need to create the files and update modification times.
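A rough way to put numbers on those two regimes (a sketch only; the tape rate and per-file overhead below are illustrative placeholders, not TZ885 or AdvFS measurements):

    # Crude restore-time model: streaming time for the bytes plus a
    # per-file cost for creating files and setting modification times.
    def restore_hours(data_mb, n_files, tape_mb_per_sec=1.5, per_file_sec=0.02):
        streaming = data_mb / tape_mb_per_sec   # pulling the data off tape
        metadata = n_files * per_file_sec       # creating files, fixing mtimes
        return (streaming + metadata) / 3600.0

    # A few large files: dominated by tape speed.
    print("%.1f h" % restore_hours(34 * 1024, n_files=500))
    # Millions of small files: the per-file cost starts to dominate.
    print("%.1f h" % restore_hours(34 * 1024, n_files=2000000))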
If you're using AdvFS for the file system, you may want to
try to keep the old arrays around while you make the new
ones. Once you have the LSM mirrored plex you can use
addvol to add the new space to the domain and then remove
the old volume. AdvFS will copy the data while keeping the
filesets online.
=========
I sent these follow-up questions to Alan.
Do I understand from your analysis that once the unit is created and begins constructing parity, it is actually usable at that time (minus the redundancy)? And if that is possible, perhaps it is not advisable, since I will be putting in 5 brand-new disks (and yes, each of the 5 disks will be on its own bus in the ESA10000). That is, I would hate to lose time and effort if, for example, while restoring data from tape, one of my new disks failed before parity was completely constructed on the set. Your thoughts?
How many disks did you assign to the RAID-5 set you built? I see they were 9 GB disks. Should I infer, for instance, that if you have a five-disk set of 9 GB disks and are getting about 6% constructed per 5 minutes, then I would see 3% in five minutes on disks twice the size (a five-disk RAID-5 set with 18 GB disks)? One difference, though, is that my old 4.3 GB disks are 7200 rpm and the new ones are supposed to be 10,000 rpm.
I am not familiar with the VTDPY display you referenced. If the write throughput to the RAID-5 set is 512 KB/sec, can I expect to restore data back onto the storage set at about 30 MB/min (or 1800 MB/hour)? Wow. I have to put 34 GB back into place, which looks like 19.3 hours. Ouch. Am I figuring that correctly? Oh, and I plan to use vrestore to put the data back from a TZ885, which is connected to a separate SCSI bus.
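Spelling out the arithmetic behind that 19.3-hour figure (assuming restore writes go no faster than the 512 KB/sec per-disk rebuild rate, which Alan questions below):

    # 512 KB/sec per disk, converted to MB/min, applied to 34 GB.
    write_kb_per_sec = 512.0
    mb_per_min = write_kb_per_sec * 60 / 1024       # -> 30 MB/min
    data_mb = 34 * 1024                             # 34 GB to restore
    hours = data_mb / (mb_per_min * 60)
    print("%.0f MB/min, %.1f hours" % (mb_per_min, hours))   # 30 MB/min, 19.3 hours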
=========
To which Alan replied:
re: unit usable during initial reconstruct.
It has been my habit to wait until the array is reconstructed
when I build RAID-5 and RAID-1 sets before using them. However,
that's generally because I'm throwing together a unit to do an
ad-hoc performance test, and the rebuild just gets in the way
of seeing what the performance is really like. There are three
I/O options possible on an array that hasn't completed
initialization:
o Writing new data. It should generate the parity when
it writes the data, making it redundant.
o Reading previously written data. See first bullet.
o Reading unwritten data. If it hasn't been written,
you can't reasonably care whether it is "correct"
or not.
It has been so long since I looked closely at the RAID
algorithms that I don't recall what the risk is in using a
RAID while it is reconstructing. I think the risk is that
if you lose a member, you can't write protected data
in some cases. Reads of data written before the member was
lost are protected, but later writes might not be.
In the end, what really matters is how much performance
you lose if your writes have to compete with the
subsystem's initialization. It may seem to take forever to
restore a backup if you overlap the initialization and your
own work, but doing them sequentially is likely to take
longer overall.
re: My RAID-5
5 disks.
re: Your rebuild time with 18 GB disks. If the media
and rotational speed of the 18 GB disks is the same as
my 9 GB disks, then your build will probably take twice
as long. If your disks are faster than mine, the media
data rate may be higher and your rebuild correspondingly
faster. If the rebuild is limited by media data rate,
then faster media means a better rebuild time. If the
rate is limited by the controller, then more data will
take proportionally longer.
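One way to combine those two scalings (a sketch; the 83-minute baseline comes from the test above, while the media speedup for 10,000 rpm drives is just an illustrative guess):

    # Scale the observed 9 GB rebuild time to 18 GB members.
    rebuild_9gb_min = 83.0         # from the 6%-per-300-seconds test
    capacity_ratio = 18.0 / 9.0    # twice the data to make consistent

    for media_speedup in (1.0, 1.3):   # 1.0 = same media rate, 1.3 = guessed faster media
        est_min = rebuild_9gb_min * capacity_ratio / media_speedup
        print("speedup %.1fx -> ~%.0f min" % (media_speedup, est_min))
    # If the controller, not the media, is the bottleneck, only the
    # capacity ratio applies and the faster spindles won't help.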
Remember, my 512 KB/sec was the per-disk rate while doing
just the rebuild. You might get better rates doing host I/O.
On the other hand, backup programs are generally written
with save performance, not restore performance, in mind.