Recovering RAIDSETs on a SW410 after HSZ40 died/replaced

From: David Hume <dhume_at_gds.CA>
Date: Tue, 15 Sep 1998 17:22:30 -0700 (PDT)

Hi all. This is a question aimed at DU managers who have
also been through the wars with StorageWorks cabinets
and RAID sets; apologies if there is a more appropriate
DEC hardware list to aim this missive at, pointers to same
also much appreciated.

The box in question is an old SW410, which had an HSZ40
controller card. The 4 rows were populated with 24 disks,
consisting of 3 RAIDSETs of 6 disks each (RAID5, so Reduced
would be 5), 2 STRIPESETs, one with 3 disks one with 2, and
the remaining disk was a SPARESET.

On the weekend, the HSZ40 controller went south. SRAM battery
gone, end of card. We did not have a SAVE_CONFIGURATION. Add
to this the fact that this RAID configuration has been in
use for so long (over 4 years now) that we don't have a map,
and the final complication: while there are backups, they are
not as recent as I would like. So my goal is clearly to recover
the existing data in place.

Digital Hardware support has replaced the card (twice - the first
one had a memory error, and we had to wait for a second one to be
flown in) with another HSZ40. The firmware card is the same, and
we set the SCSI targets to 1 through 4, because that was how the
old card was set:


Controller:
        HSZ40 (snip) Firmware V27Z-0, Hardware B01
        Not configured for dual-redundancy
        SCSI address 7
        Time: NOT SET
Host port:
        SCSI target(s) (1, 2, 3, 4), No preferred targets
Cache:
        32 megabyte write cache, version 2
        Cache is GOOD
        Battery is GOOD
        Unflushed data in cache
        CACHE_FLUSH_TIMER = 65535 (seconds)
        CACHE_POLICY = A
        Host Functionality Mode = A


Once at this point, we were able to recover the two stripesets
quite easily. We deduced the Unit names from the UNIX /dev/ names,
and did know which disks were in the two stripesets (all in the
top row, DISK130 thru DISK630). We did ADD STRIPESET ...,
skipping INITIALIZE of course, and went on to ADD UNIT <stripeset>.
It worked, mounted right where it used to be, happiness.

Now, more crucially, we are trying to get the 3 raidsets back.
This is proving more problematic, and of course we do not want to
INITIALIZE them either. We proceeded similarly, as follows...

ADD RAIDSET R1 DISKa DISKb ... DISKf (there were 6 disks in the full RAID5)

but then upon

ADD UNIT D100 R1 (for example), I get the message...


Error 9310: Container metadata check failed, unit not created.
            Unknown metadata status = 000000FE (hex)

Now, recognizing that we are working from our best memory of the
old RAID combinations rather than an actual map, I have been led
to believe I will simply get this message for every combination
I try until I hit a "correct" one, the right six disks. However,
I have exhausted all the logical combinations of the disks
on these 3 shelves (each row of 6 disks across, and pairs of
columns down all 3 rows; I have also tried more than one of each
of these, in case a disk that had been in one of these combinations
had gone into the FAILEDSET), but I still get the same message.
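
For anyone wanting to check that no "logical" combination was missed, the
candidates described above can be listed mechanically. The following is only
a rough Python sketch, not anything the controller provides. It ASSUMES the
disks on the three lower shelves follow the same DISK<port><target>0 naming
as DISK130 thru DISK630 in the top row (ports 1-6 as columns, targets 0-2 as
rows); the function names and the set name R1 are purely illustrative. It
also ignores member order within a set.

```python
from itertools import combinations

# ASSUMPTION: names follow DISK<port><target>0, inferred from the top row
# (DISK130..DISK630). Ports 1-6 are columns; targets 0-2 are the three
# lower rows. Adjust both ranges to match the real cabinet.
PORTS = range(1, 7)
TARGETS = range(0, 3)


def disk(port, target):
    """Build a disk name such as DISK120 (port 1, target 2, LUN 0)."""
    return f"DISK{port}{target}0"


def row_candidates():
    """One candidate per shelf: the six disks across a row."""
    return [[disk(p, t) for p in PORTS] for t in TARGETS]


def column_pair_candidates():
    """One candidate per pair of columns: two ports down all three rows."""
    return [[disk(p, t) for p in pair for t in TARGETS]
            for pair in combinations(PORTS, 2)]


def add_raidset_commands(name="R1"):
    """Format each candidate as an ADD RAIDSET line (hypothetical set name)."""
    candidates = row_candidates() + column_pair_candidates()
    return [f"ADD RAIDSET {name} {' '.join(c)}" for c in candidates]


if __name__ == "__main__":
    for cmd in add_raidset_commands():
        print(cmd)
```

Under these assumptions that is 3 row candidates plus 15 column-pair
candidates, 18 in all. Picking any 6 of the 18 disks without such a layout
constraint would give 18564 possible sets, which is why narrowing by layout
matters.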

One curiosity: I did try to make a raidset or two via CFMENU,
rather than via the hszterm command line. In this
case, it does ask the interesting question
"Is this a previously configured, REDUCED raidset?[y/n]"
(well, Yes to the first part, but when last used it was not in
a reduced state, so No), and of course, since I don't want
to INITIALIZE in CFMENU any more than on the command line,
it sees nothing eligible to make a UNIT out of.

Also, at this time I have *not* added the one disk back into
the SPARESET.

So, while pursuing Digital support, I thought I would ask for
any Suggestions or Similar Experiences from the list.
Please e-mail me at dhume_at_gds.ca, and I will
summarize.

Much Thanks in Advance,

        Dave Hume, GDS & Associates, Victoria BC
Received on Wed Sep 16 1998 - 00:23:40 NZST
