SUMMARY: Update: Problems with raid sets on HSZ70 Raid controller

From: Valerie Caro <valerie_at_cs.umass.edu>
Date: Fri, 18 Aug 2000 10:10:51 -0400 (EDT)

  Thanks to John J. Francini, who came up with the answers, and to Ed
Meirose, Jim Kurtenbach, Peter Reynolds, and Alan, who also offered advice.

  Original problem: We had 5 disk drives die in an HSZ70 Raid controller
and we were having problems installing the replacement disk drives.

  The problem was that the new disk had a different geometry (its total number
of blocks was smaller) than the RZ1EF. This caused 2 problems.
  - I needed to tell the HSZ to gather new information on the device.
  - I cannot use a slightly smaller disk to replace a disk in an existing
    Raid set. The disk needs to be the same size as or larger than the
    original disks.

 The new disks were slightly smaller than the dead disks.
The IBM DGHS18Y disk reports size: 35539633 blocks
The old disks reported size:       35556389 blocks
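
 For reference, the gap between the two figures above can be worked out
with a few lines of Python. This is only a sketch of the arithmetic: the
512-byte block size is an assumption (the controller output does not state
it), and the variable names are made up for illustration.

```python
# Block counts as reported above.
OLD_BLOCKS = 35556389   # RZ1EF-CB
NEW_BLOCKS = 35539633   # IBM DGHS18Y

# Assumption: 512-byte blocks (not stated in the controller output).
BLOCK_SIZE = 512

shortfall_blocks = OLD_BLOCKS - NEW_BLOCKS
shortfall_bytes = shortfall_blocks * BLOCK_SIZE

# A replacement member must be at least as large as the disk it replaces.
fits = NEW_BLOCKS >= OLD_BLOCKS

print(f"shortfall: {shortfall_blocks} blocks ({shortfall_bytes / 2**20:.1f} MiB)")
print(f"usable as a replacement: {fits}")
```

 The shortfall is small, but any shortfall at all is enough to rule the
disk out as a replacement member.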

  The solution to the disk size problem is to purchase the disks from
Compaq, or recreate the raid sets.

  It was also recommended that I update the firmware to V7.7.
 
The solutions from John J. Francini:

-------------------------------------------------------
From: John J. Francini <francini_at_zk3.dec.com>
Subject: Re: Update: Problems with raid sets on HSZ70 Raid controller

It almost seems as though the HSZ thinks that the DISK60100 container
is still an RZ1EF-CB instead of a completely different disk. I
believe that if you change the actual underlying type of disk in a
container, you need to delete the container from the configuration
and re-create it -- thus allowing the HSZ to gather the new
information about the devices -- before using the REPLACE command.

I bet the IBM disk has a different geometry (total number of blocks in
particular) than the RZ1EF and friends. You see, different
DEC/Compaq disks of a given capacity always present the same total
number of blocks to the host. For example, the RZ29B, RZ29M, and
other 4.3 GB RZ drives always present a capacity of 8,388,608 blocks
to the host -- even if the underlying mechanism held a bit more.
This allows easy swapping in HSZs and elsewhere without having to
reconfigure.

Did the replacement drives come from Compaq or are they from a
third-party vendor? If they're from a third party, it might explain
what's happening: I bet the capacity isn't exactly the same as what
came out... If they're from Compaq, it's possible that some of them
don't have Compaq-specific firmware to paper-over the size/geometry
differences...

It's just a SWAG (Silly Wild-A* guess), but it's the first thing that
came to mind about this problem...!

Hope this helps,

John Francini

[Disclaimer: I do not speak for Compaq in any way, shape, or form]
John Francini, francini_at_zk3.dec.com
+---------------------------------------------------------------------------+

Date: Thu, 17 Aug 2000 14:43:37 -0400
From: John J. Francini <francini_at_zk3.dec.com>

According to the HSZ70 CLI reference manual:

HSZ> DELETE DISK60100
HSZ> ADD DISK DISK60100 6 1 0

(assuming that the port-target-LUN combination is 6-1-0 for that device)

then you can try the SET raidset-name REPLACE=DISK60100 command.

-------------------------------------------------------------------
--------------------------------------------------------------------

Date: Thu, 17 Aug 2000 16:14:15 -0400
From: John J. Francini <francini_at_zk3.dec.com>
To: Valerie Caro <valerie_at_cs.umass.edu>
Subject: Re: Update: Problems with raid sets on HSZ70 Raid controller

One more thing I overlooked. After doing the ADD DISK (and before
associating it with a unit) do a

        INITIALIZE DISK60100

before either adding it to a raidset or associating a unit number
with it (as a JBOD). This adds the necessary controller metadata to
the drive.
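
Putting the steps from these messages in order, the sequence can be
sketched as a small Python script that simply prints the CLI lines to
type. The helper name replacement_commands is hypothetical; the command
strings are the ones quoted in this thread, with 6 1 0 and DVGRPR1 used
as example values. It assumes the new disk is large enough to join the
set, which has to be checked separately.

```python
def replacement_commands(disk, port, target, lun, raidset):
    """Return the HSZ CLI lines, in order, as quoted in this thread."""
    return [
        f"DELETE {disk}",                          # drop the stale container
        f"ADD DISK {disk} {port} {target} {lun}",  # re-add so the HSZ re-reads the device
        f"INITIALIZE {disk}",                      # write controller metadata
        f"SET {raidset} REPLACE={disk}",           # hand the disk to the reduced set
    ]

# Example values taken from the thread; adjust port-target-LUN to match the device.
for line in replacement_commands("DISK60100", 6, 1, 0, "DVGRPR1"):
    print(f"HSZ> {line}")
```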

ALSO: I just realized something: RAIDsets should only contain disks
of the same capacity. A RAIDset limits the capacity of each member
to the capacity of the smallest disk in the RAIDset. In other words,
you've got a problem, since the old disks were BIGGER in capacity
than the new IBM disk. You can't add the new disk, since it is
smaller than the previously smallest volume in the RAIDset.

Basically, this means that to add this disk in you will need to:

        1. Do a full backup of the existing RAIDset
        2. Delete the RAIDset
        3. Re-create the RAIDset, including the new drive (which will
           leave a slight amount (16,756 blocks) unused on all the older
           members of the RAIDset)
        4. Restore the RAIDset's data from the backup you just made.
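
The effect of re-creating the set with one smaller member can be
sketched as below. The six-member count comes from the original problem
report; the 512-byte block size and the one-member-of-parity accounting
for a RAID 3/5 set are assumptions, and controller metadata overhead is
ignored.

```python
# Five surviving RZ1EF-CB members plus one IBM DGHS18Y
# (block counts as reported in this thread).
members = [35556389] * 5 + [35539633]

# Each member is limited to the capacity of the smallest disk in the set.
per_member = min(members)
unused = [size - per_member for size in members]

# Assumptions: one member's worth of parity, 512-byte blocks,
# no metadata overhead.
data_blocks = per_member * (len(members) - 1)
data_gb = data_blocks * 512 / 1e9

print(f"unused blocks per member: {unused}")
print(f"approximate data capacity: {data_gb:.1f} GB")
```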

Not a fun situation, but there isn't much alternative for you here.

John
--------------------------------------------------------------------
--------------------------------------------------------------------

Original problem:
----------------

On Thu, 17 Aug 2000, Valerie Caro wrote:

>
> Update on problem with replacing 4 disks on an HSZ70 Raid
> controller (HSZ70, 2 controllers firmware rev V70Z-0,
> Hardware Rev H0Z).
>
> We have narrowed down our problem to the new disks which we
> are trying to put in the HSZ70 Raid array. There seems to be
> some problem with either the type of disk or the format. I am not sure what.
> I was able to move a spare disk from a different system to
> the problem HSZ70 and incorporate it into one of the raid partitions.
>
> When I try to do this with the replacement disks we bought, I get an
> error, which scrolls off the screen. I cannot seem to find an errorlog
> online anywhere. The error as I wrote it down is included below.
>
> The old drives were Dec RZ1EF-CB 18.20 gb disks
> The replacement drives (including the one that got used already) are
> IBM DGHS18Y 18.20gb disks. Are these incompatible? Do we need to format
> them somehow? Any ideas are welcome.
>
>
> The error is approximately:
>
> > set DVGRPR1 replace=DISK60100
>
> ...
> Informational Report
> Unit Number 1.(0001)
> Unit software version 1.(01) Unit Hardware version: 50.(32)
> Retry level 1, Retries 1
> Port 1, Target 1, Lun 0
> SCSI Device Type: 0.(00)
> Device ID: "RZ1EF-CB (C) DEC" Device Serial number: 0EA76BGK
> Device Firmware Rev: "0372"
> SCSI command opcode 40. (28)
> Sense Data Qualifiers 0.(00)
> SCSI sense Data
> Error code: 112.(70) { current command execution}
> ...
> Sense key 11.(0B) Aborted command
> ILI: 0 EOM: 0 FM:0
> Information: 10453701
> Additional sense length: 10.(0A)
> ...
> ASC: 0 ASCQ: 6.(06)
> FRU:0 Sense-key specific: 0
> Instance Code: 0258000A
> Error 3190: Unable to replace DISK60100 in DVGRPR1
>
>
> On Tue, 15 Aug 2000, Valerie Caro wrote:
>
> >
> > We have an HSZ70 Raid controller (HSZ70, 2 controllers firmware rev V70Z-0,
> > Hardware Rev H0Z), which had some hardware failures last week.
> > We lost the cache on one of the controllers, and 5 of the disks. After replacing
> > the cache, and restarting the controller, we replaced 1 of the disks.
> > One of the raid 3/5 sets which had been in a reduced state, started
> > rebuilding using the new disk.
> >
> > The remaining 4 disks (all on channel 6) were replaced yesterday.
> > Though there are still 2 Raid 3/5 sets in a reduced state (each lost 1
> > disk out of 6), neither is rebuilding using the new disks. I cannot figure
> > out how to force them to do this. The 4 new disks show up in the spare
> > set.
> >
> > From my Digital Storageworks HSZ70 Array Controller HSOF Version 7.0
> > manual, I tried the command:
> > SET raidset-name REPLACE=DISK60100
> > It complained that it did not know what replace meant.
> >
> > I tried setting POLICY=BEST_FIT as well as BEST_PERFORMANCE and NOPOLICY.
> >
> > The raidsets are not backed up so I would prefer not to have to destroy them,
> > and recreate them. They are each 90gb of disk space.
> >
> > I am not sure what to try now. Any suggestions would be welcome.
> >
> > Thanks,
> >
> > ---
> > Valerie Caro Computer Science Computing Facility,
> > valerie_at_cs.umass.edu University of Massachusetts
> > Amherst, MA 01003
> >
> >
> >
> >
>
> ---
> Valerie Caro Computer Science Computing Facility,
> valerie_at_cs.umass.edu CmpSci Room 126
> University of Massachusetts
> Amherst, MA 01003
>
>
>
>
>

---
Valerie Caro		Computer Science Computing Facility,
valerie_at_cs.umass.edu	CmpSci Room 126
  	                University of Massachusetts
                 	Amherst, MA           01003
Received on Fri Aug 18 2000 - 14:13:54 NZST
