Hello DU managers,
I got my answer in under 12 hours plus a lot of info on RAIDs.
The gist of my problem was that the disks were in a state that the RAID
configuration utility didn't like, so the utility would just show me that
the disks were in Fail status and report "installation aborted", leaving me
no option other than a hard reboot. (I have included the original posting
at the end of this posting.) The responders were sympathetic to my
problems, as they have all had to struggle with RAID.
Thanks to the following who took the time to help me out:
Tom Webster <webster_at_ssdpdc.lgb.cal.boeing.com>
Susan Rodriguez <SUSROD_at_HBSI.COM>
Baisley_at_fnal.gov
alan_at_nabeth.cxo.dec.com
Kevin Reardon <kreardon_at_na.astro.it>
Don Weyel <weyel_at_bme.unc.edu>
Patrick Farley <farley_at_Manassas1.TDS-GN.LMCO.COM>
> 1). How do I back out from this problem?
>
Kevin Reardon had the most direct answer to the problem:
"The RAID controller can be rather picky sometimes when the disks aren't
in the configuration it expects. As you saw, the swxcrmgr program won't
even run if it doesn't find the right configuration (pretty dumb, since
often the reason one needs to run the program is to edit a problematic
configuration)! To start the swxcrmgr program from the ARC console without
checking the configuration, issue the command "A:SWXCRMGR -o". This will
override the stored configuration and allow you to access the program.
So, the steps I would guess you need to take are:
1) run swxcrmgr program with override option
2) delete the entries in the configuration for the two new disks
3) put the two disks in slots controlled by the RAID controller
4) power cycle the computer
5) start the swxcrmgr program (should start without need for override
option)
6) select new configuration
7) the two new disks should show up as "RDY"
8) create and arrange two groups
9) create logical drives for each of the two groups"
I had decided to back out completely from using the RAID due to
information that I received from others in the group. I will discuss this
further down in this posting.
The steps I used were to add the two disks back to the RAID SCSI bus and
call up the RAID configuration utility:
1) View/Update Conf.
2) Define Spare
<enter> make it think I made a change
<enter> actually toggle back
<esc> exit this section
Save config? this is important, overwrites the info.
Y writing empty config. info.
When I got back to >>>, the disks no longer showed under SHOW DEVICE, and
it ran quickly, without the dozen messages that it was polling the SCSI
interface. It also stopped taking 2-3 minutes to get past:
Initializing xcr0 ....
I'm a happy camper!
> 2). Should this have worked at all? If yes, where did I go wrong?
The answer is a qualified "yes".
According to Tom Webster:
Normally it wouldn't be too bad of an idea. The problem is that the KZPS*
RAID controllers have a firmware limitation of something like 32GB per bus.
This made some sense back in the "4GB is as big as disks can ever get"
days, but started causing problems when DEC started shipping 9GB SSB
disks.
The fact that you have 46GB on the bus is what the SWXCR driver is griping
about.
With smaller disks, yes; the 23GB disks aren't on the supported disks
list for a reason. Unfortunately, DEC tends to be a little slow in
updating the compatibility lists, so it is hard to tell.
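As a rough illustration of the arithmetic behind Tom's point, here is a
minimal sketch. It assumes the limit really is about 32GB per bus (he only
said "something like 32GB"), and the function and constant names are made up
for this example; they are not part of any DEC tool.

```python
# Approximate per-bus limit quoted above; treat the exact value as an assumption.
BUS_LIMIT_GB = 32


def bus_capacity_ok(disk_sizes_gb, limit_gb=BUS_LIMIT_GB):
    """Return (total capacity on the bus, whether it fits under the limit)."""
    total = sum(disk_sizes_gb)
    return total, total <= limit_gb


# Two 23GB disks on one bus, as in my setup.
total, ok = bus_capacity_ok([23, 23])
print(f"{total}GB on the bus, within limit: {ok}")
```

With two 23GB disks the total is 46GB, which is over the quoted limit.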
In addition, most people never used the Automatic configuration.
Patrick Farley's experience was:
"... our DEC support guy said to NOT use the Automatic configuration.
I use JBOD and it works fine, "
>
> 3). Perhaps I should buy second scsi controller and forget using the
> raid controller in JBOD mode?
Tom Webster said: "Yes for the 23GB disks. It also seems a little bit
of a waste to use the RAID controller in JBOD mode...."
alan_at_nabeth.cxo.dec.com says:
Many RAID controllers are very sensitive to the SCSI
drives used. Since Digital doesn't sell any 23 GB
drives, it is unlikely that such a drive had ever been
tested with the particular controller. You should have
been surprised if it had worked. For these drives you're
probably better off getting a direct connect SCSI adapter
and using the KZPAC for supported devices that you need
in a RAID.
Well, I got this system with the RAID controller in it, but no StorageWorks
chassis. The cable for it went from that ultra thin adapter to wide
scsi, so I figured why not. Well now I know more about RAID than I need
to know and enough to know that for my purposes I will take the
recommendation, "BUY A SECOND SCSI CONTROLLER."
----
There were four responses that went into other workarounds and
experiences. If you manage a RAID, I think they may give you a better
feel for RAIDs, but this posting is already too long.
Email if you want them.
Thanks again everyone!
Diane Ibaraki
University of Hawaii
High Energy Physics Group
<diane_at_uhheph.phys.hawaii.edu>
------------------ original posting ----------------------------
> Hi DEC Unix Managers,
> I have a DEC 500/333 system with a Raid SCSI controller
> (KZPSC-PA). It shows up under the PCI Bus:
>
> Bus 00 Slot 07: DAC960 Scsi Raid Controller
>
> Running DEC Unix 4.0B
> Please note that this is the first time that I ever dealt with RAID.
>
> Well I had the bright idea of adding 2 23GB disks to the raid
> controller and set them up in JBOD mode. I added the two disks (wide
> scsi) to the raid controller and set the scsi id at 0 and 1.
>
> >>>ARC
> power cycle system
> load diskette
> selected Run a program
> a:swxcrmgr
> Selected Automatic Configuration
> Selected JBOD
> Selected Initialize Logical drive
> Selected Logical Drive 0
> Selected Logical Drive 1
> Selected Start
> (Then I went home... it took a long time)
>
> Well since I didn't know what these disks would be called, I decided to
> boot the system with genvmunix to see how the system would configure
> these disks. May have been a mistake, on boot it gave:
> xcr0 at pci0 slot 7
> xcr_logger: XCR_ERROR packet
> xcr_logger: cntrl 0 unit 0
> re_getdrive
> Cmd should always return good status
> Hard Error Detected
> Active XCR_COM at time of error
> xcr_logger: XCR_ERROR packet
> xcr_logger: cntrl 0 unit 0
> re_getdrive
> Cmd memory lost
> Possible Software Problem - Impossible Cond Detected
>
> and it repeated for cntrl 1 unit 1 also.
>
> I then made things worse by assuming that these new disks may have been
> DOA and switched them to the regular scsi bus and did the disklabel and
> newfs to verify them. They were okay.
> Now I can't get swxcrmgr to work at all. It gives an installation
> aborted message and shows original state as optimal and current state
> as failed.
>
> 1). How do I back out from this problem?
>
> 2). Should this have worked at all? If yes, where did I go wrong?
>
> 3). Perhaps I should buy second scsi controller and forget using the
> raid controller in JBOD mode?
>
Received on Thu Jan 29 1998 - 22:30:08 NZDT