Shared SCSI Bus (AlphaServer 800, KZPSA, HSZ50)

From: Edouard Poor <edouard_at_cs.auckland.ac.nz>
Date: Thu, 05 Nov 1998 15:18:05 +1300

Hello all.

I've got a question about two Alphas sharing one SCSI bus to a RAID controller.

The hardware involved is:

  Two DEC AlphaServer 800 5/400 ('data', 'mail')
    both with KZPSA PCI-to-SCSI adapters, each with internal
    terminators removed, and set up with different SCSI IDs (6, 7)

  Full set of appropriate cables, external terminators, trilink connectors,
    etc, etc, all set up as per the installation guides.

  One RAID cabinet (BA350 I think, but we got it sans case because someone
    thought we were going to rackmount it ourselves)

  One HSZ50 RAID controller for the cabinet.

  Ten 4.3G disks arranged as one stripe of 6 on one shelf, one stripe of
  3 on the second shelf and one 'spare set' disk on a third shelf.


Before I started working here the system had been set up with both machines
mounting (different) RAID stripes without installing any of the TruCluster
software. As has been said before on this list, you can't do this, and
indeed I found this out the hard way (reboot, kernel panic during advfs
fsck, reboot, repeat).

I fixed that problem by going to a backup and installing some disks
internally in the second machine, so I didn't have both machines mounting
over the same shared SCSI bus. I did however leave both machines connected
to the shared SCSI bus.

In the short term (a week or so) I need to get both machines sharing the
SCSI bus again and mounting their own RAID stripes. As these are mission
critical running systems, I can't afford to take them down for very long,
so what I want in the short term is *only* the ability to share the SCSI
bus. Over the Christmas break I should be able to take the machines down
for a couple of days in order to do the full TruCluster Available Server
setup with automatic fail-over of our databases and mail systems.

What I've done so far is install the TruCluster Available Server products
(TCRCOMMON150, TCRASE150, TCRCMS150, TCRMAN150, TCRCONF150), but not done
anything more to set up the ASE. I did recompile the kernel as part of that
installation to (I assume) install the updated drivers for the KZPSA
controllers. The installation has itself set up various running daemons,
and issuing the command "asemgr -d" returns both machines as being part of
the cluster:

26 data# asemgr -d
        Level of ASE logging:
Notice, warning, and error logging
        Location of Logger(s)
No loggers found
        Member Status
Member:         Host Status:    Agent Status:
mail            UP              RUNNING
data            UP              RUNNING



My question is:

  Having installed the new kernel, can I now, as a temporary measure until
  I get a chance to bring both machines down for an extended period, start
  sharing the SCSI bus again -- do the new KZPSA drivers now correctly
  handle the bus being used by both machines? My immediate aim is to get back
  to using the RAID array so that our data is at least safe from a single
  disk failure.

  If not, what is the *minimum* that needs to be set up in ASE to achieve
  this? (And I'll do the rest of the ASE setup over Christmas.)

Cheers,

Edouard Poor,
UNIX Consultant,
Computer Science Department,
The University of Auckland.
Received on Wed Nov 04 1998 - 02:20:06 NZDT
