Followup: Slow I/O on Cluster / Raid Array 450 from Peter J. Simpson on 1998-12-11 (tru64-unix-managers)

From: Peter J. Simpson <psimpson_at_realmed.com>
Date: Thu, 10 Dec 1998 14:06:06 +0001

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thanks to all who have responded so far regarding slow performance of
my Alpha 1000A cluster.

Most suggestions so far center around "Writeback Cache" on the HSZ
controller. I have three follow up questions / observations for
those willing to read further....

For the record, WriteBack Cache was turned OFF before I/O seemed to
get slower, so I don't think that's necessarily the problem. In any
case, I've turned it on and it definitely makes a big difference.
Writing that 16MB file went from 45 seconds to 15 seconds.
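
For scale, those two timings can be converted to throughput. This is a
minimal sketch, assuming a POSIX shell with awk; the throughput helper
is a hypothetical name used only for illustration:

```shell
#!/bin/sh
# Convert a transfer size (MB) and an elapsed time (seconds) into MB/s.
# "throughput" is an illustrative helper, not something from the post.
throughput() {
    awk -v mb="$1" -v sec="$2" 'BEGIN { printf "%.2f\n", mb / sec }'
}

throughput 16 45   # write-back cache off: prints 0.36
throughput 16 15   # write-back cache on:  prints 1.07
```

Either way the array is moving on the order of 1 MB/s or less for this
test.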

We've now observed some additional weirdness that maybe someone can
explain:

First Question
- --------------
We typically boot one node of the cluster 30 to 45 seconds before the
other so that services always land on the "preferred" node during a
boot. What we've noticed is that there is a big difference between
write speed to the array between the two nodes, but that it's not
consistent. And it does not seem to matter where the services are
located. For example:

Writing a 16 MB file to Array with Writeback Cache ON.

Boot  Service Location  Node 1 Speed  Node 2 Speed
#1    Node 1            15.5 sec      4.7 sec
#2    Node 1            8.7 sec       9.0 sec
#3    Node 1            5.2 sec       16.2 sec
#4    Node 2            15.1 sec      5.2 sec

Again, this is using KZPDA cards. Sometimes Node 1 is faster, sometimes
Node 2, and once or twice when we did it they were nearly the same.
We did NOTHING between reboots other than the "time dd if=/dev/zero
of=foobar bs=16k count=1024" command on each node. Strange. Why
would this occur? Shouldn't it always be basically the same speed on
both nodes... given that both nodes really have absolutely NO CPU and
no other I/O load on them when doing the test?
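
One way to tell run-to-run noise apart from a per-boot difference is to
repeat the same dd write several times on each node after a single
boot. This is a minimal sketch, assuming a shell whose date command
supports +%s (not guaranteed on every system); run_once, RUNS and the
one-second timer resolution are illustrative choices, not part of the
original test:

```shell
#!/bin/sh
# Repeat the 16 MB write test and print the elapsed time of each run.
# RUNS and run_once are hypothetical names used for illustration only.
RUNS=${RUNS:-3}

# run_once FILE COUNT: write COUNT blocks of 16k to FILE and print the
# elapsed whole seconds (assumes "date +%s" is available).
run_once() {
    start=$(date +%s)
    dd if=/dev/zero of="$1" bs=16k count="$2" 2>/dev/null
    end=$(date +%s)
    echo $((end - start))
}

i=1
while [ "$i" -le "$RUNS" ]; do
    # 1024 blocks of 16k = 16 MB, the same size as the test in the post
    echo "run $i: $(run_once foobar 1024) sec"
    i=$((i + 1))
done
rm -f foobar
```

If the times within one boot agree with each other but differ from the
next boot, the variation is tied to something set up at boot rather
than to the individual write.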

Second Question
- ---------------
Also, we're wondering about how the SCSI Bus should be connected....
the HSZ documentation shows one way, and DEC's "Golden Eggs" diagrams
show another. We've switched to the "Golden Eggs" configuration with
no difference. Is there a preferred method?

HSZ Docs:

     -------       -------       -------
     |CPU A|       |CPU B|       |RAID |
     |     |-MemCh-|     |       |ARRAY|
     |     |       |     |       | 450 |
     |     |       |     |       |     |
Term-|KZPDA|-SCSI--|KZPDA|-SCSI--|HSZ50|-Terminator
     -------       -------       -------

Essentially, CPU-B has both CPU-A and the Array connected to it, and
CPU-A and the Array carry the terminators.

Golden Eggs:
                   |---------|
     -------       | ------- |       -------
     |CPU A|       | |RAID | |       |CPU B|
     |     |-MemCh-| |ARRAY| |-MemCh-|     |
     |     |         | 450 |         |     |
     |     |         |     |         |     |
Term-|KZPDA|---SCSI--|HSZ50|---SCSI--|KZPDA|-Terminator
     -------         -------         -------
This has CPU-A & CPU-B at the "ends" with terminators and the Array
in the middle.

I wouldn't think it really makes any difference, since it's just an
electrical connection, but maybe it matters...

Third question...
- -----------------
Since we've been looking at the HSZ documentation we note that there
are several speeds available for "negotiation": 5MHz, 10MHz and
20MHz. We were running "10Mhz". Again the great documentation is
really helpful (grr) since it says there are these three settings but
the command line accepts "5Mhz", "10Mhz" and "Asynchronous". Can I
assume that "Asynchronous" is 20MHz? That's what we've changed the
setting to with no improvement.

Thanks for your continued assistance!

Pete
-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv

iQA/AwUBNnBFzj20lAOOvtjpEQLCEQCfcDa8ZbCkYSz3HJzH3/pcLM6hw38AoIcQ
Ku3n+QXtZajRmLChzWtTHkho
=0bNp
-----END PGP SIGNATURE-----
Received on Thu Dec 10 1998 - 19:06:35 NZDT
