Slow I/O on Cluster / Raid Array 450

From: Peter J. Simpson <psimpson_at_realmed.com>
Date: Fri, 04 Dec 1998 16:53:19 +0001

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Good afternoon,

This is my first post to the list, so please pardon any "newbie" faux
pas I may commit.

I have the following DEC Alpha Cluster configuration that seems to be
experiencing I/O problems with the Raid Array. At least, that's
where I think the problem is - I can't seem to find any solid
evidence pointing anywhere in terms of error logs or alerts. Maybe
someone out there can help!

The systems were running normally (I say that based only upon
"experience", not empirical evidence) until Monday of this week.
Starting Monday, writes to the array are a factor of 10 slower. I
can't find any errors and the cabinet is happily humming along.

A matched pair of CPUs:
-----------------------
Digital UNIX V4.0D (Rev. 878); Mon Mar 23 16:40:56 EST 1998
Digital UNIX TruCluster V1.5 (Rev. 270); 12/30/97 20:36
Digital UNIX V4.0D Worksystem Software (Rev. 875)
System Type: DEC1000A_5
Number of CPUs: 1 Type: EV56 Speed: 333 Mhz Cache: 2.0 MB
Memory size: 1024 MB

Raid Array 450 w/HSZ-50 (32MB Cache) filled with RZ28D-VW 2.1GB disks
KZPDA-AA "FWSE SCSI Card" in each system
Memory Channel Interface between CPUs
Systems are used for Oracle 7.3.4 Parallel Server, database on RAW
devices

Picture:
-------       -------       -------
|CPU A|       |CPU B|       |RAID |
|     |-MemCh-|     |       |ARRAY|
|     |       |     |       | 450 |
|     |       |     |       |     |
|KZPDA|-SCSI--|KZPDA|-SCSI--|HSZ50|
-------       -------       -------

KZPDAs are connected to each other, then to the Array (differential
SCSI).
Array has 4 RAID5 Raidsets defined, using disks as:

HSZ> show raidset
Name          Storageset                     Uses             Used by
------------------------------------------------------------------------------

RAID1         raidset                        DISK410          D3
                                             DISK510
                                             DISK610

RAID2         raidset                        DISK110          D5
                                             DISK210
                                             DISK310

RAID3         raidset                        DISK100          D6
                                             DISK200
                                             DISK300

RAID4         raidset                        DISK420          D4
                                             DISK520
                                             DISK620

Question:

When I compare the time to write a 16 MB file to the local system disk
in the CPU cabinet with the time to write it to a filesystem on the
array, I would expect the array to be a little slower due to the shared
SCSI bus. But look at these times:

System Disk:
------------
# time dd if=/dev/zero of=foobar bs=16k count=1024
 
1024+0 records in
1024+0 records out
 
real 0.8
user 0.0
sys 0.7

Raid Array:
-----------
# time dd if=/dev/zero of=foobar bs=16k count=1024
 
1024+0 records in
1024+0 records out
 
real 46.4
user 0.1
sys 0.6

0.8 seconds compared to 46.4?! That can't be correct.
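
Expressed as throughput, the gap is far larger than bus sharing alone
would explain. The following is a minimal sketch of that arithmetic,
assuming only the dd parameters and "real" times shown above; the
function names are made up for illustration.

```shell
#!/bin/sh
# Convert a dd run into MB/s.
# $1 = block size in KB, $2 = block count, $3 = elapsed "real" seconds
throughput_mb_s() {
    awk -v bs="$1" -v n="$2" -v t="$3" \
        'BEGIN { printf "%.2f\n", (bs * n / 1024) / t }'
}

# Ratio of two elapsed times, rounded to a whole number.
# $1 = slower time in seconds, $2 = faster time in seconds
slowdown() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a / b }'
}

echo "system disk: $(throughput_mb_s 16 1024 0.8) MB/s"
echo "raid array:  $(throughput_mb_s 16 1024 46.4) MB/s"
echo "slowdown:    $(slowdown 46.4 0.8)x"
```

With the figures above this works out to 20 MB/s for the system disk
against roughly 0.34 MB/s for the array, a slowdown of about 58x.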

Can someone with a similar configuration run the same dd test and see
if these times are "normal"?
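
To make the comparison easy to repeat, here is a rough sketch of a
script that performs the same 16 MB write in each directory named on
the command line. It assumes a date(1) that accepts +%s, which is not
POSIX and may not hold on every system, so the timing is only good to
the nearest second. The scratch file name ddtest.$$ is hypothetical.

```shell
#!/bin/sh
# Time the same 16 MB write (1024 blocks of 16k) in a given directory.
# Assumption: date(1) supports +%s (seconds since the epoch).
write_test() {
    dir=$1
    file="$dir/ddtest.$$"        # hypothetical scratch file name
    start=$(date +%s)
    dd if=/dev/zero of="$file" bs=16k count=1024 2>/dev/null
    status=$?
    end=$(date +%s)
    rm -f "$file"                # clean up the scratch file
    echo "$dir: $((end - start)) s"
    return $status
}

# Run the test once per directory argument; report any failed write.
for d in "$@"; do
    write_test "$d" || echo "write failed in $d" >&2
done
```

It would be invoked with one local directory and one directory on the
array, for example "sh ddtest.sh /tmp /array/fs" (both paths are
placeholders).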

Anyone have any ideas? There have been no O/S related changes.
Minor database changes have been reversed. In fact, we've restored
the system to its state of one week ago (before the performance
degradation) with minimal improvement.

Thanks.

Pete

-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv

iQA/AwUBNmiD/j20lAOOvtjpEQI/mACeKwdTFuqu0Rxo23sm7SR5LwFNWbgAnAp2
0Bb+wf+m0kDxYWE5fUbPRnOF
=Gi7C
-----END PGP SIGNATURE-----
Received on Fri Dec 04 1998 - 21:53:57 NZDT
