Summary: Barracuda 9 and Fujitsu 9Gb drives crashing 3.2c system

From: Nick Hill - RAL DCI Systems Group <NMH1_at_axprl1.rl.ac.uk>
Date: Wed, 11 Jun 1997 16:40:38 +0100 (BST)

First, an apology for being somewhat late with this summary.

The following people supplied useful information about using 9Gb drives and
general information about KZPSA FWD SCSI controllers.

"C.J. Bol" <bol_at_GisSrv.IenD.WAU.NL>
Paul Kitwin <PAUKIT_at_HBSI.COM>
Kurt Carlson <sxkac_at_java.sois.alaska.edu>
(Jeff Payne, SOHO CDS, CLRC, +44-1235-446404) <payne_at_solg2.bnsc.rl.ac.uk>

While their info did not directly solve the problem, it convinced me that
the 9Gb Barracuda and Fujitsu disks should work OK, and I began to suspect a
UNIX 3.2c problem. Our local DEC field engineer managed to get me the loan of
a KZPSA, which I put in an Alphastation 255 hooked up via a DWZZB to a shelf
of Fujitsu and Barracuda 9 drives. I tried this setup under UNIX 3.2d and
4.0b and could not get the drives to fail. I also got some impressive
performance figures out of the testing. With this extra info I managed to get
a rev P KZPSA in my 8400, did a quick bare-bones 4.0b install on the 8400 and
repeated my tests. As far as I could tell everything worked fine, so it would
appear to be an AdvFS problem under UNIX 3.2c. I also managed to sustain a
write rate of around 10Mb/s to a single disk, which isn't bad going.
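
For anyone wanting to repeat a rough sequential-write measurement on their
own disks, the following is a minimal sketch in Python. It is not the test
used above (the summary does not say how the figure was obtained); the
function name, file path, block size and total size are all illustrative
assumptions, and the result includes filesystem and cache effects.

```python
import os
import tempfile
import time


def measure_write_rate(path, total_bytes=64 * 1024 * 1024, block_size=64 * 1024):
    """Write total_bytes sequentially to path and return the rate in MB/s.

    Hypothetical helper for illustration only; sizes are arbitrary defaults.
    """
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            # The final chunk may be shorter than block_size.
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # include the time needed to push data to the device
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / (1024 * 1024) / elapsed


if __name__ == "__main__":
    # Writes to a temporary directory; point the path at the disk under test instead.
    with tempfile.TemporaryDirectory() as d:
        rate = measure_write_rate(os.path.join(d, "scratch.dat"))
        print(f"sustained write: {rate:.1f} MB/s")
```

To exercise a particular drive, the path would need to sit on a filesystem
on that drive, and the total size should be large enough that caching does
not dominate the result.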

The individual responses are included below for general reading.

**************************************************************************
"C.J. Bol" <bol_at_GisSrv.IenD.WAU.NL>

Nick, I use the Seagates (7xST19171W) with an AS 600 Digital Unix V4.0 +
KZPSA + DWZZB (signal converter) + ADVFS and all works without any errors
and very fast.
Although it's a different machine type, I thought I would let you know.

Gr. Kees Bol
Agricultural University Wageningen
The Netherlands

PS: There is a disk exerciser from DEC, /usr/field/diskex. You can use it
to test drives in various ways.

***************************************************************************
From: Paul Kitwin <PAUKIT_at_HBSI.COM>

I had a similar problem. Except it didn't matter what type of drives I
used. Whenever there was heavy I/O the system crashed.

Do you have any unused KZPSA's in the PCI Cage? The 8400 (as well as
8200) has a problem with unused cards. If not, it could be a bad
backplane in the PCI cage. This is what happened to me. The system
would simply crash during heavy IO between drives (HDD or Tape). The DEC
FE came out, I gave him a crash dump and he had it analyzed. It turns
out that the backplane was seeing "ghost" KZPSA's.

The FE had to replace the backplane. You may want to have a crash dump
ready for your FE when he/she gets there.

***************************************************************************

From: Kurt Carlson <sxkac_at_java.sois.alaska.edu>

Ultimately this will take support from Digital, but I'll
take some stabs at where to look and what to ask.

>Machine: Alphaserver 8400
>O/S: Digital UNIX 3.2c with various patches

Is this the full patch kit for 3.2c?
This may be pertinent for 9gb disks as they are relatively new.
Support for the 9gb disks behind HSZ40's requires a firmware
(HSOF) upgrade to v3.1; something comparable may apply to native
support behind kzpsa's, but I haven't heard about that... it likely
would not be kzpsa firmware but a UNIX patch for that.
I've not seen patches relevant to this in the v4.0b or v3.2g
kits, I haven't read the v3.2c kit description.

Is the firmware on the disks the Digital supplied firmware
and is it the current version?
You can find the version with:
  scu show device bus 2 target 4 lun 0
This is probably the most relevant question for this problem.

There is a blitz for possible kzpsa corruption (a patch to
simport.o), but the manifestations of the crash are different
from what you report (although we saw it with different
devices behind kzpsa's so symptoms could be different).

There is a problem with older releases of kzpsa, I believe
anything older than N01. Be careful, kzpsa's may be labeled as
P01 but if they are etched F01 they still report to the
operating system as F01 and may still be subject to the problem.
I did not hear the details of this problem, I just heard it
existed during research resulting in the simport.o patch
mentioned above.

>KZPSA adapter misc error
>pzaintr: KZPSA adapter misc error, ars=0x10, afar=0x0. afpr=0x617

It's because of this I mentioned the kzpsa blitz and
the older version problems.

>simple lock: time limit exceeded.

We saw panics of this nature under v3.2c and v3.2d-1 resulting from
soft tape errors coming from tz87's behind kzpsa's. The
fix for that was patches to cam_tape.o which is certainly
irrelevant to your problem, but the same SMP exposure could
exist for errors generated by different device types.

>Should these disks work OK on my system. The Fujitsu disk is after all the
>new DEC rz40 9Gb drive in a slightly different package. I need to resolve
>this issue as I shortly need to buy lots more 9Gb drives to add to the system
>and would like to know that they will work before spending the money!

Please summarize to the list or personally... we have 24
9gb drives on order (mix of rz40 and rz1db 16 bit disks), some of
which will go behind our hsz40's and some direct off kzpsa's for
an 8400 as you describe. Kurt

_____________________________________________________________________
Kurt Carlson, University of Alaska SOIS/TS, (907)474-6266
sxkac_at_alaska.edu 910 Yukon Drive #105.63, Fairbanks, AK 99775-6200

****************************************************************************
From: (Jeff Payne, SOHO CDS, CLRC, +44-1235-446404) <payne_at_solg2.bnsc.rl.ac.uk>

Nick,
        Just to let you know that I have recently purchased four
DEC 9Gb OEM Fujitsu disks, part number SHWGA-AA, Fast Wide SCSI
(not the same PN as yours).
These are in a Raid Array 450 cab which is controlled via a KZPSA on an
Alphaserver 1000A 5/333.
The disks are all in a single RAID level 5 stripe set and under ADVFS,
using DU 4.0B and with a recent DU patch kit installed.
Data (around 150Mb/day) is written just once and is then read many, many
times. I have about 3Gb of data stored so far and to date have had no
problems.

Jeff.


******************************************************************

Nick Hill

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DCI, Rutherford Appleton Laboratory, Tel: +44 (0)1235-445598
Chilton, Didcot, Oxon, OX11 0QX, England. Fax: +44 (0)1235-446626

N.M.Hill_at_rl.ac.uk http://www.cis.rl.ac.uk/people/nmh1/contact.html
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Received on Wed Jun 11 1997 - 18:01:51 NZST
