SUMMARY: Mysterious SCSI problems

From: Erik Persson <erik_at_ikp.liu.se>
Date: Tue, 01 Sep 1998 16:57:51 +0200

My problem has now been solved, thanks in large part to advice from this
list and Digital support.

Particular thanks go to George Michaelson <ggm_at_dstc.edu.au>, who had a
very similar problem with the same kind of hardware (AS800, KZPAA and
TZ88 drives); in fact, his solution was the one I used. It turned out
that the KZPAA didn't get along with the TZ887 (possibly in conjunction
with the AS800).

The KZPAA has now been replaced by an ISP1040-based KZPBA-CA card and
everything works great.

All replies are included at the bottom of this post.

My original post:

> I recently migrated all my NFS server functions from an AlphaServer
> 800, leaving only some internal disks and an external TZ887 loader.
> Prior to this operation, the machine has been doing local and network
> backups with Digital NSR 4.4 on the very DLT loader mentioned above
> without any problems whatsoever. The idea is to use this machine
> exclusively as a NetWorker server.
>
> The problem is that this setup, after being stripped down, no longer
> functions properly when it comes to running backups. The loader
> device gets recognized properly and can be manipulated from within
> NetWorker as far as loading, labeling and mounting tapes is concerned,
> but trouble arrives when it comes to writing more data than just a
> tape label, for instance when actually trying to run a backup.
>
> What happens is that a few MB (or sometimes just a few KB) get
> written to tape, then all writing stops and everything hangs for a
> couple of minutes until NetWorker reports an I/O error and proceeds to
> load the next tape, which suffers from the same behaviour. It seems as
> if the SCSI bus or driver just hangs.
>
> I have tried several combinations of cables, terminations, SCSI
> controllers and tapes and I even went so far as connecting a
> standalone DLT4000 drive which also showed the same symptoms.
>
> The machine was previously configured as follows:
>
> AS800---+-- Internal ISP1020 -- RZ28 - RC1CB - RZ1CB - Fujitsu MAB3091
>         |
>         +-- External KZPAA (NCR810) -- TZ887 -- RZ29B -- RZ29B -- RZ29B
>         |
>         +-- External KZPAA (NCR810) -- RZ29B -- RZ29B -- RZ29B
>         |
>         +-- External KZPAA (NCR810) -- RZ29B -- RZ29B -- RZ29B
>
> All external RZ29Bs were in StorageWorks shelves. All disks except the
> RZ28 (the system disk) were used in LSM stripes. Yes, I know that the
> NCR810 sucks big time. Now, it looks like:
>
> AS800---+-- Internal ISP1020 -- RZ28 - RC1CB - RZ1CB
>         |
>         +-- External KZPAA (NCR810) -- TZ887 -- Terminator
>         |
>         +-- External KZPAA (NCR810) -- Terminator
>         |
>         +-- External KZPAA (NCR810) -- Terminator
>
> The operating system is Digital Unix 4.0D, patch kit #1. This is the
> uerf log from the I/O error business:
>
> ********************************* ENTRY 20. *********************************
>
> ----- EVENT INFORMATION -----
>
> EVENT CLASS           ERROR EVENT
> OS EVENT TYPE         199.  CAM SCSI
> SEQUENCE NUMBER       2.
> OPERATING SYSTEM      DEC OSF/1
> OCCURRED/LOGGED ON    Fri Aug 28 19:35:24 1998
> OCCURRED ON SYSTEM    dylan
> SYSTEM ID             x0007001B
> SYSTYPE               x00000000
>
> ----- UNIT INFORMATION -----
>
> CLASS                 x0022  DEC SIM
> SUBSYSTEM             x0000  DISK
> BUS #                 x0002
>                       x00A8  LUN x0
>                              TARGET x5
>
> ********************************* ENTRY 21. *********************************
>
> ----- EVENT INFORMATION -----
>
> EVENT CLASS           ERROR EVENT
> OS EVENT TYPE         199.  CAM SCSI
> SEQUENCE NUMBER       3.
> OPERATING SYSTEM      DEC OSF/1
> OCCURRED/LOGGED ON    Fri Aug 28 19:35:24 1998
> OCCURRED ON SYSTEM    dylan
> SYSTEM ID             x0007001B
> SYSTYPE               x00000000
>
> ----- UNIT INFORMATION -----
>
> CLASS                 x0022  DEC SIM
> SUBSYSTEM             x0000  DISK
> BUS #                 x0002
>                       x00A8  LUN x0
>                              TARGET x5
>
> ********************************* ENTRY 22. *********************************
>
> ----- EVENT INFORMATION -----
>
> EVENT CLASS           ERROR EVENT
> OS EVENT TYPE         199.  CAM SCSI
> SEQUENCE NUMBER       4.
> OPERATING SYSTEM      DEC OSF/1
> OCCURRED/LOGGED ON    Fri Aug 28 19:35:24 1998
> OCCURRED ON SYSTEM    dylan
> SYSTEM ID             x0007001B
> SYSTYPE               x00000000
>
> ----- UNIT INFORMATION -----
>
> CLASS                 x0001  TAPE
> SUBSYSTEM             x0000  DISK
> BUS #                 x0002
>                       x00A8  LUN x0
>                              TARGET x5
>
>
> Any suggestions?
---
>From: alan_at_nabeth.cxo.dec.com
>Subject: Re: Mysterious SCSI problems (ADDENDUM)
>To: Erik Persson <erik_at_ikp.liu.se>
>Date: Fri, 28 Aug 98 13:33:42 -0600
>
>
>	 They're SCSI protocol errors.  Check the cables, connections
>	 and termination again.
---
>From: George Michaelson <ggm_at_dstc.edu.au>
>Subject: tape failure
>To: erik_at_ikp.liu.se
>Date: Mon, 31 Aug 1998 11:09:22 +1000 (EST)
>
>
>we never got it to work. We forced DEC to swap the KZP-AA for a DA or
>similar, and upgraded to fast/wide tapedrives.
>
>There is a firmware level problem and/or a data rate mismatch which
>DEC can't fix and seem to have trouble admitting to.
>
>so you need fast/wide/differential scsi on the tape.
>
>-george
---
>From: George Michaelson <ggm_at_dstc.edu.au>
>Subject: Re: tape failure
>To: Erik Persson <erik_at_ikp.liu.se>
>Date: Tue, 01 Sep 1998 08:54:38 +1000
>
>
>there were substantial differences. We were on an 800 5/333 with
>hardware RAID in a BA enclosure, matched 9Gb spindles. the tapes
>were TZ87 and then TZ88 on the KZPAA.
>
>we found that dumps worked very very intermittently, and then failed
>almost trivially. any tape operation streaming data flooded things.
>
>We were loaned a differential scsi controller and a single TZ89 which
>worked, and then we persuaded DEC (who specced the original system)
>to replace the TZ87's with a pair of TZ88's on fast/wide and a
>suitable controller. the problem went away.
---
-- 
Erik Persson, System Manager            e-mail: erik_at_ikp.liu.se
Dept. of Mech. Engineering              Voice: +46 13 28 2464
University of Linköping, Sweden         Fax:   +46 13 21 2717
Received on Tue Sep 01 1998 - 15:01:05 NZST