TCR(ASE) loses SCSI ping to partner

From: C.Ruhnke <i769646_at_smrs013a.mdc.com>
Date: Wed, 19 May 1999 12:21:43 -0500 (CDT)

This is a weird one that I have sent to DEC Support, but I am also polling the
membership in case others have run into the same problem.

I have a 2-node TruCluster Available Server with two AS8200s and three shared
SCSI busses (KZPSA-??) that connect to two dual-redundant HSZ40 pairs and one
dual-redundant HSZ50 pair. The HSZs run RAID-5 sets of 4GB and 9GB disks.

Periodically something happens and TCR starts logging messages of the
form:

smrs013a HSM ***ALERT: network ping to host smrs014a is working but SCSI ping is not

If I run "scu sho edt" on smrs013a, I can see the RAIDset devices, but the
SCSI controllers on smrs014a are missing. If I run it on smrs014a, I see
everything that should be there. Since the controllers are missing on ALL
three SCSI busses, I do not think it is a hardware issue with a KZPSA.

"dia" on smrs013a reports errors:

Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 8179.
Timestamp of occurrence 19-MAY-1999 07:42:53
Host name smrs013

System type register x0000000C AlphaServer 8x00
Number of CPUs (mpnum) x00000002
CPU logging event (mperr) x0000000C

Event validity 1. O/S claims event is valid
Event severity 5. Low Priority
Entry type 199. CAM SCSI Event Type


------- Unit Info -------
Bus Number 4.
Unit Number x013F Target = 7.
                                     LUN = 7.
------- CAM Data -------
Class x1F Unknown Class
Subsystem x1F Unknown Subsystem
Number of Packets 5.

------ Packet Type ------ 258. Module Name String

Routine Name tmv2_notify_cbf

------ Packet Type ------ 256. Generic String

                                     resource unavailable on bus 4
                                       

------ Packet Type ------ 261. Soft Error String

Error Type Soft Error Detected (recovered)

------ Packet Type ------ 256. Generic String

                                     Active CCB at time of error

------ Packet Type ------ 52. Unknown Packet Type
Packet Revision 76.
** PACKET UNSUPPORTED **
========================================================================

NOTE! Target 7 LUN 7 on bus 4 is the SCSI controller on THIS node (smrs013a).
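The Unit Number field in both entries, x013F, is consistent with that device if the value packs bus, target, and LUN as bus * 64 + target * 8 + LUN (4 * 64 + 7 * 8 + 7 = 319 = 0x13F). That layout is an assumption inferred from these two logs rather than taken from Digital UNIX documentation; a minimal Python sketch of the decode:

```python
def decode_unit_number(unit: int) -> tuple[int, int, int]:
    """Split a CAM unit number into (bus, target, lun).

    Assumption: unit = bus * 64 + target * 8 + lun, i.e. 3 bits each
    for target and LUN. This is inferred from the logs, not documented.
    """
    bus = unit >> 6               # everything above the low 6 bits
    target = (unit >> 3) & 0x7    # bits 5..3
    lun = unit & 0x7              # bits 2..0
    return bus, target, lun


print(decode_unit_number(0x013F))  # (4, 7, 7)
```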

"dia" on smrs014a reports errors for the same device (Bus 4, Target 7, LUN 7):

Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 347.
Timestamp of occurrence 19-MAY-1999 07:44:32
Host name smrs014

System type register x0000000C AlphaServer 8x00
Number of CPUs (mpnum) x00000002
CPU logging event (mperr) x0000000D

Event validity 1. O/S claims event is valid
Event severity 5. Low Priority
Entry type 199. CAM SCSI Event Type


------- Unit Info -------
Bus Number 4.
Unit Number x013F Target = 7.
                                     LUN = 7.
------- CAM Data -------
Class x1F Unknown Class
Subsystem x1F Unknown Subsystem
Number of Packets 6.

------ Packet Type ------ 258. Module Name String

Routine Name targ_send_comp

------ Packet Type ------ 256. Generic String

                                     Max SEND SCSI BUSY retries exhausted

------ Packet Type ------ 261. Soft Error String

Error Type Soft Error Detected (recovered)

------ Packet Type ------ 256. Generic String

                                     Active CCB at time of error

------ Packet Type ------ 1. SCSI I/O Request CCB(CCB_SCSIIO)
Packet Revision 76.

CCB Address xFFFFFC005254EC80
CCB Length x00C0
XPT Function Code x01 Execute requested SCSI I/O
CAM Status x04 CCB Request Completed WITH Error
Path ID 4.
Target ID 7.
Target LUN 7.
CAM Flags x00001480 Data Direction (10: DATA OUT)
                                     Disable the SIM Queue Frozen State
                                     Place CCB at head of SIM Queue
*pdrv_ptr xFFFFFC005254E928
*next_ccb x0000000000000000
*req_map x0000000000000000
void (*cam_cbfcnp)() xFFFFFC0000588CD0
*data_ptr xFFFFFC001BE28460
Data Transfer Length 140.
*sense_ptr xFFFFFC005254E950
Auotsense Byte Length 164.
CDB Length 6.
Scatter/Gather Entry Cnt 0.
SCSI Status x08 Busy
Autosense Residue Length x00
Transfer Residue Length x0000008C
(CDB) Command & Data Buf

          15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
 0000: 00000000 0000018C 0000E00A * ............*

Timeout Value x00000005
*msg_ptr x0000000000000000
Message Length 0.
Vendor Unique Flags x0000
Tag Queue Actions x00

------ Packet Type ------ 768. SCSI Sense Data
Packet Revision 0.

Error Code x00 Error Code not decoded
Segment # x00
Information Byte 3 x00
            Byte 2 x00
            Byte 1 x00
            Byte 0 x00
Sense Key x00 No Sense
Additional Sense Length x00
CMD Specific Info Byte 3 x00
                  Byte 2 x00
                  Byte 1 x00
                  Byte 0 x00
ASC & ASCQ x0000 ASC = x0000
                                     ASCQ = x0000
                                     No Additional Sense Information
FRU Code x00
Sense Key Specific Byte 0 x00 Sense Key Data NOT Valid
                   Byte 1 x00
                   Byte 2 x00

Addition Sense Data Size Allocated by Driver

Count of valid bytes: 150.


          15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
 0000: 00000000 00000000 00000000 00000000 *................*
 0010: 00000000 00000000 00000000 00000000 *................*
 0020: 00000000 00000000 00000000 00000000 *................*
 0030: 00000000 00000000 00000000 00000000 *................*
 0040: 00000000 00000000 00000000 00000000 *................*
 0050: 00000000 00000000 00000000 00000000 *................*
 0060: 00000000 00000000 00000000 00000000 *................*
 0070: 00000000 00000000 00000000 00000000 *................*
 0080: 00000000 00000000 00000000 00000000 *................*
 0090: 00000000 7E250000 00000000 00000000 *..........%~<^..*

======================================================================
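The CCB in the smrs014a entry can be cross-checked by hand. Read the "(CDB) Command & Data Buf" dump as 32-bit words printed most-significant byte first, right-aligned so that the last word holds bytes 3..0. That gives the six CDB bytes 0A E0 00 00 8C 01. Now assume the SCSI-2 processor-device SEND(6) layout. This is an assumption, but it fits the targ_send_comp routine and the "SEND" retry message. Under that layout, the CDB is opcode 0x0A to LUN 7 with a 140-byte transfer, matching "Data Transfer Length 140." "Transfer Residue Length x0000008C" is also 140, which suggests none of the data went out before the target answered BUSY (status x08). A small Python sketch of that decode follows. The status table lists only a few standard SCSI status codes.

```python
# Words from the "(CDB) Command & Data Buf" dump, as printed left to right.
# Assumption: each word is shown most-significant byte first and the line is
# right-aligned, so the last printed word holds bytes 3..0.
DUMP_WORDS = ["00000000", "0000018C", "0000E00A"]

# A few standard SCSI status byte values (not exhaustive).
SCSI_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL (QUEUE FULL in SCSI-2)",
}


def dump_to_bytes(words):
    """Rebuild the buffer in byte order 0, 1, 2, ... from the printed words."""
    out = bytearray()
    for word in reversed(words):  # last printed word holds bytes 3..0
        out += int(word, 16).to_bytes(4, "little")
    return bytes(out)


def decode_cdb6(cdb):
    """Decode a 6-byte CDB assuming the SCSI-2 SEND(6) layout:
    opcode in byte 0, LUN in byte 1 bits 7-5, 24-bit length in bytes 2-4."""
    return {
        "opcode": cdb[0],
        "lun": cdb[1] >> 5,
        "length": int.from_bytes(cdb[2:5], "big"),
    }


buf = dump_to_bytes(DUMP_WORDS)
cdb = decode_cdb6(buf[:6])        # "CDB Length 6."
residue = 0x8C                    # "Transfer Residue Length x0000008C"
print(cdb)                        # {'opcode': 10, 'lun': 7, 'length': 140}
print(SCSI_STATUS[0x08])          # BUSY
print("bytes transferred:", cdb["length"] - residue)  # bytes transferred: 0
```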

Any of you TruCluster Wizards ever seen anything like this? Any ideas?

TIA!

--CHRis

=============================================================================
Chris H. Ruhnke Phone: (314)233-7314
IBM Global Services M/S S306-6340 FAX : (314)234-2262
325 J.S. McDonnell Blvd Email: Ruhnke_at_US.ibm.com
Hazelwood, MO 63042
Received on Wed May 19 1999 - 17:24:19 NZST
