This is a weird one that I have sent to DEQ Support, but I am polling the
membership for possible common problems.
I have a 2-node TruCluster Available Server with 2 AS8200's and 3 shared
SCSI busses (KZPSA-??) which share 2 dual-redundant HSZ40 pairs and one
dual-redundant HSZ50 pair. The HSZ's run RAID-5 sets of 4GB and 9GB disks.
Periodically, something happens and I start getting messages out of TCR of
the form:
smrs013a HSM ***ALERT: network ping to host smrs014a is working but SCSI ping is not
If I run "scu sho edt" on smrs013a I can see the RAIDset devices but the
SCSI controllers on smrs014a are missing. Run it on smrs014a and I see
everything that should be there. Since ALL three SCSI busses are missing,
I do not think it is a hardware issue with the KZPSA.
"dia" on smrs013a reports errors:
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 8179.
Timestamp of occurrence 19-MAY-1999 07:42:53
Host name smrs013
System type register x0000000C AlphaServer 8x00
Number of CPUs (mpnum) x00000002
CPU logging event (mperr) x0000000C
Event validity 1. O/S claims event is valid
Event severity 5. Low Priority
Entry type 199. CAM SCSI Event Type
------- Unit Info -------
Bus Number 4.
Unit Number x013F Target = 7.
LUN = 7.
------- CAM Data -------
Class x1F Unknown Class
Subsystem x1F Unknown Subsystem
Number of Packets 5.
------ Packet Type ------ 258. Module Name String
Routine Name tmv2_notify_cbf
------ Packet Type ------ 256. Generic String
resource unavailable on bus 4
------ Packet Type ------ 261. Soft Error String
Error Type Soft Error Detected (recovered)
------ Packet Type ------ 256. Generic String
Active CCB at time of error
------ Packet Type ------ 52. Unknown Packet Type
Packet Revision 76.
** PACKET UNSUPPORTED **
========================================================================
NOTE! Target 7 LUN 7 on bus 4 is the SCSI controller on THIS node (smrs013a).
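As a cross-check on the "Unit Info" fields, the unit number x013F appears to pack bus, target and LUN into one value. The sketch below assumes the layout (bus << 6) | (target << 3) | lun. That layout is inferred from this one log entry, not taken from documentation, and the function name is made up for illustration.

```python
def decode_unit(unit):
    """Split a CAM unit number into (bus, target, lun).

    ASSUMPTION: layout is (bus << 6) | (target << 3) | lun, inferred
    from the single log entry above, not from vendor documentation.
    """
    bus = unit >> 6            # everything above the low 6 bits
    target = (unit >> 3) & 0x7  # bits 5..3
    lun = unit & 0x7            # bits 2..0
    return bus, target, lun


print(decode_unit(0x013F))  # (4, 7, 7)
```

Under that assumption, x013F decodes to bus 4, target 7, LUN 7, which agrees with the values dia prints.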
"dia" on smrs014a reports errors for the same device (Bus 4, Target 7, LUN 7):
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 347.
Timestamp of occurrence 19-MAY-1999 07:44:32
Host name smrs014
System type register x0000000C AlphaServer 8x00
Number of CPUs (mpnum) x00000002
CPU logging event (mperr) x0000000D
Event validity 1. O/S claims event is valid
Event severity 5. Low Priority
Entry type 199. CAM SCSI Event Type
------- Unit Info -------
Bus Number 4.
Unit Number x013F Target = 7.
LUN = 7.
------- CAM Data -------
Class x1F Unknown Class
Subsystem x1F Unknown Subsystem
Number of Packets 6.
------ Packet Type ------ 258. Module Name String
Routine Name targ_send_comp
------ Packet Type ------ 256. Generic String
Max SEND SCSI BUSY retries exhausted
------ Packet Type ------ 261. Soft Error String
Error Type Soft Error Detected (recovered)
------ Packet Type ------ 256. Generic String
Active CCB at time of error
------ Packet Type ------ 1. SCSI I/O Request CCB(CCB_SCSIIO)
Packet Revision 76.
CCB Address xFFFFFC005254EC80
CCB Length x00C0
XPT Function Code x01 Execute requested SCSI I/O
CAM Status x04 CCB Request Completed WITH Error
Path ID 4.
Target ID 7.
Target LUN 7.
CAM Flags x00001480 Data Direction (10: DATA OUT)
Disable the SIM Queue Frozen State
Place CCB at head of SIM Queue
*pdrv_ptr xFFFFFC005254E928
*next_ccb x0000000000000000
*req_map x0000000000000000
void (*cam_cbfcnp)() xFFFFFC0000588CD0
*data_ptr xFFFFFC001BE28460
Data Transfer Length 140.
*sense_ptr xFFFFFC005254E950
Auotsense Byte Length 164.
CDB Length 6.
Scatter/Gather Entry Cnt 0.
SCSI Status x08 Busy
Autosense Residue Length x00
Transfer Residue Length x0000008C
(CDB) Command & Data Buf
15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
0000: 00000000 0000018C 0000E00A * ............*
Timeout Value x00000005
*msg_ptr x0000000000000000
Message Length 0.
Vendor Unique Flags x0000
Tag Queue Actions x00
------ Packet Type ------ 768. SCSI Sense Data
Packet Revision 0.
Error Code x00 Error Code not decoded
Segment # x00
Information Byte 3 x00
Byte 2 x00
Byte 1 x00
Byte 0 x00
Sense Key x00 No Sense
Additional Sense Length x00
CMD Specific Info Byte 3 x00
Byte 2 x00
Byte 1 x00
Byte 0 x00
ASC & ASCQ x0000 ASC = x0000
ASCQ = x0000
No Additional Sense Information
FRU Code x00
Sense Key Specific Byte 0 x00 Sense Key Data NOT Valid
Byte 1 x00
Byte 2 x00
Addition Sense Data Size Allocated by Driver
Count of valid bytes: 150.
15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
0000: 00000000 00000000 00000000 00000000 *................*
0010: 00000000 00000000 00000000 00000000 *................*
0020: 00000000 00000000 00000000 00000000 *................*
0030: 00000000 00000000 00000000 00000000 *................*
0040: 00000000 00000000 00000000 00000000 *................*
0050: 00000000 00000000 00000000 00000000 *................*
0060: 00000000 00000000 00000000 00000000 *................*
0070: 00000000 00000000 00000000 00000000 *................*
0080: 00000000 00000000 00000000 00000000 *................*
0090: 00000000 7E250000 00000000 00000000 *..........%~<^..*
======================================================================
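The CDB in the smrs014a entry can be decoded by hand. The sketch below reverses the dump line (the header shows byte 0 on the right) and reads the six bytes using the SCSI-2 SEND(6) layout. Treating opcode x0A as SEND rather than WRITE(6) is an assumption, based on the routine name targ_send_comp and the "Max SEND SCSI BUSY retries exhausted" text. The function names are hypothetical.

```python
# Hex words exactly as printed on the CDB dump line "0000: ...".
DUMP_WORDS = ["00000000", "0000018C", "0000E00A"]
CDB_LENGTH = 6  # "CDB Length 6." from the same log entry


def cdb_from_dump(words, length):
    """Return the CDB bytes in byte-0-first order.

    The dump header reads 15--<-12 ... 03--<-00, so byte 0 is the
    rightmost byte of the line; reversing the line restores the order.
    """
    raw = bytes.fromhex("".join(words))
    return raw[::-1][:length]


def decode_six_byte_cdb(cdb):
    """Decode a 6-byte CDB with the SCSI-2 SEND(6) layout (ASSUMPTION)."""
    return {
        "opcode": cdb[0],                                    # x0A
        "lun": cdb[1] >> 5,                                  # top 3 bits of byte 1
        "transfer_length": int.from_bytes(cdb[2:5], "big"),  # bytes 2..4
        "control": cdb[5],
    }


cdb = cdb_from_dump(DUMP_WORDS, CDB_LENGTH)
fields = decode_six_byte_cdb(cdb)
print(cdb.hex(" "))  # 0a e0 00 00 8c 01
print(fields)
```

Read this way, the command is addressed to LUN 7 with a transfer length of 140 bytes, matching "Data Transfer Length 140." Since the transfer residue is also x8C (140), it looks as though no data moved before the target returned Busy.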
Any of you TruCluster Wizards ever seen anything like this? Any ideas?
TIA!
--CHRis
=============================================================================
Chris H. Ruhnke Phone: (314)233-7314
IBM Global Services M/S S306-6340 FAX : (314)234-2262
325 J.S. McDonnell Blvd Email: Ruhnke_at_US.ibm.com
Hazelwood, MO 63042
Received on Wed May 19 1999 - 17:24:19 NZST