DU 4.0E and ASE 1.5 Properly configured?

From: alphaadmin <alphaadmin_at_bcinetwork.com>
Date: Mon, 28 Jan 2002 14:37:00 -0500

I am attempting to configure an ASE cluster using two AS4100
5/600 servers connected to an RA7000. One server (Server A) and
the attached RAID array have been in production for several years.
The second server (Server B) and the associated ASE licensing were
purchased about a year after Server A went into production, but
the cluster was neither configured nor connected at that time.

Both servers run 4.0E with the latest patch kit, and both have the
latest SRM and AlphaBIOS firmware applied. (I am aware that 4.0E
is no longer a supported version at Compaq. I plan to upgrade
this summer.)

The host bus adapter in each server for the shared SCSI bus is a
KZPBA-CB. One is set to SCSI ID 7, the other to ID 6.

The RA7000 is configured into several AdvFS domains. All RA7000
logical disks are under ASE service control, with two
exceptions: 1) I am temporarily using an otherwise unused logical
disk for additional swap space on the server that is currently in
production, but only until an additional hard drive that is on
order arrives and can be installed in that server’s unshared
disk shelf. 2) I have one AdvFS domain, containing 5 AdvFS filesets
that are mounted on the production server through entries in the
fstab.

My problem is that, since I configured ASE to this level, whenever
Server B is booted and running alongside Server A, I get
frequent CAM SCSI errors logged on Server A. These seem
to occur in batches, registered against the different target/LUNs
on the array, every 3-5 minutes while Server B is running.
asemgr shows that the two members see each other, and I am able to
fail over services without apparent problems. Server B sees CAM
errors only when I boot Server A after Server B is already running.
Please see below for a sample of the dia output containing CAM
messages.

Assuming this frequency of CAM errors is abnormal (they did not
occur before ASE was brought into the picture), I began researching
what might be “wrong” with this configuration. That has led
me to the following questions, which I hope you can
help me answer.

1) The KZPBA-CB is not listed as a supported adapter in the
ASE 1.5 hardware list (I did find it in the supported hardware
list for 4.0F/1.6). Is this controller supported in my
configuration? Could this be the source of the problems I am
having?

2) Must every logical disk on the shared SCSI bus be under
ASE control? Could the filesystems that are not under ASE,
and/or the swap partition, be triggering the CAM errors?

3) Is there something else about my configuration causing the
problems?

Thank you for your assistance on this,

Jeff Roberts
alphaadmin_at_bcinetwork.com

--- output from dia ---
******************************** ENTRY 3 *****************************


Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 64.
Timestamp of occurrence 25-JAN-2002 20:45:10
Host name alpha

System type register x00000016 Alpha 4000/1200 Series
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000003

Event validity 1. O/S claims event is valid
Event severity 5. Low Priority
Entry type 199. CAM SCSI Event Type


------- Unit Info -------
Bus Number 16.
Unit Number x0402 Target = 0.
                                     LUN = 2.
------- CAM Data -------
Class x00 Disk
Subsystem x00 Disk
Number of Packets 10.

------ Packet Type ------ 258. Module Name String

Routine Name cdisk_rec_status

------ Packet Type ------ 256. Generic String

                                     Recovery progress event, this is NOT an error

------ Packet Type ------ 262. Info Error String

Error Type Information Message Detected (recovered)

------ Packet Type ------ 257. Device Name String

Device Name DEC HSZ70 V71Z

------ Packet Type ------ 256. Generic String

                                     Active CCB at time of error

------ Packet Type ------ 256. Generic String

                                     CCB request completed with an error

------ Packet Type ------ 1. SCSI I/O Request CCB (CCB_SCSIIO)
Packet Revision 76.

CCB Address xFFFFFC0114457E80
CCB Length x00C0
XPT Function Code x01 Execute requested SCSI I/O
CAM Status x84 CCB Request Completed WITH Error
                                     Autosense Data Valid for Target
Path ID 16.
Target ID 0.
Target LUN 2.
CAM Flags x000054C0 Data Direction (11: no data)
                                     Disable the SIM Queue Frozen State
                                     Place CCB at head of SIM Queue
                                     Attempt Sync Data Xfer - SDTR
*pdrv_ptr xFFFFFC0114457B28
*next_ccb x0000000000000000
*req_map x0000000000000000
void (*cam_cbfcnp)() xFFFFFC0000578170
*data_ptr x0000000000000000
Data Transfer Length 0.
*sense_ptr xFFFFFC0114457B50
Auotsense Byte Length 160.
CDB Length 6.
Scatter/Gather Entry Cnt 0.
SCSI Status x02 Check Condition
Autosense Residue Length x8E
Transfer Residue Length x00000000
(CDB) Command & Data Buf

          15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
 0000: 00000000 00000000 00000000 *............*

Timeout Value x00000014
*msg_ptr x0000000000000000
Message Length 0.
Vendor Unique Flags x0000
Tag Queue Actions x00

------ Packet Type ------ 256. Generic String

                                     Error, exception, or abnormal condition

------ Packet Type ------ 256. Generic String

                                     UNIT ATTENTION - Medium changed or target reset

------ Packet Type ------ 768. SCSI Sense Data
Packet Revision 0.

------- HSx Data -------

Error Code x70 Current Error
Segment # x00
Information Byte 3 x00
            Byte 2 x00
            Byte 1 x00
            Byte 0 x00
Sense Key x06 Unit Attention
Additional Sense Length x0A
CMD Specific Info Byte 3 x00
                  Byte 2 x00
                  Byte 1 x00
                  Byte 0 x00
ASC & ASCQ x2900 ASC = x0029
                                     ASCQ = x0000
                                     Power On, Reset, or Bus Device Reset Occurred
FRU Code x00
Sense Key Specific Byte 0 x00 Sense Key Data NOT Valid
                   Byte 1 x00
                   Byte 2 x00

Count of valid bytes: 142.


          15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
 0000: 00000000 00000000 00000000 00000000 *................*
 0010: 00000000 00000000 00000000 00000000 *................*
 0020: 00000000 00000000 00000000 00000000 *................*
 0030: 00000000 00000000 00000000 00000000 *................*
 0040: 00000000 00000000 00000000 00000000 *................*
 0050: 00000000 00000000 00000000 00000000 *................*
 0060: 00000000 00000000 00000000 00000000 *................*
 0070: 00000000 00000000 00000000 00000000 *................*
 0080: 00000000 00000000 00000000 00000000 *................*
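
For anyone reading the sense data in this entry by hand, the fields dia prints under "HSx Data" can be checked with a short decoder. What follows is a minimal Python sketch, assuming the standard SCSI fixed-format sense layout (response code in byte 0, sense key in the low nibble of byte 2, additional sense length in byte 7, ASC and ASCQ in bytes 12 and 13). The sample bytes are reconstructed from the field values shown in the entry, not copied from a raw dump. The function name `decode_fixed_sense` and the two small lookup tables are hypothetical and cover only a few codes.

```python
# Hypothetical helper: decode the fixed-format SCSI sense fields that dia
# lists under "HSx Data". The lookup tables are deliberately tiny.

SENSE_KEYS = {
    0x0: "No Sense",
    0x2: "Not Ready",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
}

ASC_ASCQ = {
    (0x29, 0x00): "Power On, Reset, or Bus Device Reset Occurred",
}


def decode_fixed_sense(sense: bytes) -> dict:
    """Decode fixed-format sense data (response code 0x70 or 0x71)."""
    if len(sense) < 14:
        raise ValueError("fixed-format sense data needs at least 14 bytes")
    response_code = sense[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data: 0x{response_code:02X}")
    key = sense[2] & 0x0F
    asc, ascq = sense[12], sense[13]
    return {
        "response_code": response_code,
        "deferred": response_code == 0x71,
        "sense_key": SENSE_KEYS.get(key, f"unknown (0x{key:X})"),
        "asc": asc,
        "ascq": ascq,
        "description": ASC_ASCQ.get((asc, ascq), "not in this small table"),
        # Header is 8 bytes; byte 7 counts the bytes that follow it.
        "total_length": 8 + sense[7],
    }


# Sense bytes rebuilt from the values printed in the entry (an assumption):
# error code x70, sense key x06, additional length x0A, ASC/ASCQ x29/x00.
SAMPLE_SENSE = bytes([
    0x70, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00, 0x0A,
    0x00, 0x00, 0x00, 0x00, 0x29, 0x00, 0x00, 0x00,
    0x00, 0x00,
])

if __name__ == "__main__":
    for field, value in decode_fixed_sense(SAMPLE_SENSE).items():
        print(f"{field}: {value}")
```

The decoded total of 18 bytes (8 header bytes plus the additional sense length x0A) is consistent with the CCB fields in the entry: an autosense byte length of 160 minus a residue of x8E (142) also gives 18.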



******************************** ENTRY 4 *****************************


Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 63.
Timestamp of occurrence 25-JAN-2002 20:45:10
Host name alpha

System type register x00000016 Alpha 4000/1200 Series
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000001

Event validity 1. O/S claims event is valid
Event severity 5. Low Priority
Entry type 199. CAM SCSI Event Type


------- Unit Info -------
Bus Number 16.
Unit Number x0408 Target = 1.
                                     LUN = 0.
------- CAM Data -------
Class x00 Disk
Subsystem x00 Disk
Number of Packets 10.

------ Packet Type ------ 258. Module Name String

Routine Name cdisk_rec_status

------ Packet Type ------ 256. Generic String

                                     Recovery progress event, this is NOT an error

------ Packet Type ------ 262. Info Error String

Error Type Information Message Detected (recovered)

------ Packet Type ------ 257. Device Name String

Device Name DEC HSZ70 V71Z

------ Packet Type ------ 256. Generic String

                                     Active CCB at time of error

------ Packet Type ------ 256. Generic String

                                     CCB request completed with an error

------ Packet Type ------ 1. SCSI I/O Request CCB (CCB_SCSIIO)
Packet Revision 76.

CCB Address xFFFFFC013FE15580
CCB Length x00C0
XPT Function Code x01 Execute requested SCSI I/O
CAM Status x84 CCB Request Completed WITH Error
                                     Autosense Data Valid for Target
Path ID 16.
Target ID 1.
Target LUN 0.
CAM Flags x000054C0 Data Direction (11: no data)
                                     Disable the SIM Queue Frozen State
                                     Place CCB at head of SIM Queue
                                     Attempt Sync Data Xfer - SDTR
*pdrv_ptr xFFFFFC013FE15228
*next_ccb x0000000000000000
*req_map x0000000000000000
void (*cam_cbfcnp)() xFFFFFC0000578170
*data_ptr x0000000000000000
Data Transfer Length 0.
*sense_ptr xFFFFFC013FE15250
Auotsense Byte Length 160.
CDB Length 6.
Scatter/Gather Entry Cnt 0.
SCSI Status x02 Check Condition
Autosense Residue Length x8E
Transfer Residue Length x00000000
(CDB) Command & Data Buf

          15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
 0000: 00000000 00000000 00000000 *............*

Timeout Value x00000014
*msg_ptr x0000000000000000
Message Length 0.
Vendor Unique Flags x0000
Tag Queue Actions x00

------ Packet Type ------ 256. Generic String

                                     Error, exception, or abnormal condition

------ Packet Type ------ 256. Generic String

                                     UNIT ATTENTION - Medium changed or target reset

------ Packet Type ------ 768. SCSI Sense Data
Packet Revision 0.

------- HSx Data -------

Error Code x70 Current Error
Segment # x00
Information Byte 3 x00
            Byte 2 x00
            Byte 1 x00
            Byte 0 x00
Sense Key x06 Unit Attention
Additional Sense Length x0A
CMD Specific Info Byte 3 x00
                  Byte 2 x00
                  Byte 1 x00
                  Byte 0 x00
ASC & ASCQ x2900 ASC = x0029
                                     ASCQ = x0000
                                     Power On, Reset, or Bus Device Reset Occurred
FRU Code x00
Sense Key Specific Byte 0 x00 Sense Key Data NOT Valid
                   Byte 1 x00
                   Byte 2 x00

Count of valid bytes: 142.


          15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
 0000: 00000000 00000000 00000000 00000000 *................*
 0010: 00000000 00000000 00000000 00000000 *................*
 0020: 00000000 00000000 00000000 00000000 *................*
 0030: 00000000 00000000 00000000 00000000 *................*
 0040: 00000000 00000000 00000000 00000000 *................*
 0050: 00000000 00000000 00000000 00000000 *................*
 0060: 00000000 00000000 00000000 00000000 *................*
 0070: 00000000 00000000 00000000 00000000 *................*
 0080: 00000000 00000000 00000000 00000000 *................*
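
The "Unit Number" values in the two entries (x0402 and x0408) appear to pack the bus, target, and LUN into a single integer. The sketch below assumes a layout of `bus << 6 | target << 3 | lun`. That layout is inferred only from these two samples, where it reproduces the bus 16, target, and LUN values dia prints alongside. It has not been checked against Digital UNIX documentation, and `split_unit` and `ScsiAddress` are hypothetical names.

```python
from typing import NamedTuple


class ScsiAddress(NamedTuple):
    bus: int
    target: int
    lun: int


def split_unit(unit: int) -> ScsiAddress:
    """Split a dia unit number, assuming bus<<6 | target<<3 | lun."""
    return ScsiAddress(
        bus=unit >> 6,               # everything above the low 6 bits
        target=(unit >> 3) & 0x7,    # 3 bits of target ID
        lun=unit & 0x7,              # 3 bits of LUN
    )


if __name__ == "__main__":
    # The two unit numbers logged in entries 3 and 4.
    for unit in (0x0402, 0x0408):
        print(f"x{unit:04X} -> {split_unit(unit)}")
```

Under this assumption, x0402 maps to bus 16, target 0, LUN 2, and x0408 maps to bus 16, target 1, LUN 0, matching the values in both entries.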


 

________________________________________________________________
Sent via the KillerWebMail system at bcinetwork.com
Received on Mon Jan 28 2002 - 19:29:11 NZDT
