Managers,
Following a previous post regarding firmware updates, I have an
unfortunately quite open question about SCSI CAM errors and HSG80s. We have
several clusters running V5.1A PK1 with HSG80 pairs. We are seeing large,
random bursts of SCSI CAM errors like the one below (283 in approx. 30
secs), with the HSG80 restarting itself (see the FMU output below).
Compaq have suggested that it could be the firmware revision of the HSGs
(V86F-4) and that we should go to V86F-10, although one of the machines is
already at V86F-8, and F9 and F10 don't sound relevant to this problem.
Has anyone with a similar HSG80 environment seen large, random numbers of
SCSI CAM errors, with the HSG80 restarting automatically?
Thanks in advance,
Carl.
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 10595.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Tue Oct 8 15:00:29 2002
OCCURRED ON SYSTEM dev-ds2-2
SYSTEM ID x000D0022
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000
----- UNIT INFORMATION -----
CLASS x0000 DISK
SUBSYSTEM x0000 DISK
BUS # xFFFFFFFE
FMU> show last most
Last Failure Entry: 14. Flags: 006FF901
Template: 1.(01) Description: Last Failure Event
Occurred on 08-OCT-2002 at 13:38:35
Power On Time: 0. Years, 243. Days, 7. Hours, 20. Minutes, 8. Seconds
Controller Model: HSG80
Serial Number: ZG13802212 Hardware Version: E16(2E)
Software Version: V86F-8(BA)
Instance Code: 0102030A Description:
An unrecoverable software inconsistency was detected or an intentional
restart or shutdown of controller operation was requested.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 2.(02)
Event Threshold: 10.(0A) Classification:
SOFT. An unexpected condition detected by a controller software component
(e.g., protocol violations, host buffer access errors, internal
inconsistencies, uninterpreted device errors, etc.) or an intentional
restart or shutdown of controller operation is indicated.
Last Failure Code: 64030104
Last Failure Parameter[0.] C0E6B7B0
Last Failure Parameter[1.] 80EA0E14
Last Failure Parameter[2.] 0000010C
Last Failure Parameter[3.] 80EBA614
Last Failure Code: 64030104 Description:
A DD is already in use by a RCV DIAG command - cannot get two RCV_DIAGs
without sending the data for the first.
> Last Failure Parameter[0] contains DD_PTR.
> Last Failure Parameter[1] contains blocking HTB_PTR.
> Last Failure Parameter[2] contains HTB_PTR flags.
> Last Failure Parameter[3] contains this HTB_PTR.
Reporting Component: 100.(64) Description:
SCSI Host Value Added Services
Reporting component's event number: 3.(03)
Restart Type: 0.(00) Description: Full software restart
AND
Last Failure Entry: 5. Flags: 006FF901
Template: 1.(01) Description: Last Failure Event
Occurred on 29-SEP-2002 at 14:24:04
Power On Time: 0. Years, 89. Days, 1. Hours, 58. Minutes, 19. Seconds
Controller Model: HSG80
Serial Number: ZG04404283 Hardware Version: E12(2A)
Software Version: V86F-4(BA)
Instance Code: 01010302 Description:
An unrecoverable hardware detected fault occurred.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 1.(01)
Event Threshold: 2.(02) Classification:
HARD. Failure of a component that affects controller performance or
precludes access to a device connected to the controller is indicated.
Last Failure Code: 01942088
Last Failure Parameter[0.] 17FFFFFF
Last Failure Parameter[1.] 06DAFFF0
Last Failure Parameter[2.] 7F036FFF
Last Failure Parameter[3.] 00E8FFF4
Last Failure Parameter[4.] 170003C8
Last Failure Parameter[5.] 00021020
Last Failure Parameter[6.] 170003C8
Last Failure Parameter[7.] 80EA8174
Last Failure Code: 01942088 Description:
An error has occurred on the PDAL.
> Last Failure Parameter[0] contains the value of read diagnostic
register 0.
> Last Failure Parameter[1] contains the value of read diagnostic
register 1.
> Last Failure Parameter[2] contains the value of write diagnostic
register 0.
> Last Failure Parameter[3] contains the value of write diagnostic
register 1.
> Last Failure Parameter[4] contains the IBUS address of error register.
> Last Failure Parameter[5] contains the PCFX PDAL control / status
register.
> Last Failure Parameter[6] contains the previous PDAL address of error
register.
> Last Failure Parameter[7] contains the current PDAL address of error
register.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 148.(94)
Restart Type: 0.(00) Description: Full software restart
Active Thread: HP_MAIN I960 Priority: 31.(1F)
Interrupt Stack Guard is intact
NULL Thread Stack Guard is intact
Thread Stack Guard State Flags (ID# Bit; 0=intact,1=not intact): 00000000
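For anyone comparing the two FMU entries above: the hex codes appear to pack
the same fields that FMU prints separately. The sketch below is only an
inference from these two entries, not a documented format. It assumes the
top byte is the reporting component and the second byte is that component's
event number, that the last byte of an Instance Code is the event threshold,
and that the low nibble of a Last Failure Code matches the number of
parameters listed. The function names are made up for illustration.

```python
def split_code(code_hex):
    """Split an 8-hex-digit FMU code into four bytes, most significant first."""
    value = int(code_hex, 16)
    return [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def decode_last_failure_code(code_hex):
    # Assumed layout (inferred from the two entries in this post):
    # byte 3 = reporting component, byte 2 = event number,
    # low nibble of byte 0 = count of Last Failure Parameters.
    component, event, _unknown, low_byte = split_code(code_hex)
    return {
        "component": component,
        "event": event,
        "param_count": low_byte & 0x0F,
    }


def decode_instance_code(code_hex):
    # Assumed layout (inferred from the two entries in this post):
    # byte 3 = reporting component, byte 2 = event number,
    # byte 0 = event threshold. Byte 1 is left uninterpreted.
    component, event, _unknown, threshold = split_code(code_hex)
    return {"component": component, "event": event, "threshold": threshold}


if __name__ == "__main__":
    for code in ("64030104", "01942088"):
        print(code, decode_last_failure_code(code))
    for code in ("0102030A", "01010302"):
        print(code, decode_instance_code(code))
```

Under those assumptions, 64030104 splits into component 100.(64) and event
3.(03), and 01942088 into component 1.(01) and event 148.(94), which agrees
with the "Reporting Component" lines FMU prints under each code.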
Carl Bavington
Development DBA
First Floor, The Icon, Stevenage.
Tel: 01438 36(3169)
Mob: 07973 233957
Received on Tue Oct 08 2002 - 15:56:40 NZDT