Managers,
Following on from my previous post regarding firmware updates, I unfortunately
have a rather open-ended question about SCSI CAM errors and HSG80s. We have
several clusters running V5.1A PK1 with HSG80 pairs. At random times we get
large bursts of SCSI CAM errors like the one below (283 in approximately 30
seconds), and the HSG80 restarts itself (see the FMU output below).
Compaq have suggested that the cause could be the firmware revision on the
HSGs (V86F-4) and that we should upgrade to V86F-10. However, one of the
HSG80s is already at V86F-8, and the changes in F9 and F10 don't sound
relevant.
Has anyone seen similar behaviour in an HSG80 environment: random bursts of
SCSI CAM errors, with the HSG80 restarting automatically?
Thanks in advance,
Carl.
----- EVENT INFORMATION -----
EVENT CLASS                             ERROR EVENT
OS EVENT TYPE                  199.     CAM SCSI
SEQUENCE NUMBER              10595.
OPERATING SYSTEM                        DEC OSF/1
OCCURRED/LOGGED ON                      Tue Oct  8 15:00:29 2002
OCCURRED ON SYSTEM                      dev-ds2-2
SYSTEM ID                 x000D0022
SYSTYPE                   x00000000
PROCESSOR COUNT                  2.
PROCESSOR WHO LOGGED      x00000000
----- UNIT INFORMATION -----
CLASS                         x0000     DISK
SUBSYSTEM                     x0000     DISK
BUS #                         xFFFFFFFE
FMU> show last most
Last Failure Entry: 14. Flags: 006FF901
 Template: 1.(01) Description: Last Failure Event
 Occurred on 08-OCT-2002 at 13:38:35
 Power On Time: 0. Years, 243. Days, 7. Hours, 20. Minutes, 8. Seconds
 Controller Model: HSG80
 Serial Number: ZG13802212 Hardware Version:  E16(2E)
 Software Version: V86F-8(BA)
 Instance Code: 0102030A Description:
  An unrecoverable software inconsistency was detected or an intentional
  restart or shutdown of controller operation was requested.
 Reporting Component: 1.(01) Description:
  Executive Services
 Reporting component's event number: 2.(02)
 Event Threshold: 10.(0A) Classification:
  SOFT. An unexpected condition detected by a controller software component
  (e.g., protocol violations, host buffer access errors, internal
  inconsistencies, uninterpreted device errors, etc.) or an intentional
  restart or shutdown of controller operation is indicated.
 Last Failure Code: 64030104
  Last Failure Parameter[0.] C0E6B7B0
  Last Failure Parameter[1.] 80EA0E14
  Last Failure Parameter[2.] 0000010C
  Last Failure Parameter[3.] 80EBA614
 Last Failure Code: 64030104 Description:
  A DD is already in use by a RCV DIAG command - cannot get two RCV_DIAGs
  without sending the data for the first.
   > Last Failure Parameter[0] contains DD_PTR.
   > Last Failure Parameter[1] contains blocking HTB_PTR.
   > Last Failure Parameter[2] contains HTB_PTR flags.
   > Last Failure Parameter[3] contains this HTB_PTR.
 Reporting Component: 100.(64) Description:
  SCSI Host Value Added Services
 Reporting component's event number: 3.(03)
 Restart Type: 0.(00) Description: Full software restart
AND
Last Failure Entry: 5. Flags: 006FF901
 Template: 1.(01) Description: Last Failure Event
 Occurred on 29-SEP-2002 at 14:24:04
 Power On Time: 0. Years, 89. Days, 1. Hours, 58. Minutes, 19. Seconds
 Controller Model: HSG80
 Serial Number: ZG04404283 Hardware Version:  E12(2A)
 Software Version: V86F-4(BA)
 Instance Code: 01010302 Description:
  An unrecoverable hardware detected fault occurred.
 Reporting Component: 1.(01) Description:
  Executive Services
 Reporting component's event number: 1.(01)
 Event Threshold: 2.(02) Classification:
  HARD. Failure of a component that affects controller performance or
  precludes access to a device connected to the controller is indicated.
 Last Failure Code: 01942088
  Last Failure Parameter[0.] 17FFFFFF
  Last Failure Parameter[1.] 06DAFFF0
  Last Failure Parameter[2.] 7F036FFF
  Last Failure Parameter[3.] 00E8FFF4
  Last Failure Parameter[4.] 170003C8
  Last Failure Parameter[5.] 00021020
  Last Failure Parameter[6.] 170003C8
  Last Failure Parameter[7.] 80EA8174
 Last Failure Code: 01942088 Description:
  An error has occurred on the PDAL.
   > Last Failure Parameter[0] contains the value of read diagnostic
     register 0.
   > Last Failure Parameter[1] contains the value of read diagnostic
     register 1.
   > Last Failure Parameter[2] contains the value of write diagnostic
     register 0.
   > Last Failure Parameter[3] contains the value of write diagnostic
     register 1.
   > Last Failure Parameter[4] contains the IBUS address of error register.
   > Last Failure Parameter[5] contains the PCFX PDAL control / status
     register.
   > Last Failure Parameter[6] contains the previous PDAL address of error
     register.
   > Last Failure Parameter[7] contains the current PDAL address of error
     register.
 Reporting Component: 1.(01) Description:
  Executive Services
 Reporting component's event number: 148.(94)
 Restart Type: 0.(00) Description: Full software restart
 Active Thread: HP_MAIN I960 Priority: 31.(1F)
 Interrupt Stack Guard is intact
 NULL Thread Stack Guard is intact
 Thread Stack Guard State Flags (ID# Bit; 0=intact,1=not intact): 00000000
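As a cross-check on the two FMU entries above, here is a minimal Python sketch that splits the instance codes and last failure codes into the fields FMU printed next to them. The bit layout is an assumption inferred only from these two entries (top byte = reporting component, next byte = event number, low byte of the instance code = event threshold, low nibble of the last failure code = number of parameters); it is not taken from HSG80 documentation, and the function names are hypothetical.

```python
# Hypothetical helpers: the field layout is inferred from the two FMU
# "show last most" entries in this post, NOT from HSG80 documentation.

def decode_instance_code(code: int) -> dict:
    """Split an instance code into the fields FMU printed alongside it."""
    return {
        "component": (code >> 24) & 0xFF,  # "Reporting Component"
        "event": (code >> 16) & 0xFF,      # "Reporting component's event number"
        "threshold": code & 0xFF,          # "Event Threshold"
    }

def decode_last_failure_code(code: int) -> dict:
    """Split a last failure code; layout assumed, see note above."""
    return {
        "component": (code >> 24) & 0xFF,  # second "Reporting Component"
        "event": (code >> 16) & 0xFF,      # its event number
        "param_count": code & 0x0F,        # number of Last Failure Parameters
    }

# (instance code, last failure code) for each controller shown above
entries = {
    "ZG13802212 (V86F-8)": (0x0102030A, 0x64030104),
    "ZG04404283 (V86F-4)": (0x01010302, 0x01942088),
}

for name, (instance, lfc) in entries.items():
    print(name, decode_instance_code(instance), decode_last_failure_code(lfc))
```

Decoded this way, the fields match what FMU reported: the V86F-8 restart was raised by component 100 (SCSI Host Value Added Services, a SOFT failure), while the V86F-4 restart was a HARD PDAL fault in Executive Services. The two restarts therefore do not look like the same failure.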
Carl Bavington
Development DBA
First Floor, The Icon, Stevenage.
Tel: 01438 36(3169)
Mob: 07973 233957
Received on Tue Oct 08 2002 - 15:56:40 NZDT