Help understanding error: "Impossible Cond Detected"

From: Aldridge, Robert E. <REAldridge_at_mcdermott.com>
Date: Tue, 19 Jun 2001 08:57:27 -0500

Tru64 Managers:

On our ES40 cluster (Tru64 5.1 patch 3), we frequently get disk-related
errors. The errors seem to be generated from the HSG60 disk connection.

I would appreciate some help interpreting the error message (below). The
error mentions a particular SCSI target (b=2 t=5 l=0), but that target doesn't
exist on the ES40. Could the error message refer to the SCSI bus of the
HSG60 (MA6000 array)?

ALSO -- one of the ES40s in the two-node cluster crashes every few days. The
crashes do NOT occur at the same time as these error messages.


Here's a look at our hardware (hwmgr):


# hwmgr -view dev
 HWID: Device Name          Mfg      Model            Location
 ------------------------------------------------------------------------------
    4: /dev/kevm
   51: /dev/disk/floppy0c            3.5in floppy     fdi0-unit-0
   56: /dev/disk/dsk0c      COMPAQ   BF01863644       bus-0-targ-0-lun-0
   57: /dev/disk/dsk1c      COMPAQ   BF01863644       bus-0-targ-1-lun-0
   58: /dev/disk/dsk2c      DEC      HSG60            IDENTIFIER=110
   59: /dev/disk/dsk3c      DEC      HSG60            IDENTIFIER=120
   60: /dev/disk/dsk4c      DEC      HSG60            IDENTIFIER=10
   61: /dev/disk/dsk5c      DEC      HSG60            IDENTIFIER=20
   62: /dev/disk/dsk6c      DEC      HSG60            IDENTIFIER=30
   63: /dev/disk/dsk7c      DEC      HSG60            IDENTIFIER=40
   64: /dev/disk/dsk8c      DEC      HSG60            IDENTIFIER=50
   65: /dev/disk/cdrom0c    COMPAQ   CRD-8402B        bus-3-targ-0-lun-0
   66: /dev/cport/scp0               HSG60CCL         bus-2-targ-0-lun-0
  132: /dev/changer/mc0              TL800    (C) DEC bus-1-targ-0-lun-0
  133: /dev/ntape/tape0     DEC      TZ89     (C) DEC bus-1-targ-4-lun-0
  134: /dev/ntape/tape1     DEC      TZ89     (C) DEC bus-1-targ-5-lun-0
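
One way to see which listed devices sit on a given bus is to filter the Location column. This is only a minimal sketch, assuming the `bus-N-targ-N-lun-N` format shown in the listing above; `HWMGR_ROWS` and `devices_on_bus` are hypothetical names used for illustration, and the rows are a subset copied from the listing. For this listing, the only entry it finds on bus 2 is the HSG60CCL device (scp0); the HSG60 disks are listed by IDENTIFIER rather than by bus/target/lun.

```python
import re

# Sample rows copied from the hwmgr listing above (subset, for illustration).
HWMGR_ROWS = """\
   56: /dev/disk/dsk0c      COMPAQ   BF01863644       bus-0-targ-0-lun-0
   58: /dev/disk/dsk2c      DEC      HSG60            IDENTIFIER=110
   65: /dev/disk/cdrom0c    COMPAQ   CRD-8402B        bus-3-targ-0-lun-0
   66: /dev/cport/scp0               HSG60CCL         bus-2-targ-0-lun-0
  134: /dev/ntape/tape1     DEC      TZ89     (C) DEC bus-1-targ-5-lun-0
"""

# Assumed Location format: bus-<n>-targ-<n>-lun-<n> at the end of the row.
LOCATION = re.compile(r"bus-(\d+)-targ-(\d+)-lun-(\d+)\s*$")


def devices_on_bus(listing, bus):
    """Return (device_name, target, lun) for rows located on the given bus."""
    found = []
    for row in listing.splitlines():
        match = LOCATION.search(row)
        if match is None:
            continue  # rows such as IDENTIFIER=110 carry no bus/target/lun
        row_bus, target, lun = (int(group) for group in match.groups())
        if row_bus == bus:
            # The device name is the first field after the "HWID:" prefix.
            device = row.split(":", 1)[1].split()[0]
            found.append((device, target, lun))
    return found


print(devices_on_bus(HWMGR_ROWS, 2))
```
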

And here is the error we receive ---

From root_at_saturn.xyz.xyz.com Mon Jun 18 21:40:42 2001
Date: Mon, 18 Jun 2001 21:40:42 -0400 (EDT)
From: system PRIVILEGED account <root_at_saturn.xyz.xyz.com>
Subject: EVM ALERT [700]: SCSI event
Content-Length: 1864
======================= Binary Error Log event =======================
EVM event name: sys.unix.binlog.hw.scsi
    Binary error log events are posted through the binlogd daemon, and
    stored in the binary error log file, /var/adm/binary.errlog.  This
    event is used to report all SCSI device errors, including disk,
    tape, HSZ raid events, and adapter errors.
======================================================================
Formatted Message:
    SCSI event
Event Data Items:
    Event Name        : sys.unix.binlog.hw.scsi
    Priority          : 700
    PID               : 524693
    PPID              : 524289
    Event Id          : 1456
    Member Id         : 1
    Timestamp         : 18-Jun-2001 21:40:42
    Host IP address   : 131.184.3.49
    Cluster IP address: 131.184.3.52
    Host Name         : mars
    Cluster Name      : saturn
    User Name         : root
    Format            : SCSI event
    Reference         : cat:evmexp.cat:300
Variable Items:
    subid_class (INT32) = 199
    subid_num (INT32) = 2
    subid_unit_num (INT32) = 168
    subid_type (INT32) = 0
    binlog_event (OPAQUE) = [OPAQUE VALUE: 856 bytes]
============================ Translation =============================
Sequence number of error: 531235328
Time of error entry: 18-Jun-2001 21:40:42
Host name: mars
SCSI CAM ERROR PACKET
SCSI device class: DISK
Bus Number: 2
Target number: 5
Lun Number: 0
Name of routine that logged the event: cdisk_complete
Event information: Status = CMP but resid not NULL
Software detected event: Possible Software Problem - Impossible Cond Detected
Device Name: DEC     HSG60           V85L
Event information: Active CCB at time of error
Event information: CCB request completed w/out error
                ############### Entry End ###############
======================================================================
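
To compare the address in the error packet with the `bus-N-targ-N-lun-N` locations that hwmgr prints, the three fields can be pulled out of the translated text. The following is a rough sketch, assuming the field labels appear exactly as in the entry above; `PACKET`, `FIELDS`, and `scsi_address` are hypothetical names, and the sample text is an excerpt of that entry.

```python
import re

# Excerpt of the translated error entry above.
PACKET = """\
SCSI device class: DISK
Bus Number: 2
Target number: 5
Lun Number: 0
Name of routine that logged the event: cdisk_complete
"""

# Field labels as they appear in the translation section (assumed stable).
FIELDS = {
    "bus": r"^Bus Number:\s*(\d+)",
    "target": r"^Target number:\s*(\d+)",
    "lun": r"^Lun Number:\s*(\d+)",
}


def scsi_address(text):
    """Return a dict with whichever of bus, target, lun are found in the text."""
    address = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            address[name] = int(match.group(1))
    return address


addr = scsi_address(PACKET)
# Render in the same style as the hwmgr Location column for easy comparison.
print("bus-{bus}-targ-{target}-lun-{lun}".format(**addr))
```
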

Thanks for any assistance you can provide.  I also have a call open with
Compaq support.

Rob Aldridge
AT&T Solutions
Alliance, Ohio
Received on Tue Jun 19 2001 - 13:58:07 NZST
