SUMMARY: How to locate devices from errlog entries

From: <jreed_at_wukon.appliedtheory.com>
Date: Thu, 21 Oct 1999 11:06:11 -0400

I asked how to locate hsz devices from errlog entries. I got several
good responses, many thanks.
                ---------------------------------------
alan_at_nabeth.cxo.dec.com reminded us of "scu", and describes the formula
that the naming convention derives from:

        1. Do a "scu show edt bus 1". This will show everything
            on that bus that the host is aware of. If you're
            getting errors from devices that don't exist, either
            they're errors not associated with a particular
            device (a controller error) or the controller is sick.

        2. If the devices are named according to convention:

                /dev/[r]rz[a-h]#{a-h}

            the first a-h is a letter associated with the logical
            unit number (LUN) of the device:

                a = LUN 0
                b = LUN 1
                c = LUN 2
                etc.

            Often, "a" won't be used, since LUN 0 can use the simpler
            naming convention: /dev/[r]rz#{a-h}.

            The '#' is the number found from the bus and target
            number using:

                # = (bus * 8) + target

            The last a-h is just the partition letter and has
            nothing to do with SCSI addressing.

        So, bus 1, target 1, LUN 2 would be:

                /dev/[r]rzc9?
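
The formula above can be sketched in a few lines of Python. This is only
an illustration of the convention as alan describes it, not a tool from
any kit: the names rz_name and rz_address are made up here, and the
target range of 0-7 is an assumption that follows from the "bus * 8" term.

```python
import re

LUN_LETTERS = "abcdefgh"  # a = LUN 0, b = LUN 1, ... h = LUN 7

def rz_name(bus, target, lun=0, raw=False):
    """Build the device name (without partition letter) for a SCSI address."""
    if not 0 <= target <= 7:
        raise ValueError("the bus * 8 formula assumes targets 0-7")
    if not 0 <= lun <= 7:
        raise ValueError("LUN letters only cover LUN 0-7")
    unit = bus * 8 + target                         # '#' in the convention
    letter = "" if lun == 0 else LUN_LETTERS[lun]   # LUN 0 uses the simpler form
    return "/dev/" + ("rrz" if raw else "rz") + letter + str(unit)

# Optional /dev/, optional raw 'r', optional LUN letter, unit number,
# optional trailing partition letter (ignored, it is not SCSI addressing).
_RZ = re.compile(r"^(?:/dev/)?r?rz([a-h]?)(\d+)[a-h]?$")

def rz_address(name):
    """Recover (bus, target, lun) from a conventionally named device."""
    m = _RZ.match(name)
    if m is None:
        raise ValueError("not a conventional rz device name: %r" % name)
    lun = LUN_LETTERS.index(m.group(1)) if m.group(1) else 0
    bus, target = divmod(int(m.group(2)), 8)
    return bus, target, lun

print(rz_name(1, 1, 2))           # /dev/rzc9
print(rz_address("/dev/rrzc9a"))  # (1, 1, 2)
```

The inverse only works for devices that actually follow the convention;
device files created by hand under other names cannot be decoded this way.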
                -----------------------------------------
Strangely enough, after we disconnected the tri-link from the array, "scu"
still shows devices on this array on two of the four systems attached
through the tri-link.
                -----------------------------------------
kcarlson_at_arsc.edu told us about a utility to map device files by bus
and target, called "uakmknod" in the "uaio" kit:

  ftp://ftp.alaska.edu/pub/sois/Overview.html#uaio
The man page is at:
  http://www.arsc.edu/~kcarlson/software/man/uakmknod.html
                ------------------------------------------
Mireille.BOF_at_univ-nancy2.fr pointed us to the boot info in the
error log and in the messages file, where devices that were seen
at boot are translated into bus/target/lun:

                        scsi1 at isp1 slot 0
                        rz8 at scsi1 target 0 lun 0 (LID=4)
                        _(DEC HSZ70 V71Z)
                        _(Wide16)
                        rzc9 at scsi1 target 1 lun 2 (LID=14)
                        _(DEC HSZ70CCL V71Z)
                        _(Wide16)
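
If you want to pull these mappings out of the boot messages in bulk, a
small parser along these lines might do. It is a sketch that assumes the
lines look exactly like the sample above; the function name
parse_boot_devices is hypothetical, and other controllers or OS versions
may log a different layout.

```python
import re

# Matches lines such as "rzc9 at scsi1 target 1 lun 2 (LID=14)".
BOOT_LINE = re.compile(r"^\s*(rz[a-h]?\d+) at scsi(\d+) target (\d+) lun (\d+)")

def parse_boot_devices(lines):
    """Map each rz device name to its (bus, target, lun) tuple."""
    devices = {}
    for line in lines:
        m = BOOT_LINE.match(line)
        if m:
            name, bus, target, lun = m.groups()
            devices[name] = (int(bus), int(target), int(lun))
    return devices

# The sample boot messages quoted above.
SAMPLE = """\
scsi1 at isp1 slot 0
rz8 at scsi1 target 0 lun 0 (LID=4)
_(DEC HSZ70 V71Z)
_(Wide16)
rzc9 at scsi1 target 1 lun 2 (LID=14)
_(DEC HSZ70CCL V71Z)
_(Wide16)
"""

devices = parse_boot_devices(SAMPLE.splitlines())
for name, address in sorted(devices.items()):
    print(name, address)
```

Lines that do not describe an rz device (the scsi1 adapter line and the
"_(...)" continuation lines) are simply skipped.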

                ----------------------------------------
kcarlson_at_arsc.edu made some very interesting comments that may be
relevant to our situation:

1. We had errors on each machine on bus x/target 7/lun 7. He says:

Unless something has changed since 4.0b (doubtful), target
7 is the controller. With Digital, the highest number is the
highest priority "thing" on the bus... 7 is the highest.

2. Our hosts were all erroring when they were sharing the bottom half
of the hsz (though we do this routinely with 3 other arrays, each
connected to multiple systems, with no similar problems) and he notes:
                        ***********
It used to be "unsupported" for multiple systems to share
a bus via tri-links or otherwise unless one was also running
ASE. It worked, sort of, but was prone to obscure problems.
Any system generating an error on the bus will affect all systems.
Any system or device not on the same or compatible firmware will cause
problems for all.

The device errors on one system were probably sufficient to cause
systems which didn't own the device to generate bus errors... they
didn't own the device so they didn't know why the error was being
reported to them... that in turn causes an error (unless, probably,
the ASE part is running to resolve that).
                        ************
Received on Thu Oct 21 1999 - 15:08:05 NZDT
