I asked how to locate hsz devices from errlog entries. I got several
good responses, many thanks.
---------------------------------------
alan_at_nabeth.cxo.dec.com reminded us of "scu", and describes the formula
that the naming convention derives from:
1. Do a "scu show edt bus 1". This will show everything
on that bus that the host is aware of. If you're
getting errors from devices that don't exist, they are
either errors not associated with a particular
device (controller errors) or a sign of a sick controller.
2. If the devices are named according to convention:
/dev/[r]rz[a-h]#{a-h}
the first a-h is a letter associated with the logical
unit number (LUN) of the device:
a = LUN 0
b = LUN 1
c = LUN 2
etc.
Often, "a" won't be used, since one can use the simpler
naming convention: /dev/[r]rz#{a-h}.
The '#' is the number found from the bus and target
number using:
# = (bus * 8) + target
The last a-h is just the partition letter and has
nothing to do with SCSI addressing.
So, bus 1, target 1, LUN 2 would be:
/dev/[r]rzc9?
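The naming formula above can be sketched in Python. This is only an illustration: the function names device_name and decode_device are made up here, and the sketch assumes targets and LUNs both run 0-7, as the "bus * 8" formula and the a-h letters imply.

```python
import re

LUN_LETTERS = "abcdefgh"  # a = LUN 0, b = LUN 1, ... h = LUN 7


def device_name(bus, target, lun=0, raw=False):
    """Return the conventional device name, without the partition letter."""
    if not 0 <= target <= 7:
        raise ValueError("target must be 0-7")
    if not 0 <= lun <= 7:
        raise ValueError("lun must be 0-7")
    number = bus * 8 + target                 # the '#' in the convention
    letter = LUN_LETTERS[lun] if lun else ""  # LUN 0 uses the simpler form
    prefix = "rrz" if raw else "rz"
    return "/dev/%s%s%d" % (prefix, letter, number)


def decode_device(name):
    """Return (bus, target, lun) for a name such as /dev/rrzc9a or rz8."""
    match = re.fullmatch(r"(?:/dev/)?r?rz([a-h]?)(\d+)[a-h]?", name)
    if match is None:
        raise ValueError("not an rz device name: %r" % name)
    letter, number = match.group(1), int(match.group(2))
    bus, target = divmod(number, 8)           # inverse of (bus * 8) + target
    lun = LUN_LETTERS.index(letter) if letter else 0
    return bus, target, lun


print(device_name(1, 1, 2))          # /dev/rzc9
print(decode_device("/dev/rrzc9a"))  # (1, 1, 2)
```

The decode direction is the one that helps with errlog entries: given a device name, it gives the bus, target and LUN to look for.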
-----------------------------------------
Strangely enough, after disconnecting the tri-link from the array, "scu"
still shows devices on this array on two of the four systems attached
through the tri-link.
-----------------------------------------
kcarlson_at_arsc.edu told us about a utility to map device files by bus
and target, called "uakmknod" in the "uaio" kit:
ftp://ftp.alaska.edu/pub/sois/Overview.html#uaio
for the man page:
http://www.arsc.edu/~kcarlson/software/man/uakmknod.html
------------------------------------------
Mireille.BOF_at_univ-nancy2.fr pointed us to the boot info in the
error log and in the messages file, where devices that were seen
at boot are translated into bus/target/lun:
scsi1 at isp1 slot 0
rz8 at scsi1 target 0 lun 0 (LID=4)
_(DEC HSZ70 V71Z)
_(Wide16)
rzc9 at scsi1 target 1 lun 2 (LID=14)
_(DEC HSZ70CCL V71Z)
_(Wide16)
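Boot messages like the ones above can be turned into a lookup table with a few lines of Python. This is a rough sketch that assumes the lines look exactly like the excerpt ("rzN at scsiB target T lun L"); the function name parse_boot_devices is made up for illustration, and other device types are not handled.

```python
import re

# Matches lines such as: rzc9 at scsi1 target 1 lun 2 (LID=14)
BOOT_LINE = re.compile(
    r"\b(rz[a-h]?\d+) at scsi(\d+) target (\d+) lun (\d+)"
)


def parse_boot_devices(text):
    """Map each rz device name to its (bus, target, lun) tuple."""
    devices = {}
    for match in BOOT_LINE.finditer(text):
        name = match.group(1)
        bus, target, lun = (int(g) for g in match.group(2, 3, 4))
        devices[name] = (bus, target, lun)
    return devices


sample = """\
scsi1 at isp1 slot 0
rz8 at scsi1 target 0 lun 0 (LID=4)
_(DEC HSZ70 V71Z)
_(Wide16)
rzc9 at scsi1 target 1 lun 2 (LID=14)
_(DEC HSZ70CCL V71Z)
_(Wide16)
"""

for name, address in parse_boot_devices(sample).items():
    print(name, address)
```

Feeding it the messages file from each host would show which hosts saw which bus/target/LUN combinations at boot.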
----------------------------------------
kcarlson_at_arsc.edu made some very interesting comments that may be
relevant to our situation:
1. We had errors on each machine on bus x/target 7/lun 7. He says:
Unless something has changed since 4.0b (doubtful), target
7 is the controller. With Digital, the highest number is the
highest priority "thing" on the bus... 7 is the highest.
2. Our hosts were all erroring when they were sharing the bottom half
of the hsz (though we do this routinely with 3 other arrays, each
connected to multiple systems, with no similar problems) and he notes:
***********
It used to be "unsupported" for multiple systems to share
a bus via tri-links or otherwise unless one was also running
ASE. It worked, sort of, but was prone to obscure problems.
Any system generating an error on the bus will affect all systems.
Any system or device not on the same or compatible firmware will cause
problems for all.
The device errors on one system were probably sufficient to cause
systems which didn't own the device to generate bus errors... they
didn't own the device so they didn't know why the error was being
reported to them... that in turn causes an error (unless probably
the ASE part is running to resolve that).
************
Received on Thu Oct 21 1999 - 15:08:05 NZDT