SUMMARY: scsi reservations, disk configurations

From: JEFF 'THE B TEAM' BECK (206)662-4620 BOEING COMMERCIAL AIRPLANE GROUP <Jeff.Beck_at_orcas.iasl.ca.boeing.com>
Date: Thu, 14 Aug 1997 11:07:55 -0800 (PST)

I received one reply to my note and did some additional checking on my own.
As for the CAM errors, I went further back in the logs, and uerf reports the
same sorts of messages whenever a service is moved from one system to the
other. I never saw the errors before (though they were dutifully logged)
because 'voldisk list' doesn't report them, whereas 'file $ls /dev/rrz*2*a'
does.

Jeff

Thanks to alan_at_nabeth.cxo.dec.com for this response:

        In case I haven't answered this yet...

        Monitor gets the disk I/O stats from the kernel using the
        table(2) system call. If the kernel data structure for a
        given disk shows activity, Monitor collects the data for
        it. Disks which have had data collected for them at any
        time while the system was booted will be reported. On
        V4.0B non-zero LUNs apparently have to be discovered when
        the system boots for data to be collected. I'm not certain
        of this, but it seems that way from what I saw myself this
        week with a HSZ based stripe set.

        re: CAM errors.

        Looking at the CAM driver funny will "freeze the SIM queue".
        The driver and any client software using /dev/cam should
        unfreeze the queue before they continue. The CAM_SIM_QFRZDIS
        is either saying that the queue was frozen or unfrozen.
        The reservation conflict could simply be that the
        other host has those devices reserved and the driver is
        logging errors when you try to access them. The unexpected
        bus free is something I'd have somebody knowledgeable look
        at.

ORIGINAL
--------------------------------------------------------------------------
Configuration: 2 X 2100, du3.2d, ase1.3, lsm, advfs

My system was originally set up as an nfs file server using ADVfs, with
each ADVfs 'volume' being an LSM "volume". Each of the LSM volumes
was a single (HSZ40) raid disk. I've decided our configuration would be
a lot simpler if we eliminated LSM from our configuration and handed off
the RAID disks directly to ADVfs. I've been able to modify 2 of the ASE
services and now notice some unsettling/disturbing/unnerving behaviour.

Problem 1) "Extra" disks show up when using monitor. My HSZ40 disks are
             rzb21-rzg21 for bus 2 and rzb29-rzg29 for bus 3. I'm now
             seeing 'rz21' which I don't remember seeing before, and earlier
             this week I was seeing an 'rz29' which seems to have vanished.
             I *think* the rz29 went away when I did a 'voldisk rm rzb29'
             but I'm not certain. The rz21 is especially puzzling since
             the 2 changes I made were with rzb29 and rzg29--i.e. nothing
             on bus 2 changed.
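
For reference, a minimal sketch of how these device names break down. It
assumes the usual Digital UNIX rz naming scheme, which the message itself
does not state: the number is 8 * bus + target, and an optional letter b-h
in front of the number selects LUN 1-7 (no letter meaning LUN 0). The
function name decode_rz is made up for illustration. The assumption is at
least consistent with the 'file' listing further down, where rzb21 is
reported on SCSI #2 at SCSI ID #5.

```python
import re

# ASSUMPTION (not stated in the message): names look like
#   [r]rz[<lun letter b-h>]<unit>[<partition a-h>]
# with unit = 8 * bus + target, and no LUN letter meaning LUN 0.
NAME_RE = re.compile(r"^r?rz([b-h]?)(\d+)([a-h]?)$")


def decode_rz(name):
    """Split an rz device name into bus, target, LUN and partition."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not an rz device name: {name!r}")
    lun_letter, unit, partition = m.group(1), int(m.group(2)), m.group(3)
    # 'b' -> LUN 1, 'c' -> LUN 2, ...; an absent letter is taken as LUN 0.
    lun = ord(lun_letter) - ord("a") if lun_letter else 0
    bus, target = divmod(unit, 8)
    return {
        "bus": bus,
        "target": target,
        "lun": lun,
        "partition": partition or None,
    }


for name in ("rz21", "rzb21", "rzg29"):
    print(name, decode_rz(name))
```

Under that assumption, rz21 would be LUN 0 at the same bus and target as
rzb21-rzg21, which fits the remark about non-zero LUNs in the reply above.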

Problem 2) I used "voldisk list" as a quick troubleshooting command to
             tell me which system thought it was controlling which disks.
             I'm switching over to "file -f disks.dat" which displays
             something along the lines of:

/dev/rrzb21a: character special (8/37952) SCSI #2 HSZ40 disk #169 (SCSI ID #5)
/dev/rrzb29a: character special (8/54336) SCSI #3 HSZ40 disk #233 (SCSI ID #5) offline
/dev/rrzc21a: character special (8/38016) SCSI #2 HSZ40 disk #170 (SCSI ID #5)
/dev/rrzc29a: character special (8/54400) SCSI #3 HSZ40 disk #234 (SCSI ID #5)
/dev/rrzd21a: character special (8/38080) SCSI #2 HSZ40 disk #171 (SCSI ID #5)
/dev/rrzd29a: character special (8/54464) SCSI #3 HSZ40 disk #235 (SCSI ID #5)
/dev/rrze21a: character special (8/38144) SCSI #2 HSZ40 disk #172 (SCSI ID #5) offline
/dev/rrze29a: character special (8/54528) SCSI #3 HSZ40 disk #236 (SCSI ID #5)
/dev/rrzf21a: character special (8/38208) SCSI #2 HSZ40 disk #173 (SCSI ID #5) offline
/dev/rrzf29a: character special (8/54592) SCSI #3 HSZ40 disk #237 (SCSI ID #5) offline
/dev/rrzg21a: character special (8/38272) SCSI #2 HSZ40 disk #174 (SCSI ID #5) offline
/dev/rrzg29a: character special (8/54656) SCSI #3 HSZ40 disk #238 (SCSI ID #5) offline

Anything that shows up as "offline" is being controlled by the other system.
Starting today, I've been seeing some output that looks like:

/dev/rrzb21a: character special (8/37952) SCSI #2 HSZ40 disk #169 (SCSI ID #5) errors = 0/10 offline
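
The listing above can also be checked mechanically. The following is a
rough sketch, not a tested tool: it parses lines in the format shown above
and sorts the devices by whether they carry the "offline" flag (taken here,
as in this message, to mean the other system controls the disk) and whether
an "errors = n/mm" field is present. The function names and the regular
expressions are assumptions based only on the sample lines in this message,
and the two error counts are kept as a raw pair because their meaning is
not stated here.

```python
import re

# Pattern inferred from the sample 'file' output in this message only;
# other devices or OS versions may well print something different.
LINE_RE = re.compile(
    r"^(?P<dev>\S+):\s+character special \((?P<major>\d+)/(?P<minor>\d+)\)\s+"
    r"SCSI #(?P<bus>\d+)\s+(?P<model>\S+)\s+disk #(?P<disk>\d+)\s+"
    r"\(SCSI ID #(?P<target>\d+)\)(?P<rest>.*)$"
)
ERR_RE = re.compile(r"errors\s*=\s*(\d+)/(\d+)")


def parse_file_line(line):
    """Return a dict for one 'file' output line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rest = m.group("rest")
    err = ERR_RE.search(rest)
    return {
        "device": m.group("dev"),
        "bus": int(m.group("bus")),
        "target": int(m.group("target")),
        "offline": "offline" in rest.split(),
        # Raw pair of counts; their interpretation is not given in the message.
        "errors": (int(err.group(1)), int(err.group(2))) if err else None,
    }


def summarize(lines):
    """Group device names by offline flag and by presence of an errors field."""
    parsed = [p for p in (parse_file_line(l) for l in lines) if p is not None]
    return {
        "local": [p["device"] for p in parsed if not p["offline"]],
        "other_host": [p["device"] for p in parsed if p["offline"]],
        "with_error_field": [p["device"] for p in parsed if p["errors"] is not None],
    }


SAMPLE = [
    "/dev/rrzb21a: character special (8/37952) SCSI #2 HSZ40 disk #169 (SCSI ID #5) errors = 0/10 offline",
    "/dev/rrzc21a: character special (8/38016) SCSI #2 HSZ40 disk #170 (SCSI ID #5)",
    "/dev/rrzb29a: character special (8/54336) SCSI #3 HSZ40 disk #233 (SCSI ID #5) offline",
]

print(summarize(SAMPLE))
```

Running this against the saved output from both systems would show at a
glance whether each disk is claimed by exactly one host.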

Both systems have reported clean disk status at times today, and at other times
both have reported "errors=n/mm" messages like the one in the line above for
some of the disks. When I start seeing things like "CAM_UNEXP_BUSFREE",
"CAM_SIM_QFRZDIS", and "SCSI_STAT_RESERVATION_CONFLICT", I don't get a warm
fuzzy feeling inside.

Does anyone have an idea why I'm seeing errors now, or know where my "extra"
rz21 drive comes from?

TIA,
Jeff Beck
Received on Thu Aug 14 1997 - 20:19:16 NZST
