[Summary] Using scu to scan devices from SUSROD_at_HBSI.COM on 1997-10-28 (tru64-unix-managers)

From: <SUSROD_at_HBSI.COM>
Date: Mon, 27 Oct 1997 15:23:20 -0800

Original Question:

************************************************************************

Managers,

I am using the scu utility to scan my SCSI disks on my alpha 2100. The
following command works well for a standard SCSI disk:

# scu -f /dev/rrz1g
scu> verify media

But I also want to scan my RAID (level 0) devices:

/dev/re8c    12192746   7397347   3576124   67%   /orafarm/ora01
/dev/re9c    12192746   9507131   1466340   87%   /orafarm/ora02
/dev/re10c   12192746   7544307   3429164   69%   /orafarm/ora03
/dev/re11c   12192746  10633472    339999   97%   /orafarm/ora04

When I try I get:

# scu -f /dev/rre8c
scu: Device '/dev/rre8c' is not attached to a SCSI bus.

My SCSI controller in this case is a Mylex KZPSC-XB.

Can I use scu to scan these drives? Are there other methods available
to do this type of diagnostics from the command line?

TIA

Susan Rodriguez
************************************************************************

Solution:

************************************************************************

Following suggestions (at end), I used

dd if=/dev/vol/rootdg/group11 of=/dev/null bs=8192

to read blocks from my problem device. There were no errors reading
from the device.
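
A minimal sketch of how that dd read could be wrapped into a repeatable
check. The function name scan_device and the 64 KB default block size
are assumptions for illustration, not part of the original commands,
and the sketch assumes a POSIX-style shell.

```shell
# scan_device DEVICE [BLOCKSIZE]
# Hypothetical helper: read DEVICE from start to end with dd, discard
# the data, and report whether any read failed.
scan_device() {
    dev=$1
    bs=${2:-65536}    # assumed default transfer size in bytes

    # dd prints its record counts and any I/O error text on stderr, so
    # capture stderr and judge success by dd's exit status.
    if out=$(dd if="$dev" of=/dev/null bs="$bs" 2>&1); then
        echo "OK: $dev"
    else
        echo "READ ERROR: $dev"
        echo "$out"
        return 1
    fi
}

# Example, using a device name from the question:
#   scan_device /dev/rre8c 65536
```

A clean run only shows that every block could be read; it says nothing
about whether the data in those blocks is what the application expects.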

I also found and installed the online SWXCR manager utility and ran a
parity check on all RAID 5 devices (this is not possible on the RAID 0
devices, which have no parity). This also returned no errors.

It was very useful to have these two independent ways of checking a
given device. According to the Oracle logs, an Oracle database
corruption problem pointed to a UNIX device error. However, both tests
passed and there were no errors at all in any of the system logs, so we
were able to rule out hardware errors. It looks like the corruption
problem is within the Oracle instance itself.

Susan
************************************************************************

Thanks to

alan_at_nabeth.cxo.dec.com who wrote:

The Mylex controller (and any controller using the "re" driver)
isn't a SCSI controller. While it has SCSI busses out the back
for device connections, it plugs directly into the PCI backplane
and uses a separate driver. scu(8) only supports SCSI connected
stuff.

The simplest way to scan a device is using dd(1), reading from
the raw device and writing to /dev/null. Large transfer sizes
are usually wise. If there are any uncorrectable errors you'll
have to fix those using the controller utilities. For correctable
errors, the controller should handle it when it can.

and to Dr. Tom Blinn who wrote:

Susan,

The "scu" utility is so named because it is the "SCSI command utility";
all it does is accept commands that are the equivalent of SCSI commands
and then issue them to the device, through a normal SCSI controller.
This works for the rz and tz (aka rmt) devices, and doesn't work in
general for devices that are not connected through a normal SCSI
controller.

The "re" devices are connected through backplane RAID controllers that
are NOT normal SCSI controllers, even though they physically interface
to SCSI disks.

If you want to use "scu" to do a "verify media" on the devices, you need
to physically disconnect the disks from the RAID controller, connect
them to a regular SCSI controller, and then use scu to run the verify
there. However, doing this is messy and likely to really mess up your
RAID configuration, as the backplane controller will notice the disks
are missing if you power them off or unplug them while it is powered up;
so you'd need to not only remove the disks from the RAID controller,
you'd need to remove the RAID controller from the system while you did
this, and you'd need to be very careful about getting all the devices
connected back exactly as they were before when you put the RAID
controller back on-line.

I believe there is a host-based utility you can use with the RAID
controller to do some maintenance operations on-line. However, I don't
personally have one of the Mylex KZPSC-XB controllers in any of the
systems I personally use so I don't have experience with it. I don't
know whether it has a way to do the equivalent of telling a drive to run
a "verify media" test. It is not clear to me that it makes sense to do
that kind of low-level testing as part of any normal production
operations. If I were concerned about testing in a RAID based
environment, I'd probably want to use an on-line exerciser such as the
diskx utility, running extended read-write tests on the RAID set, to
make sure that the complete data path was working correctly. But that's
just my approach, and you may have perfectly valid reasons to want to be
able to do a SCSI "verify media" on the drives from the command line;
I'm just not at all sure there's a way to do it.

I believe the correct way of monitoring the media in your RAID
environment is to keep an eye on the error logs. I believe the "re"
driver reports any problems that are reported by the drives to the
controller. I will pass a copy of your message, with my reply, along to
the engineer who maintains the "re" driver, who knows more about this
stuff than I do, and perhaps she will comment further, either directly
back to you or through me.

Tom
Received on Tue Oct 28 1997 - 00:42:58 NZDT
