SUMMARY: (swxcr) devices on du v4.0 and uaio | iostat

From: Kurt Carlson <sxkac_at_java.sois.alaska.edu>
Date: Wed, 20 Aug 1997 12:23:31 -0900

uaio (enhanced iostat) has been updated to properly recognize
swxcr ('re') disks for both du v3.2 and v4.0.

uashodev has been updated to display bus and controller information
as well as device information.

both are available as uaio v1.3 under:

        ftp://raven.alaska.edu/pub/sois/uaio.v1.3.tar.Z
  or: ftp://raven.alaska.edu/pub/sois/UA_DUtools-v0.9.tar.Z

Thanks (!) to:
        Keith Lewis <keith_at_mukluk.cc.monash.edu.au>
        Thomas Erskine <tom_at_crc.doc.ca>
        Abdon <71055.111_at_compuserve.com>
        Andrew Greer <Andrew.Greer_at_vuw.ac.nz>
        Stephen Cooper <stephen.cooper_at_alphawest.com.au>
        Judith Reed <jreed_at_AppliedTheory.com>

Original query:

> hello, i have revised uaio to correctly report swxcr devices ('re' disks)
> under du v3.x, but i need somebody with a swxcr running du v4.0
> to send me the results of uashodev to ensure uaio correctly
> reports 're' devices under du v4.0.
>
> uaio is an enhanced substitute for iostat, current version is:
> ftp://raven.alaska.edu/pub/sois/uaio.v1.2.tar.Z
> get disk_info.c from the same directory for swxcr support under du v3.x
> (uaio and uashodev must be recompiled with the new disk_info.c).
>
> i will release v1.3 as soon as I can confirm swxcr devices are properly
> reported under du v4.0. thanks, kurt

Extracts of the discussions (which may be useful to others):

[...]
> Basically though, I think it's a pretty impressive utility. I first
>found it when looking for a good tool to give me disk reads vs writes for
>input to management decisions regarding raid 0+1 vs 5 (which is producing
>more heat than light currently) and also for the I/O service time, which
>wasn't easy to get from any other utility I've come across. (Although
>volstat or some such in LSM does give figures, they are not credible). I
>might indeed be using uaio for part of our stats gathering effort.

du v4.0 does collect service time.... as soon as i saw that i
quickly put it into uaio. none of our production systems
are running v4.0 yet, but the numbers look credible from the 3 test
systems we have running it (by comparing with documented service
times for a couple different device types).

on read vs. write, advfs does capture some statistics on this.
advfsstat (in v3.2, but not documented until v4.0) can report
on this, but only one domain at a time. randy hayman here has
prototyped a utility reporting all domains, you can find it:
  ftp://raven.alaska.edu/pub/randy/perf_mon_tools/disks.README
this is an on-line monitor, i'm not sure if he's put the data
collecting there yet in a 'final' form (as time permits).
!
!Correction: advfsmon is not yet available on raven, Randy is waiting
!for clearance from Digital to distribute advfs structure
!definitions which were obtained from the source cd vs. /usr/include.
!disks.README is a different tool.

as for raid 0+1 vs. 5.... we've clearly seen the 1/4 write performance
degradation under raid 5 vs. jbod or mirroring. for heavy write
files (like oracle logs) we have adopted mirror vs. raid-5 and just eat
the extra disk costs. we're using controller based raid vs.
host (software) based... i have "religious" problems with software
based raid based on old vms experiences. we do use advfs volume
sets for data we don't need backed by raid... that does provide
the extra speed of multiple heads accessing data.
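
One common explanation for a figure like 1/4 is the raid 5 small-write
penalty: a small write to a raid 5 set typically costs four physical
I/Os (read old data, read old parity, write new data, write new parity)
where a plain disk needs one. The message does not say this is the cause
of what was measured, so treat the sketch below as an assumption. The
function and constant names are made up for illustration.

```c
/* Physical disk I/Os generated by one small logical write.
 * The raid 5 figure is the usual read-modify-write count; it is an
 * assumption here, not something measured in the message. */
enum {
    IOS_PER_WRITE_JBOD  = 1,  /* write the data block */
    IOS_PER_WRITE_RAID5 = 4   /* read data, read parity, write data, write parity */
};

/* Logical writes per second a disk set can sustain, given the total
 * physical I/Os per second its disks can deliver. */
double logical_writes_per_sec(double physical_ios_per_sec, int ios_per_write)
{
    return physical_ios_per_sec / ios_per_write;
}
```

With the same physical I/O budget, the raid 5 case comes out at one
quarter of the jbod case.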

as for monitors in general, most of the canned products (like
patrol or openAviator) just put wrappers around standard unix
utilities (typically sar, sometimes iostat) which make them
less-than-useful.... the only Digital UNIX specific monitoring
i've seen is the psdc product which Digital sold off to CA...
it was a product with a lot of potential, it's probably dead now.
the best DU based monitor is 'monitor' (free vs. commercial).
we have had dialogs going with Digital engineers, BMC (patrol),
Spire (openAviator), and Candle (they're moving into UNIX
monitoring.... their MVS monitors are high quality)... all are
a little behind in providing meaningful io stats, hence we've
had to write our own. they're also behind in meaningful
ubc reporting.... we're still battling that. the "modern"
solution to these monitoring problems seems to be "buy more
hardware" (if we only had unlimited $$'s).

[...]
It's curious that rz0 and re0 are both unit 0... the xcr
devices must be totally distinct. Unfortunately, the device
sort in uaio is based on unit number right now so the order
of re0 vs. rz0 is effectively random.
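
One possible way to make that order stable is to compare the driver
name before the unit number. This is only a sketch, not the uaio code:
the struct and function names are hypothetical, since uaio's own data
structures are not shown here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record; uaio's real structures are not shown in this message. */
struct disk_ent {
    char prefix[4];   /* driver name, e.g. "rz" or "re" */
    int  unit;        /* unit number */
};

/* qsort comparator: order by driver name first, then by unit number,
 * so that re0 and rz0 no longer compare as equal. */
static int disk_cmp(const void *a, const void *b)
{
    const struct disk_ent *x = a;
    const struct disk_ent *y = b;
    int c = strcmp(x->prefix, y->prefix);

    if (c != 0)
        return c;
    return (x->unit > y->unit) - (x->unit < y->unit);
}

/* Sort an array of n disk entries in place. */
void sort_disks(struct disk_ent *d, size_t n)
{
    qsort(d, n, sizeof d[0], disk_cmp);
}
```

With this comparator all 're' disks sort ahead of all 'rz' disks, and
units sort numerically within each group.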

[...]
As an aside, uashodev reports everything it finds by walking
the bus, controller, and device structures and fabricates the
device name while uaio uses the table(TBL_DKINFO,...) call.
Combined logic in disk_info and uaio is used to not display
nonexistent disks with uaio.

[...]
> The thing here is the `b' in the name reb9...
>
> If I get time today I'll see if I can sort out where that came
>from. It *might* be something I fiddled with...

This isn't your fault... the new version of uaio.c should
remove the 'b'.... explanation:
For non-lun 0 disks behind an hsz controller, the convention
is b=lun 1, c=lun 2, etc. For scsi devices this comes from
the unit number modulo 8... for swxcr the unit number is just
a number (by appearances). I had to add a compare for 're'
in uaio.c to force all 're' disks to be lun 0 for display.
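
The naming rule just described can be sketched roughly as follows. This
is not the actual uaio.c code. The function names are hypothetical, and
how uaio derives the numeric part of the name is not described here, so
the sketch takes it as a separate argument. Unit numbers are assumed to
be non-negative.

```c
#include <stdio.h>
#include <string.h>

/* Letter shown for a LUN: none for lun 0, then 'b' for lun 1,
 * 'c' for lun 2, and so on.  The lun is taken as unit modulo 8,
 * except that 're' (swxcr) units are always treated as lun 0. */
char lun_letter(const char *prefix, int unit)
{
    int lun;

    if (strcmp(prefix, "re") == 0)
        return '\0';
    lun = unit % 8;
    return lun == 0 ? '\0' : (char)('a' + lun);
}

/* Build a display name such as "rzb20" or "re9" into out.
 * 'number' is the numeric part of the name, supplied by the caller
 * (an assumption of this sketch).  Returns what snprintf returns. */
int disk_name(char *out, size_t len, const char *prefix, int unit, int number)
{
    char l = lun_letter(prefix, unit);

    if (l == '\0')
        return snprintf(out, len, "%s%d", prefix, number);
    return snprintf(out, len, "%s%c%d", prefix, l, number);
}
```

Without the 're' compare, an 're' disk whose unit number is 1 modulo 8
would pick up a stray 'b', which matches the reb9 symptom.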

Another aside, all the table call returns for disk name is 'rz' or
're', not the full device name. One can name the device files anything
one wants to name them... but certain things (like iostat, monitor, etc.)
construct the names according to conventions.... no place in UNIX stores
the actual device name (other than the /dev file names).
By the way, use 'uaio -D' and you'll see display of everything
available with the table call for disks.

[...]
> BTW, the Silas machine has 2 CPU's ...

btw, hidden in the 'uaio -S' display is the number of CPU's and
the mhz of them:

sxkac_at_nugget: ./uaio -Sm 4 | uakce -a0," "
        970815.132445 Sleep:1 Iterations:0
           Total rz0 rz1 rz2 rz3 cpu
         kbs tps kbs tps kbs tps kbs tps kbs tps us ni sy wa id
          94 5 20 2 . . 6 0 4 0 16 2 9 0 73

--> Boot: 97/07/30 11:51:16 nugget 3*190 |
               97/08/15 13:24:45 V3.2 |
[...]
the '3*190' indicates nugget has 3 cpu's which are 190 mhz...
it's a 2100-4/200, but the actual mhz is 190 not 200 (that's
typical that the mhz ratings are slightly beneath the stated name).
From the display above rzb20, rzc20 and rzd20 are non-lun 0
devices behind an hsz controller.
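
For anyone post-processing the 'uaio -S' output, the '3*190' field can
be split into a cpu count and a clock speed with something like the
sketch below. It assumes the field always has the form count*mhz as in
the sample; the function name is hypothetical and not part of uaio.

```c
#include <stdio.h>

/* Parse a "<ncpu>*<mhz>" field such as "3*190".
 * Returns 1 on success, 0 if the text does not match exactly. */
int parse_cpu_field(const char *s, int *ncpu, int *mhz)
{
    char tail;

    /* The '*' is a literal here because it follows a complete %d.
     * The trailing %c only matches if there is junk after the number,
     * so exactly 2 conversions means a clean match. */
    return sscanf(s, "%d*%d%c", ncpu, mhz, &tail) == 2;
}
```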

[...]

_____________________________________________________________________
Kurt Carlson, University of Alaska SOIS/TS, (907)474-6266
sxkac_at_alaska.edu 910 Yukon Drive #105.63, Fairbanks, AK 99775-6200
Received on Wed Aug 20 1997 - 22:44:34 NZST
