Summary: Monitoring HSZs

From: <anthony.miller_at_vf.vodafone.co.uk>
Date: Sun, 08 Nov 1998 11:33:03 +0000

On 2/11/98 I posted the following:

/Hi all...
/
/We have recently had an experience where a member of a RAID-5 set
/failed.  The RAID set was connected to an HSZ70 - the RAID set
/continued as expected and no data was lost (AdvFS file systems on two
/LSM volumes).
/
/This made me think of writing some kind of cron script to use hszterm
/to check these controllers on a regular basis for these errors and
/others (e.g. cache battery low etc).  Rather than reinvent the wheel,
/has anybody else done this?  Is there a better way?
/
/(volwatch did not report anything as the LSM volume continued to
/operate with no problems).
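
For the record, the sort of check I had in mind can be sketched like
this.  It is only a sketch: the keyword list and the hszterm output
format are my assumptions, not verified against a live HSZ70, and the
demonstration runs on canned output:

```shell
#!/bin/sh
# Sketch of a cron-driven HSZ check.  check_hsz_output filters the
# text of an hszterm "show" command for lines that look like trouble;
# the keyword list is an assumption and should be tuned per site.
check_hsz_output() {
    grep -i -E 'failed|disabled|reduced|battery is low'
}

# In production (not run here), something like:
#   /usr/bin/hszterm -f /dev/rrza16a "show this full" | check_hsz_output \
#       | mailx -s "HSZ alert on `hostname`" root

# Demonstration on canned controller output:
check_hsz_output <<'EOF'
Controller:
        HSZ70 ZG712345 Firmware V73Z-0
        Cache battery is LOW
Unit D100 is NORMAL
EOF
```

Nothing is printed (and so nothing would be mailed) when no line
matches the trouble keywords.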

As usual, some excellent and prompt replies were received from:


knut.hellebo_at_nho.hydro.com
bruce.hines_at_mci.com
robert.otterson_at_digital.com
bbody_at_acxiom.co.uk
ajohnson_at_mail.nbme.org
snkac_at_java.sois.alaska.edu - Kurt Carlson University of Alaska
alan_at_nabeth.cxo.dec.com
webster_at_ssdpdc.lgb.cal.boeing.com - Tom Webster
ulrich.gauber_at_kuka.de
willig.reimund_at_gdr.de

Very many thanks to everybody who responded. As usual you hit the
nail on the head.

+-----------------------------------------------------------------+
| TONY MILLER - Systems Projects - VODAFONE LTD, Derby House, |
| Newbury Business Park, Newbury, Berkshire. |
+-------------+---------------------------------------------------+
| Phone | 01635-507687(local) |
| Work email | ANTHONY.MILLER_at_VF.VODAFONE.CO.UK |
| Home email | SANDRA_TONY_MILLER_at_COMPUSERVE.COM |
| X.400 | G=ANTHONY; S=MILLER; C=GB; A=GOLD 400; P=VODAFONE |
| FAX | 01635-506709(local) |
+-------------+---------------------------------------------------+

Disclaimer: Opinions expressed in this mail are my own and do not
reflect the company view unless explicitly stated. The information
is provided on an 'as is' basis and no responsibility is accepted for
any system damage howsoever caused.

Most referred me to StorageWorks Command Console.  Listed below is a
typical response:
------------------------------------------------------------------------
If you install SWCC v2.0 (StorageWorks Command Console) on an NT box
and the agent on the DEC Unix system, you can monitor everything.  You
can even get paged...
Have a search at www.storage.digital.com.

Install "AGENT" on Unix.
Install the SWCC software on a PC (the agent alone may be OK for mail).



One response mentioned sys_check - alan_at_nabeth.cxo.dec.com
------------------------------------------------------------------------
It might be sufficient to merely monitor the error log for such
events.  The HSZ will probably answer some later command with a Check
Condition, and the request sense data will have the information about
what happened.  The sys_check HTML generator runs HSZterm, so you want
to look at getting it.  I think the MCS website has a kit.
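
Building on that suggestion, scanning a text error-log report for
HSZ-related events might look like the sketch below.  The report would
come from something like uerf (or dia on later releases); the event
strings and the report format here are assumptions for illustration:

```shell
#!/bin/sh
# scan_errlog: filter a text error-log report for entries that suggest
# an HSZ or disk problem.  The match strings are assumptions.
scan_errlog() {
    grep -i -E 'check condition|hard error|bbr'
}

# Production use might be (not run here):
#   uerf -R -o brief | scan_errlog | mailx -s "errlog `hostname`" root

# Demonstration on canned report lines:
scan_errlog <<'EOF'
rz17  disk  CHECK CONDITION  sense key: hardware error
rz18  disk  soft error, bbr started
rz19  disk  no events
EOF
```
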
An extract from Tom Webster
------------------------------------------------------------------------
You can write scripts that use HSZterm (now a discontinued product)
   to check the status of the HSZ.  You can find examples of scripts
   by searching the archives for the message "SUMMARY: hsz40
   question" posted by Ronny Eliahu <ronny_eliahu_at_corp.disney.com>
   back on 25 Mar 97.  You could also download the University of
   Alaska's DU tools (ftp://raven.alaska.edu/pub/sois/UA_DUtools.tar.Z),
   which include an example.
Kurt Carlson gave an excellent response, which I have included in full
below:
------------------------------------------------------------------------
There is a script which monitors via hszterm in:
 ftp://raven.alaska.edu/pub/sois/README.UA_DUtools
  kit: ftp://raven.alaska.edu/pub/sois/UA_DUtools-v1.9.tar.Z
It should be in the du/job directory there.  I had this running
nightly.  It was primarily intended to monitor configuration changes
and email interested parties; it also monitored the hsz error log.

Secondarily, I had another script reducing and summarizing the Unix
error logs nightly.  Its purpose was to allow easy trend analysis
for errors... specifically, for disks, noting cdisk_bbr trends
on a particular disk for preventative replacement.  That
script is likely in the examples directory of:
 ftp://raven.alaska.edu/pub/sois/README.uaio
  kit: ftp://raven.alaska.edu/pub/sois/uaio-v2.1.tar.Z
which is also included in the UA_DUtools-v1.9 kit.
The uaio kit has a replacement for iostat, which is effectively
broken for reporting disk activity on hsz's.
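
The sort of nightly reduction described above can be sketched with
awk: count bad-block-replacement (cdisk_bbr) events per disk so a
drive accumulating them stands out for preemptive replacement.  The
input format below is invented for illustration; a real reduction
would parse the actual error-log report:

```shell
#!/bin/sh
# count_bbr: tally cdisk_bbr events per disk.  The first field is
# assumed to be the disk name; that format is an assumption.
count_bbr() {
    awk '/cdisk_bbr/ { n[$1]++ } END { for (d in n) print d, n[d] }'
}

# Demonstration on invented log lines (output order is arbitrary):
count_bbr <<'EOF'
rz17 cdisk_bbr block 10234
rz17 cdisk_bbr block 99120
rz21 cdisk_bbr block 551
rz21 soft error
EOF
```
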
 
For "live" monitoring of the hsz's (we had 6 pairs on 3 systems)
we used Console Manager connected to at least one of each pair's
ports.  Typically one of the system admins had monitoring icons on
their desktop... the icons fatten when something updates, so they'd
have a visual clue to check.  Also, Console Manager logs
everything to a host file, so we could look back at what would
otherwise be lost.  I believe Digital's solution to this was the NT
workstation intended to monitor StorageWorks consoles (called
something like StorageWorks Command Console)... we weren't interested
in that (no remote access, yet another box to manage).

Other than the disk logging of the console activity, the live
monitoring really wasn't used much, with the summaries coming
nightly from the other two tools.  The retained log from the console
is fairly important.  At least in the hsz40's & 50's, the hsz
log only kept major events (and only the last 4), and only host
noticeable events made the host uerf|dia logs; some "minor"
events could otherwise scroll off... those minor events are
sometimes the clues to something less minor occurring later.
However, Console Manager was sold to CA and became somewhat
unsupported.  We were serving it off a 3000 used for admin
functions; if necessary they'll leave the 3000 on 4.0b
indefinitely to continue running Console Manager.

The was's & had's (past tense) above are due to my switching to
another branch of the University... I'm now supporting our Crays &
SGIs.  The tools are still running in the other branch and I've been
passively maintaining the kits for them alongside other kits I
maintain.
Ulrich Gaube supplied the following:
---------------------------------------------------------------
We are doing this kind of observation every night with cron and
the following (very simple) script:

#!/bin/sh
#
# Status report for the HSZ40
#
hszdevi="/dev/rrze24a"      # replace the device name according to your needs
outfile="/usr/tmp/HSZstat_`date '+%y%m%d%H%M%S'`$$"
/usr/bin/hszterm -f $hszdevi "show this" >$outfile
/usr/bin/hszterm -f $hszdevi "show other" >>$outfile
/usr/bin/hszterm -f $hszdevi "show device full" >>$outfile
cat $outfile | mailx -s "HSZstat `hostname` `date`" <insert some mail-address>
rm -f $outfile

Just replace the value for hszdevi with any RAID device served by
your HSZ70.  Also replace the mail address for the mailx.
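
To run such a script nightly, a root crontab entry along these lines
would do (the path and the 02:00 run time are examples, not taken from
the original mail):

```shell
# root's crontab: run the HSZ status script at 02:00 every night
0 2 * * * /usr/local/adm/hszstat.sh
```
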
Reimund Willig wrote
------------------------------------------
We just wrote a little Perl script, which is appended at the end.  If
everything is OK, the script produces no output at all, so it is up to
you to decide what to do with the results.  We are using BMC Patrol to
send an alarm automatically.

#!/usr/bin/perl
@controller = glob("/dev/hsz[a-z0-9]*");
foreach $_ (@controller) {
        open( DAT, "hszterm -f $_ 'show this'|" ) || warn "ERROR: $!\n";
        while ( $line = <DAT> ) {
                if ( $line =~ /failed|disabled|bad power supply or fan/i ) {
                        print "$_ reports: ==> $line";
                }
                if ( $line =~ /cache is|battery is/i && $line !~ /good/i ) {
                        print "$_ reports: ==> $line";
                }
        }
        close(DAT);
}
Received on Mon Nov 09 1998 - 12:13:12 NZDT
