I was gently reminded that I was not specific enough in my summary,
so here is some text from my respondents (and I thank them once again!)
____________________________________________________________________________
I am running Tru64 4.0E on my system. I have written a simple script to
check the sizes of logs and also to report the space available. It is
simple and won't do the things that you want; you would have to customize
the script for your exact needs. A couple of weeks ago I was having
problems with a log file (7.7 GB) filling up one file system, so I just
created this. It only checks log files and then emails me the results. It
is run from crontab to send me the results periodically.
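The respondent's script itself was not included in the summary. What follows
is only a minimal sketch of that kind of check, not the original: LOG_DIR,
LIMIT_KB, MAILTO and the function names are made-up placeholders, and it
assumes file names without spaces and the usual `ls -l` layout (size in the
fifth column).

```shell
#!/bin/sh
# Sketch only: LOG_DIR, LIMIT_KB and MAILTO are hypothetical defaults.
LOG_DIR=${LOG_DIR:-/var/adm}
LIMIT_KB=${LIMIT_KB:-102400}    # flag files larger than 100 MB
MAILTO=${MAILTO:-root}

# Print "size_in_KB path" for every regular file under $1 larger than $2 KB.
# The size is taken from column 5 of `ls -l`, so the files are not read.
big_logs() {
    find "$1" -type f -exec ls -l {} \; 2>/dev/null |
        awk -v limit="$2" '$5 > limit * 1024 { print int($5 / 1024), $NF }'
}

# Build the report: oversized logs first, then free space for that directory.
log_report() {
    echo "Log files over $2 KB in $1:"
    big_logs "$1" "$2"
    echo
    echo "Space available:"
    df -k "$1"
}

# Uncomment to mail the report, for example when run from cron:
# log_report "$LOG_DIR" "$LIMIT_KB" | mailx -s "log check" "$MAILTO"
```

Saved under some path of your choosing (say /usr/local/bin/logcheck, again
just an example), a crontab line such as `0 * * * * /usr/local/bin/logcheck`
would send the report hourly.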
____________________________________________________________________________
Track down the subset that has "sys_check" in it. The most recent
version may also be available from the Services web site.
____________________________________________________________________________
What's wrong with a simple script? I'm including one that our Ops uses just
to check on the basic health of the system. You can grow/modify it to meet
your needs. Hope this helps...
Larry
(pardon the line wrapping below...the good ole cc:mail is doing that for me.)
#!/usr/bin/ksh
#
# Name: health_check
#
# Author: Larry E. Clegg
# Date: 25-March-1998
#
# Purpose: Issue a series of commands for the Operators to determine
# the system health check status. The specific commands will
# vary by system.
#
# First clear the screen and introduce yourself.
clear
echo "HealthCheck for AlphaServer 4100: Webserver on `date`"
echo "Notify System Management if problems are observed."
echo " "
echo "Checking production web server processes. Count must be greater than zero."
sleep 5
# The [h] pattern keeps grep from matching (and counting) its own process.
ps -ef | grep '[h]ttpd-80' | wc -l | xargs printf "\nThere are %d www.lpl.com processes.\n"
ps -ef | grep '[h]ttpd-61000' | wc -l | xargs printf "\nThere are %d tech support web processes.\n"
echo " "
read RESPONSE?"Press RETURN to continue: "
clear
echo "HealthCheck for AlphaServer 4100: Webserver on `date`"
echo " "
echo "Checking Filesystems - check for 98% or more and notify System Manager"
echo "Please ignore the / and /proc partitions."
echo " "
df -k
echo " "
read RESPONSE?"Press RETURN to continue: "
clear
echo "HealthCheck for AlphaServer 4100: Webserver on `date`"
echo " "
echo "Checking Swapspace - check for 20% or less and notify System Manager"
echo " "
/sbin/swapon -s | grep -i available
echo " "
read RESPONSE?"Press RETURN to continue: "
echo " "
clear
echo "HealthCheck for AlphaServer 4100: Webserver on `date`"
echo " "
echo "Checking for users who have not logged off appropriately."
echo "Please report any usernames which show a logon date other than today."
echo " | |"
echo "+-----------------+ +----------------+"
echo "| |"
echo "v v"
who
echo " "
echo "HealthCheck finished - please logoff now."
echo "HealthCheck finished - please logoff now."
echo "HealthCheck finished - please logoff now."
echo "HealthCheck finished - please logoff now."
echo "HealthCheck finished - please logoff now."
exit
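
The filesystem step above leaves the 98% comparison to the operator. As a
rough sketch (not part of Larry's script), the comparison could be done by
filtering the `df -k` output. The function name full_filesystems is made up
here, and the column positions assume the usual six-column `df -k` layout
with Capacity in column 5 and the mount point in column 6.

```shell
# Print "mountpoint capacity" for file systems at or above $1 percent full.
# Reads `df -k` style output on stdin. Skips / and /proc, as the script
# above tells the operator to do, and ignores lines with fewer than six
# fields (for example, entries that df wrapped onto two lines).
full_filesystems() {
    awk -v limit="$1" '
        NR > 1 && NF >= 6 {
            pct = $5
            sub(/%/, "", pct)
            if ($6 != "/" && $6 != "/proc" && pct + 0 >= limit)
                print $6, $5
        }'
}
```

Usage would be along the lines of `df -k | full_filesystems 98`; no output
means nothing has reached the threshold.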
____________________________________________________________________________
Not official support.
Two suggestions:
1- Big Brother is a web-based system checker that gives people
RED/GREEN/YELLOW status lights. You'd have to install this.
Here's what it looks like:
http://speedy.coconet.com/bb/www/bb.html
Here is where to get it (free for non-commercial use):
http://bb4.com/
2- If you're running Tru64 4.0F or 5.0,
look at Compaq Insight Manager.
It runs on port 2301. Look at the system via a browser:
http://node.site.your.com:2301
and click on the box "COMPAQ INSIGHT MANAGER AGENTS".
____________________________________________________________________________
The performance manager (pmgr) comes with Tru64. It takes
some time to configure, but can be set with thresholds to
notify if disks get full, systems start swapping, etc.
It does require a graphics tube to run on, however.
I make do (for now) with several relatively simple scripts to
check disk space, memory use, databases being up, and the
systems ping each other to ensure network and systems are up.
I've chosen Quest Software's I/Watch product as the enterprise
monitoring tool for the next generation of systems that I'll
be installing from now till June.
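
The scripts themselves were not posted. Purely as an illustration of the
ping check described above, and assuming a ping that accepts `-c 1` to send
a single packet, something like this would do; check_hosts is a made-up
name.

```shell
# Print "DOWN: host" for each host that does not answer a single ping.
check_hosts() {
    for h in "$@"; do
        ping -c 1 "$h" >/dev/null 2>&1 || echo "DOWN: $h"
    done
}
```

It would be called with the list of systems to watch, for example
`check_hosts alpha1 alpha2` (hypothetical host names), and the output could
be mailed only when it is non-empty.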
____________________________________________________________________________
When I was just starting out, I read a posting from one person on the list
referring to his beloved Big Brother machine. It sounded kind of weird, but
it piqued my curiosity, so I checked into it. Now, after a year of using it,
I am a believer; I love it as well. BigBrother is an open source system
monitoring tool. It can monitor anywhere from one system to hundreds of
systems (it's very scalable), and it runs on almost any platform. It
displays a nice HTML page with all the information, which can be viewed with
any browser. It monitors things like file system sizes, processes running,
CPU load, connectivity (ping, telnet, ftp...), etc. Furthermore, since it's
OSS, people write additional modifications and extensions. They are
available for monitoring such things as swap space, databases (Oracle,
Sybase, Ingres, Informix...), hacker activity, AdvFS domain information,
Alpha system internal temperature, and more! It is fully configurable: you
set warning (yellow) and panic (red) thresholds, and in addition to the HTML
page, you can have it send you warnings via e-mail or by sending a page to
your pager. CPU load is minimal. If you're worried about support, don't be.
It is supported via a mailing list, where not only do experienced users
offer insight, but the main BigBrother developers participate and answer
users' questions as well. Sean McGuire, the guy who started BigBrother,
said it best: "It's just when you care enough about something where you
want to make it work."
I currently run it on my Tru64 machines and my Linux boxes, and I have
written a number of extensions for Tru64 (hacker monitor, AdvFS domain
monitor, internal temp monitor, SWXCR monitor) and ported the hacker
monitor to Linux as well.
Most of it is written in shell script, and it has an excellent installation
program. It is not intrusive; it pretty much stays where you put it and
runs from there. It has one small C program, so a compiler is useful, but
if you don't have one there is a web site where you can pick up
pre-compiled versions (and Tru64 is out there). I can't remember the URL,
but if you ask on the mailing list you'll probably get a dozen answers
within an hour, or I could send you a pre-compiled bbd if you'd like.
You also need a web server if you don't have one, but I downloaded and
installed Apache on my first try on Tru64 (and on Linux; it's my backup
server, since BB can be configured for fault tolerance too) with no prior
experience. Since then I've learned how to fine-tune my configuration and
write HTML (in vi) to create my own web pages and modify BB's web pages
(all in the last year).
(Kermit and a modem are also required if you need to send numeric pages,
but not for general system monitoring and e-mail.)
Here's where you can get all the stuff:
BigBrother:
http://bb4.com
Extensions:
http://www.deadcat.net
Apache:
http://www.apache.org
Kermit:
http://www.columbia.edu/kermit/
HTML Training:
http://www.wdvl.com (and other neat stuff.)
Best of all, it's all OSS (that's what you say to the boss if he/she asks;
what that means to you is FREE!), but it looks like you paid a million bucks.
You can see mine at:
Tru64, main server:
http://d0hs19.fnal.gov/bb
Linux, 2nd server:
http://d0ol01.fnal.gov/bb
(don't worry about d0nt90, it's a test system where I'm trying larrd.)
Good luck, give it a try. It doesn't take long to set up, but in the
meantime use:
df -k               : to monitor file system space.
ps -ef              : to monitor processes.
uptime              : to monitor CPU load.
who -M              : to monitor users.
w                   : to monitor users' idle time.
swapon -s           : to monitor swap space.
vmstat -P           : to monitor memory usage.
vmstat 5 5          : to monitor memory and CPU usage at 5 second intervals.
iostat 5 5          : to monitor I/O and CPU usage at 5 second intervals.
sysconfig -q envmon : to monitor system temperature.
Check the man pages for more info, but this is a good start.
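
For a help desk, a few of the commands above could be wrapped so that they
print one labelled report. This is only a sketch: run_check and snapshot are
made-up names, the selection of commands is arbitrary, and a command that is
not installed is noted rather than treated as an error.

```shell
# Run one command under a heading; if the command is missing, say so
# instead of failing. $1 is the heading, the rest is the command line.
run_check() {
    label=$1
    shift
    echo "=== $label ==="
    if type "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "($1 not found on this system)"
    fi
    echo
}

# One pass over some of the commands from the list above.
snapshot() {
    run_check "File system space" df -k
    run_check "CPU load"          uptime
    run_check "Logged-in users"   who
    run_check "Swap space"        swapon -s
    run_check "Memory and CPU"    vmstat 5 5
}
```

Calling `snapshot` prints the whole report; note that `vmstat 5 5` alone
takes about 25 seconds to finish.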
____________________________________________________________________________
Here are a few Tru64 links that I have found helpful:
System Administration:
http://www.tru64unix.compaq.com/faqs/publications/base_doc/DOCUMENTATION/V40D_HTML/APS2RETE/TITLE.HTM
System Configuration and Tuning:
http://www.unix.digital.com/faqs/publications/base_doc/DOCUMENTATION/V40D_HTML/AQ0R3FTE/TITLE.HTM
AdvFS Administration:
http://www.unix.digital.com/faqs/publications/base_doc/DOCUMENTATION/V40D_HTML/AQ0R3FTE/TITLE.HTM
COMPAQ [DIGITAL] Search:
http://search.digital.com/
Search the Tru64-UNIX-Managers Mailing List:
http://www-archive.ornl.gov:8000/archive/power.htm
____________________________________________________________________________
> -----Original Message-----
> I am new to Tru-64 (one week experience thus far!) and wonder if anyone has
> a basic system-checking script - something the help desk folks could use to
> check things like file system sizes, processes running, etc. before they
> call the techs. I sure would appreciate something (similar to Alan Davis's
> for model id check) that I could start with and customize for our needs -
> we need one quickly!
>
> sys_check is far too extensive and this needs to be read by a non-SA.
> Also, ideas on what should be minimally checked would be a great help.
____________________________________________________________________________
Received on Tue Mar 21 2000 - 00:45:54 NZST