Thanks,
I had lots of help as usual. The following individuals helped me.
Paul Crittenden <crittend_at_storm.simpson.edu>
JC Faul <jfaul_at_mtc.com.na>
Mike D Cross <crossmd_at_mh.uk.sbphrd.com>
Phil Baldwin <baldwinp_at_eurodis.com>
ALLAN HAWDON <allan.hawdon_at_kcl.ac.uk>
Jim Fitzmaurice <jpfitz_at_fnal.gov>
And of course our very own Database Administrators:
Olubunmi Gbile <OGbile_at_DWD.State.IN.US>
Franke Capler <FCapler_at_DWD.State.IN.US>
Carla Shulse <CShulse_at_DWD.State.IN.US>
I guess this was my problem so I took ownership of it. The sample
skeleton script for the fix actually came from
http://metalink.oracle.com/ but that site requires access codes and
these access codes are managed by the Database Administrators in our
shop. So it was Olubunmi Gbile, DBA, who actually did the research and
got Doc ID: Note:62068.1 "Automatic Startup and Shutdown on Digital
UNIX", revised 26-Aug-1999, for me. I guess we were using an older
variant of this script, but even after looking at the newer 1999 version
of the script we decided we still had to make some changes to this
script to make it ready for the savepnpc facilities of Legato. In other
words we had to fully qualify the path names to the UNIX executables in
/sbin/init.d/oracle to make sure savepnpc saw these executables
correctly because Legato knows little if anything about our $PATH
environment. It will therefore choke on unqualified commands like "su".
Legato requires these commands to be qualified like this "/usr/bin/su".
Those details can be found at the bottom of this dissertation. I saw a
lot of sample startup code from the friendly helpers at majordomo, and
it appears quite a few of you have developed your own in house solutions
for bringing up/down the Listener and Databases automatically. Back
home in Indiana we were going for a solution as close to Oracle's
recommended solution as we could get, which is not to say the solutions
you shared with me were wrong. Some of you merely hard coded your path
statements while others chose not to redirect your startup output to
your log files. It's a preference thing and it was your preference. We
wanted to use the recommended Oracle variables and redirect the startup
output to our log files. So the solution that we present here is a
slightly modified version of Oracle's suggested solution which is
attached to the bottom of this summary. Be aware though, the attached
solution is not intended for those of you who are using C2 security, so
you had better not apply any of this if you are. Oracle has another
solution for those of you who use C2 security and that solution is not
described in this summary.
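
The full-path requirement described above is easy to get wrong by hand.
Here is a minimal sketch for looking up where each command lives so the
result can be pasted into the script. It assumes a POSIX shell where
"command -v" is available (older Bourne shells may not have it), and the
function name qualify is made up for illustration.

```shell
#!/bin/sh
# qualify is a hypothetical helper, not part of Oracle's or Legato's tooling.
# For each command name given, print "name=/absolute/path" if the shell
# resolves it to an external program, otherwise warn on stderr.
qualify() {
    rc=0
    for cmd in "$@"; do
        found=$(command -v "$cmd" 2>/dev/null)
        case $found in
            /*) echo "$cmd=$found" ;;
            *)  echo "$cmd: not found as an external command" >&2
                rc=1 ;;
        esac
    done
    return $rc
}
```

Run from a login shell that has the usual $PATH, something like
"qualify su date touch" lists the paths to hard code. Builtins such as
echo are reported as not external, which is fine because they need no
path.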
In our shop it is the squeaky wheel that gets the most grease, and grease
is a scarce commodity here. We have a lot of projects cooking
simultaneously, mainframe, NT, and UNIX, so for various reasons I was
delaying my rollout of patch kit-003 on my #2 server, Alpha3. The
Application Developers and the Database Administrators have their own
agendas too. Sometimes one person's project takes a back seat to
another person's project. The DBAs were patching recently and I think
there was a major application roll-out some time during the month of
June too. What I'm trying to say is whenever anyone changes anything
around here we like to make an observation of stability before we apply
the next change. Therefore my patch kit-003 changes were in the pipe,
considered good in unit test, Alpha1, but still waiting to be
implemented in production, Alpha3.
Meanwhile the Operations staff was having problems getting the Legato
backup group Alpha123-backup to work smoothly. They would start the
Alpha123-backup group, but then it would hang. This group protects
three (3) clients: Alpha1, Alpha2, and Alpha3; but Alpha3 was the only
member of that group not patched with the latest patch-kit, Tru64 UNIX
5.1 patch kit-003, and the Alpha3 client was causing the entire group to
hang. In other words it was hung and we didn't know hung how. It could
have been hung up in the $ORACLE_HOME/bin/dbshut portion of the script
executed by /sbin/init.d/oracle which was started up by Legato's savepnpc
facility, or it could be hung up by some piece of code that would have
been patched by the not yet applied patch kit-003. Like I said we never
knew hung how.
What we did know was the workaround Operations was using to beat this
problem. They would use Legato's nwadmin GUI to stop the
Alpha123-backup group, then they would boot Alpha3 (Eeeek!), wait for it
to restart, and then use the same GUI to restart the Alpha123-backup
group. This would clear the hang and then the Alpha123-backup group
would process and finish normally.
Needless to say, all that booting of our #2 server was causing issues.
The Oracle listener wasn't coming up automatically, and this required
manual intervention which of course the operators didn't know how to do,
because as a rule we try not to boot these servers and there were no
written procedures. The longer our servers stay up the better, we
think. I guess prior to our recent Legato upgrade from release 5 to
release 6 and the recent operating system upgrade from release 4.0e to
5.1 we rarely booted these servers, which may explain why we never
noticed that the /sbin/init.d/oracle script never really worked
correctly. I guess the DBAs were in the habit of manually bringing up
the Listener whenever one of these infrequent boots occurred, and this
wasn't a problem for them. However, one boot a night is a problem for
them, and they were beginning to feel Help Desk heat from a downed
listener every morning, which they generously decided to share with me.
Needless to say, we now have written procedures for Operations to follow
to status or start the Listener.
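
For anyone writing a similar procedure, a "status, and start if down"
check might be sketched as follows. This is not our actual written
procedure. It assumes it is run as the Oracle owner (root cannot start
the Listener in our environment), and it assumes "lsnrctl status" exits
non-zero when the Listener is down; if your release does not behave that
way, check its output instead. LSNRCTL and ensure_listener are names
made up for this example.

```shell
#!/bin/sh
# LSNRCTL is a hypothetical override; by default it points at the
# lsnrctl binary under ORACLE_HOME.
LSNRCTL=${LSNRCTL:-$ORACLE_HOME/bin/lsnrctl}

# ensure_listener: ask the Listener for its status and start it only
# if the status command fails (assumed to mean "not running").
ensure_listener() {
    if "$LSNRCTL" status >/dev/null 2>&1; then
        echo "listener already running"
    else
        echo "listener down, starting it"
        "$LSNRCTL" start
    fi
}
```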
In conclusion, we are not sure why backup client Alpha3 routinely caused
the entire backup group Alpha123-backup to hang, but we decided to hit
this problem with three separate fixes. At the very least we hope to
alleviate the most problematic symptom, which was: Why wasn't the V2
listener automatically starting up after a re-boot anyway?
We will continue to monitor our Legato backup group Alpha123-backup for
client hangs which cause the entire backup group to not start.
*******
Fix (1)
*******
Fix number one (#1): we applied patch kit-003 to Alpha3.
*******
Fix (1)
*******
*******
Fix (2)
*******
Fix number two (#2): we changed the shutdown command in
$ORACLE_HOME/bin/dbshut from a NORMAL shutdown, which is the default, to
an IMMEDIATE shutdown. I'm not sure if this is a "good practice" or not
but we are giving it a try. We changed a line of code in this script
from #shutdown to #shutdown immediate. Some day we may back out of this
change. The /sbin/init.d/oracle script we are using brings down the
Listener first, then it brings down the database. I'm not a DBA, but it
seems to me that since we have already pulled the "listening rug" out
from under the user processes there might not be much difference between
a NORMAL and an IMMEDIATE shutdown, so I don't know how much we are
buying with this change. Keep in mind, though, that stopping the
Listener only blocks new connections; sessions that are already
connected stay connected. We will watch it and monitor for problems.
If you guys think there is a problem with using IMMEDIATE please let me
know. Check this out. There was a note labeled "Additional
Information" in the Oracle metalink document and I'll quote it for you
now.
"The default shutdown performed by dbshut is a normal "shutdown".
If users are still logged in, dbshut will 'hang' until all users have
logged off the database. You may want to
alter the script so that is does a shutdown immediate."
We did this. Hope it doesn't cause any problems. We may back out of
this one.
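
The one-line change can also be made with sed rather than by hand. The
sketch below assumes the shutdown command sits on a line by itself in
your copy of dbshut, which matches what we saw but may differ between
Oracle releases, so look at the file first. The function name
make_immediate is made up for illustration, and it only filters stdin to
stdout so nothing is overwritten by accident.

```shell
#!/bin/sh
# make_immediate (hypothetical name): read a script on stdin and write a
# copy to stdout in which any line containing only "shutdown" becomes
# "shutdown immediate". Leading whitespace is preserved; lines that
# already say "shutdown immediate" are left alone.
make_immediate() {
    sed 's/^\([[:space:]]*\)shutdown[[:space:]]*$/\1shutdown immediate/'
}
```

One cautious way to use it is to write the result to a scratch file,
for example "make_immediate < $ORACLE_HOME/bin/dbshut > /tmp/dbshut.new",
diff it against the original, and only then copy it into place after
saving a backup.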
********
Fix (2)
********
********
Fix (3)
********
Fix number three (#3)
============================== Fix #3 Step One ==============================
Ensure the /etc/oratab file is complete and correct.
Database entries in the oratab file have the following format:
ORACLE_SID:ORACLE_HOME:[Y|N]
The Y/N flag determines whether you want the database to be started up
and shut down automatically. Ours are set to Y.
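
A quick way to see which entries are flagged for automatic handling is
to filter the file on its third field. This is a minimal sketch; the
function name autostart_sids is made up, and it matches only an
uppercase Y, as in the format shown above.

```shell
#!/bin/sh
# autostart_sids (hypothetical name): print "SID HOME" for every oratab
# entry whose third colon-separated field is Y.
# The file to read defaults to /etc/oratab; pass another path to test.
autostart_sids() {
    oratab=${1:-/etc/oratab}
    # Drop comment lines, then keep complete entries flagged Y.
    grep -v '^[[:space:]]*#' "$oratab" |
        awk -F: 'NF >= 3 && $3 == "Y" { print $1, $2 }'
}
```

Running "autostart_sids" with no argument reads /etc/oratab; an empty
result means nothing is flagged Y for automatic startup.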
============================== Fix #3 Step Two ==============================
First off let me state this fact. In our environment the root user
cannot start up the Oracle Listener. We have to use a specially
designated Oracle userid to bring this thing up.
The script /sbin/init.d/oracle suggested by Oracle metalink takes these
special userids into consideration and solves the special userid problem
for us.
Second, in our original /sbin/init.d/oracle script we found the
Listener start up and shut down lines to be commented out. Even when we
uncommented these lines we found they had syntax errors, so they
probably never worked for us. Here is the corrected script that works
and has been tested with Legato's savepnpc features.
#!/sbin/sh
#
# Commands are fully qualified (/usr/bin/su, /usr/bin/date, ...) because
# Legato's savepnpc does not see our $PATH.
#
# change the value for ORACLE_HOME to be correct for your installation
#
ORACLE_HOME=/your/oracle/home   # <<=== placeholder, put your own stuff here
PATH=$PATH:$ORACLE_HOME/bin:$ORACLE_HOME/network/admin:$ORACLE_HOME/network/log
#
# change the value of ORACLE_USER to the login name of the
# oracle owner at your site
#
ORACLE_USER=oracle_owner        # <<=== placeholder, put your own stuff here
export ORACLE_HOME PATH
#
LOG=$ORACLE_HOME/startup.log
echo "LOG config"
/usr/bin/touch $LOG
#chmod a+r $LOG
#
echo "Test of $1 begins here"
case $1 in
'start')
    echo "$0: starting up" >> $LOG
    /usr/bin/date >> $LOG
    # Start SQL*NET V2
    if [ -f $ORACLE_HOME/bin/tnslsnr ] ;
    then
        echo "starting V2 listener"
        /usr/bin/su - $ORACLE_USER -c "$ORACLE_HOME/bin/lsnrctl start" >> $LOG 2>&1
    fi
    echo "starting Oracle databases"
    /usr/bin/su - $ORACLE_USER -c "$ORACLE_HOME/bin/dbstart" >> $LOG 2>&1
    ;;
'stop')
    echo "$0: shutting down" >> $LOG
    #
    /usr/bin/date >> $LOG
    # Stop SQL*NET V2
    if [ -f $ORACLE_HOME/bin/lsnrctl ] ;
    then
        echo "stopping V2 listener"
        /usr/bin/su - $ORACLE_USER -c "$ORACLE_HOME/bin/lsnrctl stop" >> $LOG 2>&1
    fi
    echo "stopping Oracle database"
    /usr/bin/su - $ORACLE_USER -c "$ORACLE_HOME/bin/dbshut" >> $LOG 2>&1
    ;;
*)
    # Anything other than start or stop is a usage error
    echo "usage: $0 {start|stop}"
    exit 1
    ;;
esac
#
exit
============================= Fix #3 Step Three =============================
Arrange for the script created in step 2 to be called at startup and
shutdown. As root, perform the following:
#ln -s /sbin/init.d/oracle /sbin/rc3.d/S99oracle
#ln -s /sbin/init.d/oracle /sbin/rc0.d/K01oracle
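
After creating the links it does no harm to confirm that each one really
points at the script. A minimal sketch follows; the function name
link_points_to is made up, and it reads the target from "ls -l" output
rather than relying on a readlink command being installed.

```shell
#!/bin/sh
# link_points_to (hypothetical name): succeed only if $1 is a symbolic
# link and the target recorded in it is exactly $2.
link_points_to() {
    [ -h "$1" ] || return 1
    # "ls -l" prints "... name -> target"; keep what follows the arrow.
    target=$(ls -l "$1" | sed 's/.* -> //')
    [ "$target" = "$2" ]
}
```

For example, "link_points_to /sbin/rc3.d/S99oracle /sbin/init.d/oracle
&& echo ok" prints ok only when the startup link is in place.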
============================= Fix #3 Step Four ==============================
Test the script created in step 2. Log in as root and run:
#/sbin/init.d/oracle start
#/sbin/init.d/oracle stop
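
Since savepnpc is the caller that choked on unqualified commands, it may
also be worth running the script once with nothing inherited from the
login shell. The sketch below uses "env -i" for that. The name run_bare
is made up, and an empty environment is only a stand-in for whatever
savepnpc really provides, so treat a clean run as encouraging rather
than as proof.

```shell
#!/bin/sh
# run_bare (hypothetical name): run the given command with every
# environment variable removed, so no PATH or other setting leaks in
# from the shell you are logged in to.
run_bare() {
    env -i "$@"
}
```

As root, "run_bare /sbin/init.d/oracle start" followed by a look at
$ORACLE_HOME/startup.log shows whether anything in the script still
depends on the login environment.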
*******
Fix (3)
*******
That's it for now. Whew, that was a long one even for me.
Sincerely
- Kevin Criss
Received on Tue Aug 07 2001 - 03:36:26 NZST