I managed to un-confuse CAA by restarting caad on the good
node (/sbin/init.d/clu_caa stop ; /sbin/init.d/clu_caa start)
and then starting it on the bad node.
It seems that the good node was still waiting for a response
from the crashed node (caad had a TCP connection open to it and
was blocked waiting on it). I believe this is what was
preventing caad from starting on the node that had crashed.
Out of curiosity, is caad only capable of doing one thing at a
time, such that if it hangs while telling one node to do
something, it can't take requests from other systems?
-charlie
-----Original Message-----
From: cballowe_at_usg.com [mailto:cballowe_at_usg.com]
To: tru64-unix-managers_at_ornl.gov
Subject: HELP - Confused CAA ?!?
Man, Friday the thirteenth didn't go over well, and Saturday
wasn't much better.
One of the systems in my two-node GS80 cluster crashed just as
a caa_stop of the Oracle instance was issued on the other
cluster member. This left CAA reporting STATUS: ONLINE,
TARGET: OFFLINE for the service. Running caa_stop -f on the
service returns "Resource or relatives are currently involved
with another operation", and running caa_stat on the member
that crashed returns "Cannot communicate with the CAA daemon."
Running caad -1 doesn't fix that, even though a manual says it
should.
Is there any way to get CAA to believe the service is down?
What can I do to get caad responding on the other cluster
member?
-charlie
Charles Ballowe
Unix System Administrator
cballowe_at_usg.com
x3896
Received on Mon Sep 16 2002 - 16:31:12 NZST