I managed to unconfuse CAA by restarting caad on the good
node (/sbin/init.d/clu_caa stop ; /sbin/init.d/clu_caa start)
and then starting it on the bad node.
It seems the good node was still waiting for a response from
the crashed node (caad had a TCP connection open and was waiting
on it). I believe this is what was preventing caad from starting
on the node that had crashed. Out of curiosity, is caad only
capable of doing one thing at a time, such that if it hangs
telling one node to do something, it can't take requests from
other systems?
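
For anyone hitting the same thing, here is a rough sketch of the
restart sequence described above. The clu_caa path is the one from
this message; the RUN variable and the restart_caad function name
are made up for illustration (RUN=echo prints the commands instead
of running them). Treat it as a sketch, not a tested procedure.

```shell
# Sketch only. The init script path comes from the message above.
# RUN and restart_caad are hypothetical helpers, not part of TruCluster.
CLU_CAA=${CLU_CAA:-/sbin/init.d/clu_caa}
RUN=${RUN:-}    # set RUN=echo for a dry run that only prints the commands

# Stop and then start caad on the local member (run as root).
restart_caad() {
    $RUN "$CLU_CAA" stop
    $RUN "$CLU_CAA" start
}

# Order used in the message:
#   1. on the good member:     restart_caad
#   2. on the crashed member:  /sbin/init.d/clu_caa start
```

Doing the good member first matters here, since its caad was the one
holding the stale connection to the crashed node.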
-charlie
-----Original Message-----
From: cballowe_at_usg.com [mailto:cballowe_at_usg.com] 
To: tru64-unix-managers_at_ornl.gov
Subject: HELP - Confused CAA ?!?
Man - Friday the thirteenth didn't go over well. And Saturday
wasn't much better.
One of my systems in a cluster of two GS80s crashed just as a
caa_stop of the Oracle instance was issued on one of the other
cluster members. This left CAA claiming STATUS: ONLINE
TARGET: OFFLINE. A caa_stop -f of the service tells me
"Resource or relatives are currently involved with another operation"
and a caa_stat run on the member that crashed gives me
"Cannot communicate with the CAA daemon." caad -1 doesn't
fix that the way a manual says it should.
Is there any way to get CAA to believe the service is down? 
What can I do to get caad responding on the other cluster 
member? 
-charlie 
Charles Ballowe                                /"\
Unix System Administrator                      \ /     ASCII Ribbon Campaign
cballowe_at_usg.com                                X      Against HTML Mail
x3896                                          / \
Received on Mon Sep 16 2002 - 16:31:12 NZST