SUMMARY: CAA problem in TruCluster 5.1

From: Danny Petterson <danny.petterson_at_dmsn.dk>
Date: Thu, 24 Jan 2002 11:34:47 +0100

Hi Gurus,

Sorry for the late summary, but here it is. Thanks for the responses from
(I hope nobody is forgotten):
Jason Orendorf
Larry Clegg
Jan Mark Holzer
Colin Bull
Phil Baldwin

First the original question:
> Hi Gurus!
>
> I have a weird problem with CAA on TruCluster 5.1, PK4.
>
> caa_stat, caa_stop, caa_relocate etc. work fine, and CAA applications
> fail over if I use init 0 or manually kill a process in the application
> (Oracle) on a node.
>
> ...but
>
> If I try to "crash" a node (pull the plug, so to speak), caad on the
> remaining node never sees what's happening. It just continues to think the
> application is online on the old node. When the old node comes up again,
> it just starts the application as if nothing happened.
>
> It's not only a problem with my own applications, but also with
> cluster_lockd.
>
> If I kill caad on the remaining node and restart it, everything is
> (somehow) OK... all apps are now offline, as they are supposed to be...
>
> MC is working fine, /.rhosts is all right (and works for both nodes through
> NIC and MC), everything appears to work just fine, and cfgmgr is NOT
> commented out of /etc/inetd.conf.
>
> I think I've tried everything, so PLEASE help me. I promise to post a
> SUMMARY as soon as the problem is solved.


The problem is caused by Patch Kit 4, and it is known to Compaq:

BACKGROUND:
   After installing the TCR510-027 patch (number 66.00) from
   the Tru64 UNIX T64V51B18AS0004-20011114.tar patch kit in a
   TruCluster Server environment, CAA is unable to update the
   state of devices.

PROBLEM:
   After installing the TCR510-027 patch (number 66.00) from
   the Tru64 UNIX T64V51B18AS0004-20011114.tar patch kit in a
   TruCluster Server environment, CAA is unable to update the
   state of devices. When a power-off of a member is performed,
   Applications Resources are not relocated to other members.

WORKAROUND:
   The following workarounds are available:
   1. Remove TCR510-027 patch (number 66.00) by
       executing the dupatch command and choosing
       "Patch Deletion" from the main menu.
   2. Shut down or reboot the system.
   3. Relocate the resources by using the "caa_relocate"
       command.

RESOLUTION:
   The problem is understood and a solution has been worked and
   is being tested.
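
For workaround 3, a manual relocation might look roughly like the sketch
below. The resource name "myapp" and the member name "member2" are
placeholders and do not come from the advisory or from my cluster. The
script assumes a POSIX shell and uses caa_stat -t and caa_relocate with -c
as I understand them from the TruCluster reference pages, so check the
reference pages on your own system first. By default it only prints the
command; set RUN=1 on a cluster member to actually run it.

```shell
#!/bin/sh
# Sketch only: both names below are hypothetical placeholders.
RESOURCE="${RESOURCE:-myapp}"   # hypothetical CAA application resource
TARGET="${TARGET:-member2}"     # hypothetical surviving cluster member

# Build the relocate command line as text, without running anything.
relocate_cmd() {
    echo "caa_relocate $1 -c $2"
}

# Run for real only when RUN=1 and the CAA tools are present;
# otherwise just print the command that would be used.
if [ "${RUN:-0}" = "1" ] && command -v caa_relocate >/dev/null 2>&1; then
    caa_stat -t                           # resource states before the move
    caa_relocate "$RESOURCE" -c "$TARGET" # move the resource to TARGET
    caa_stat -t                           # check where the resource is now
else
    relocate_cmd "$RESOURCE" "$TARGET"
fi
```

Running it without RUN=1 is a harmless way to check the command line
before touching a production cluster.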

I did run into another problem, though: the patch kit had not been
installed reversibly, so the patch could not simply be deleted. But no
problem, I just had to extract the following files from the original CD
(I didn't know the packages were just tar/gzip files):

/usr/sbin/caad
/var/cluster/caa/monitors/network.so
/var/cluster/caa/monitors/changer.so
/var/cluster/caa/monitors/tape.so

After a reboot everything works like a charm. Many thanks to everyone
for their help, especially Jason Orendorf.


Kind regards,

Danny Petterson, Dimension A/S
Systems Consultant, danny.petterson_at_dmsn.dk
Mobile: +45 29486827, Main number: +45 87933600
Received on Thu Jan 24 2002 - 10:44:29 NZDT
