Cluster alias problems.

From: Jim Fitzmaurice <jpfitz_at_fnal.gov>
Date: Mon, 21 May 2001 14:08:55 -0500

Hello managers,

    We have an interesting problem. Before I start, let me tell you about my
cluster. We have a 3-member TruCluster V5.1 cluster running Tru64 V5.1 (of
course) with Patch Kit 3. (I didn't have any manual patches; PK3 rolled in
O.K. on my 2nd attempt.) Right now, member3 is down with a hardware problem,
but that shouldn't have any bearing on this problem.

    Here's what's happening. We needed to reboot member1 to get rid of some
"immortal" processes that we couldn't kill any other way. When it came back
up, the NIS daemons couldn't bind to the cluster. The cluster is the NIS
master. Investigation showed that member1 couldn't ping the cluster alias,
and traceroute showed 30 hops from member1 to the cluster alias. Running
"ps -ef" would hang after listing about 40 processes, and the "ypwhich"
command would hang as well.

    All this time member2 was happily running along not having any problems
at all. It could ping the cluster alias, traceroute showed one hop and NIS
was bound. The command "ps -ef" worked normally and "ypwhich" returned the
cluster alias.

    The trouble on member1 was causing problems for NIS clients in
particular: yp_all was timing out. We were stopping and restarting various
daemons to correct the problem when we discovered that stopping "aliasd" by
running "/sbin/init.d/clu_alias stop" fixed it. The yp_all error messages
stopped, the "cannot ypbind to cluster" messages stopped, we could ping the
cluster, and traceroute showed one hop to the cluster alias. The command
"ps -ef" was working again and "ypwhich" returned the cluster alias.

    If we restart "aliasd" with "/sbin/init.d/clu_alias start" the problem
comes back, with exactly the same symptoms. I thought "aliasd" was required
for proper cluster operation. Yet everything seems to be working fine now
without it. How? Why? And why does starting "aliasd" break member1?
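
    For anyone who wants to compare the two states (aliasd stopped vs.
started) side by side, here is a rough sketch of the checks described above
wrapped in a time limit so a hung command doesn't hang the shell. It is only
an illustration: run_check is a made-up helper name, CLUSTER_ALIAS is a
placeholder for the real alias, and it assumes a "timeout" utility (GNU
coreutils style, exit status 124 on expiry) and a ping that accepts "-c",
neither of which is confirmed for a stock Tru64 V5.1 install.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit and classify the result.
# Assumes a GNU-style `timeout` on PATH (exits 124 when the limit is hit).
run_check() {
    limit=$1; shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    case $rc in
        0)   echo ok ;;     # command finished successfully
        124) echo hang ;;   # command was still running at the time limit
        *)   echo fail ;;   # command exited with an error
    esac
}

# CLUSTER_ALIAS is a placeholder; set it to the real cluster alias name.
CLUSTER_ALIAS=${CLUSTER_ALIAS:-}

# Only run the checks when an alias has been supplied.
if [ -n "$CLUSTER_ALIAS" ]; then
    echo "ping:    $(run_check 10 ping -c 1 "$CLUSTER_ALIAS")"
    echo "ypwhich: $(run_check 10 ypwhich)"
    echo "ps -ef:  $(run_check 10 ps -ef)"
fi
```

    Running it once on member1 with aliasd stopped and once with it started
(and once on member2 for reference) should give three short reports that can
be compared line by line.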

Anybody seen this?
Anybody have any ideas?

Jim Fitzmaurice
jpfitz_at_fnal.gov

UNIX is very user friendly. It's just very particular about who it makes
friends with.
Received on Mon May 21 2001 - 19:41:15 NZST