Strange cluster alias behaviour

From: Tim Cutts <tjrc_at_sanger.ac.uk>
Date: Tue, 22 Feb 2005 17:01:07 +0000

LSF started misbehaving on one of our clusters, a 6-node ES45 cluster
running Tru64 5.1B PK 2. I discovered that the LSF daemons could be
contacted from outside the cluster, but not from any machine inside the
cluster.

Examining /etc/clua_services, I discovered that it was missing the
lines telling the cluster alias about the ports, so I added the lines:

#
# LSF Ports
#
lim 3879/tcp in_noalias,static
res 3878/tcp in_noalias,static
mbatchd 3881/tcp in_noalias,static
sbatchd 3882/tcp in_noalias,static
mbdquery 40001/tcp in_noalias,static

I then ran 'cluamgr -f' on all nodes, and restarted LSF on all 6 nodes
for good measure.

But the strange behaviour still continues. If I try to connect to one
of these ports from outside the cluster it works:

16:52:51 tjrc@ecs4d:~$ telnet ecs2d 3882
Trying 172.17.1.204...
Connected to ecs2d.

But if I try to connect from within the cluster, the operation times
out:

16:53:30 tjrc@ecs2c:~$ telnet ecs2d 3882
Trying 172.17.1.204...
telnet: Unable to connect to remote host: Connection timed out
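
The same test can be repeated for all five ports without running telnet
by hand each time. The following is a minimal sketch in Python, assuming
a Python interpreter is available on the machine doing the test. The
function names and the LSF_PORTS table are hypothetical; the port
numbers are the ones from the clua_services entries above.

```python
import socket

# LSF service ports as listed in the clua_services entries in the post.
LSF_PORTS = {
    "lim": 3879,
    "res": 3878,
    "mbatchd": 3881,
    "sbatchd": 3882,
    "mbdquery": 40001,
}


def check_port(host, port, timeout=5.0):
    """Attempt a TCP connection and describe the outcome as a string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within the timeout, like the telnet attempt made
        # from inside the cluster.
        return "timed out"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on the port.
        return "refused"
    except OSError as exc:
        return "error: %s" % exc


def check_ports(host, ports=LSF_PORTS, timeout=5.0):
    """Return {service name: outcome} for every port in the table."""
    return {
        name: check_port(host, port, timeout)
        for name, port in ports.items()
    }
```

Running check_ports("ecs2d") once from a machine outside the cluster and
once from a cluster member gives a per-port comparison. A "timed out"
result suggests the connection is being dropped silently, whereas
"refused" would point to a daemon that is not listening at all.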

Any ideas, short of rebooting the cluster, which I am reluctant to do?

Many thanks,

Tim

-- 
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5  860B 3CDD 3F56 E313 4233
Received on Tue Feb 22 2005 - 17:05:40 NZDT
