Dear Admins,
We're experiencing some odd behavior with V5.1 TruCluster since Compaq
support applied patch T64V51AS0003-20010413. The behavior persists even
after another reboot.
Here is the cluster configuration.
node   name  ether  fddi  memchan
alias  C     C,C0   C1
1      A     A0     A1    Am
2      B     B0     B1    Bm
First, we noticed that rsh from B to B hangs, while ftp from B to B
works. Then, while waiting for an opportunity to reboot, we found the
following symptoms.
0) Almost all network activity works well. For example,
any access from A to Ax works. (Ax means A0, A1, Am or localhost)
1) rsh, rlogin, telnet from B to Bx hang. (Ctrl-C can kill them.)
(Here Bx means B0, B1, Bm or localhost)
1') ping from B to Bx works.
2) ping from B to C0 or C gets no reply.
2') ping from B to C1 works.
3) traceroute from B to C shows 30 hops, every gateway being "localhost".
Are packets looping?
4) ftp, rup, X11, smtp from B to Bx work.
5) rsh, telnet, ftp from B to C0 always go to A.
5') rsh, telnet, ftp from A to C follow the round-robin rule.
6) rsh, telnet, ftp from B to C1 work 3 times, then hang 3 times.
7) rup, rusers from other machines to C always go to B.
7') rsh from other machines to C follows the round-robin rule.
8) rup, rusers from B to C hang.
8') rup, rusers from B to Bx work.
9) rup, rusers from B to C1 always go to B.
10) netstat -i and netstat -r on A and B do not show any noticeable differences.
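The symptoms in 1) through 9) could be collected more systematically with a
small probe script run on B. The following is only a sketch, assuming Python
is available on the node. The function names probe and probe_matrix are
hypothetical, and the host names are the placeholders from the configuration
table above. It tests TCP connection setup only, so a session that hangs
after the connection is established would still be reported as "open".

```python
import socket

# Well-known TCP ports for the services named in the symptom list.
SERVICES = {"ftp": 21, "telnet": 23, "rlogin": 513, "rsh": 514}

# Placeholder host names from the configuration table (assumption:
# replace them with the real node, interface, and alias names).
TARGETS = ["localhost", "B0", "B1", "Bm", "C", "C0", "C1"]


def probe(host, port, timeout=5.0):
    """Try one TCP connect and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within the timeout: the "hang" case.
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        # Name resolution failures, unreachable networks, and so on.
        return "error: %s" % exc


def probe_matrix(targets=TARGETS, services=SERVICES, timeout=5.0):
    """Return {(host, service): result} for every host/service pair."""
    return {
        (host, name): probe(host, port, timeout)
        for host in targets
        for name, port in services.items()
    }
```

Calling probe_matrix() on both A and B and comparing the two results would
show which address and service pairs differ between the nodes.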
I first suspected the out_alias attribute in /etc/clua_services, but X11
works well with out_alias. (I have not changed clua_services yet.)
We tried the following:
/sbin/init.d/gateway stop
/sbin/init.d/gateway start
cluamgr -r start
cfsmgr
kill -HUP {inetd}
/usr/sbin/sysman net_wizard
finally rebooting B
None of these cured the symptoms.
Compaq support has not provided any further suggestions yet.
Could someone help us?
Regards.
-----
Kazuro FURUKAWA
Linac, High Energy Accelerator Research Organization (KEK), Japan
Received on Sat May 12 2001 - 01:44:51 NZST