Dear Admins,
I have not received any suggestions yet on the following problem.
If you have any suggestions, could you please tell us?
The cluster configuration:
(It is connected to two networks, {C0,A0,B0} and {C1,A1,B1}.)
Machine        Name  Ethernet      Fddi          Memory channel
                     (ip-address)  (ip-address)  (ip-address)
Cluster alias  C     C,C0          C1
Member 1       A     A0            A1            Am
Member 2       B     B0            B1            Bm
Some further information:
11) Although I wrote that X11 is working, I made a mistake in testing
it: X11 from machine B to addresses Bx hangs, just as rsh etc. do. It
seems that the services that have the out_alias attribute in
/etc/clua_services have problems.
12) Machines A and B have somewhat different configurations. Their
primary and secondary network interfaces are
member  primary     secondary
A       ee0 (A0)    fta0 (A1)
B       fta0 (B1)   tu0 (B0)
Here, "primary" means that it is listed first in "netstat -i".
13) The gated log (/var/tmp/gated.log) on "B" contains a wrong entry
ADD "C" 255.255.255.255 gw "C1" Kernel pref 254/0 metric 0/0 "fta0" <NoAdvise Ext Active Gateway>
which should have been
ADD "C" 255.255.255.255 gw "C" Kernel pref 254/0 metric 0/0 "tu0" <NoAdvise Ext Active Gateway>
and the corresponding "netstat -r" outputs on both "A" and "B" are
"C" localhost UH 55 94645282 lo0
???
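
The comparison in item 13 can be automated. The following is a minimal sketch in Python, assuming the log lines look exactly like the two ADD lines quoted above (real gated logs may carry extra prefixes, and real entries would hold addresses rather than the placeholders "C" and "C1"). The function names and the rule "gateway equals the alias, interface equals the expected one" are illustrative assumptions drawn only from item 13, not part of gated itself.

```python
import re

# Layout assumed from the two ADD lines quoted in item 13. search() is used
# so that a leading timestamp or other prefix does not prevent a match.
ADD_RE = re.compile(
    r'ADD\s+"?(?P<dest>[^"\s]+)"?\s+(?P<mask>\S+)\s+gw\s+"?(?P<gw>[^"\s]+)"?'
    r'\s+(?P<proto>\S+)\s+pref\s+\S+\s+metric\s+\S+\s+"?(?P<ifname>[^"\s<]+)"?'
)


def parse_add(line):
    """Return dest/mask/gw/proto/ifname from one ADD line, or None."""
    m = ADD_RE.search(line)
    return m.groupdict() if m else None


def suspicious_alias_routes(lines, alias, expected_if):
    """Yield routes to `alias` whose gateway or interface looks wrong.

    Hypothetical rule taken only from item 13: the route to the alias
    should name the alias itself as gateway and use `expected_if`.
    """
    for line in lines:
        entry = parse_add(line)
        if entry is None or entry["dest"] != alias:
            continue
        if entry["gw"] != alias or entry["ifname"] != expected_if:
            yield entry


if __name__ == "__main__":
    log = [
        'ADD "C" 255.255.255.255 gw "C1" Kernel pref 254/0 metric 0/0 '
        '"fta0" <NoAdvise Ext Active Gateway>',
        'ADD "C" 255.255.255.255 gw "C" Kernel pref 254/0 metric 0/0 '
        '"tu0" <NoAdvise Ext Active Gateway>',
    ]
    for bad in suspicious_alias_routes(log, alias="C", expected_if="tu0"):
        print(bad["gw"], bad["ifname"])  # prints: C1 fta0
```

Running the same check against the log on "A" (with its own expected interface from item 12) would show whether only "B" installs the alias route through the FDDI side.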
Compaq support will investigate further this Thursday. But if
you have any suggestions, could you please tell us?
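
For anyone who wants to repeat the "works / hangs" checks (items 1, 4 and 11) in a uniform way, here is a minimal sketch in Python. It is not something that was run on this cluster. It only tests whether a TCP connection can be opened within a timeout, so a session that connects and then stalls later (as rsh might) would still be reported as "open". The function names are illustrative, the port numbers are the usual well-known ones, and host names such as "B0" are the placeholders used in this post.

```python
import socket

# Well-known TCP ports for some of the services in the symptom list.
SERVICES = {"ftp": 21, "telnet": 23, "rlogin": 513, "rsh": 514}


def probe(host, port, timeout=5.0):
    """Classify one TCP connect attempt: open, refused, timeout or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        # Covers name-resolution failures, unreachable networks, etc.
        return "error"


def survey(hosts, services=SERVICES, timeout=5.0):
    """Return {(host, service): status} for every host/service pair."""
    return {
        (host, name): probe(host, port, timeout)
        for host in hosts
        for name, port in services.items()
    }


if __name__ == "__main__":
    # Replace the placeholders with the real addresses of B0, B1, Bm.
    for (host, name), status in sorted(survey(["localhost"]).items()):
        print(host, name, status)
```

Running it on both "A" and "B" against the same address list would give two tables that can be compared line by line.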
>>> On Sat, 12 May 2001 10:43:37 JST, Kazuro FURUKAWA <kazuro.furukawa_at_kek.jp> wrote;
> Dear Admins,
>
> We're experiencing some odd behavior with V5.1 TruCluster after the
> patch T64V51AS0003-20010413 was applied by Compaq support (even after
> another reboot).
>
> Here is the cluster configuration.
> node   name  ether  fddi  memchan
> alias  C     C,C0   C1
> 1      A     A0     A1    Am
> 2      B     B0     B1    Bm
>
> First, we noticed that rsh from B to B hangs, while ftp from B to B
> works. Then we found these interesting symptoms while waiting for a
> chance to reboot.
>
> 0) almost all network activities are working well. for example
> any access from A to Ax works. (Ax means A0, A1, Am or localhost)
> 1) rsh, rlogin, telnet from B to Bx hangs. (Ctrl-C can kill it.)
> (Here Bx means B0, B1, Bm or localhost)
> 1') ping from B to Bx works.
> 2) ping from B to C0 or C gets no reply.
> 2') ping from B to C1 works.
> 3) traceroute from B to C shows 30 lines of gateways of "localhost".
> packets are looping?
> 4) ftp, rup, X11, smtp from B to Bx works.
> 5) rsh, telnet, ftp from B to C0 always goes to A.
> 5') rsh, telnet, ftp from A to C is following round-robin rule.
> 6) rsh, telnet, ftp from B to C1 works 3 times, then hangs 3 times.
> 7) rup, rusers from other machines to C always goes to B.
> 7') rsh from other machines to C is following round-robin rule.
> 8) rup, rusers from B to C hangs.
> 8') rup, rusers from B to Bx works.
> 9) rup, rusers from B to C1 always goes to B.
> 10) netstat -i, -r on A and B do not show any noticeable differences.
>
> I first suspected out_alias attribute in /etc/clua_services. But X11
> works well with out_alias. (I didn't change clua_services yet.)
>
> We tried these
> /sbin/init.d/gateway stop
> /sbin/init.d/gateway start
> cluamgr -r start
> cfsmgr
> kill -HUP {inetd}
> /usr/sbin/sysman net_wizard
> finally rebooting B
>
> Those did not cure the symptoms.
> Compaq support has not provided further suggestions yet.
>
> Could someone help us?
>
> Regards.
-----
Kazuro FURUKAWA
Linac, High Energy Accelerator Research Organization (KEK), Japan
Received on Mon May 21 2001 - 09:02:50 NZST