Thanks to Frederik (frpa01_at_shb.se), who was the only one who managed
to identify the problem. Well done. The problem was rather serious.
Here follows a description of my problem and then the solution:
I have a customer with an Alpha 8200 with a 100 Mbit card (tu2). It has been
very simply configured. It is basically just a dumb host on the
network - and it should only speak when it gets spoken to. This
system has approx. 200 users spread geographically across the
country. However, every few days one particular site seems to lose
its connection to the 8200 (running Sybase). They cannot ping or
telnet to it. When things are working correctly the output of netstat -r
is as follows:
Routing tables
Destination  Gateway     Flags  Refs      Use  Interface
Netmasks:
Inet 255.0.0.0
Inet 255.255.255.0
Route Tree for Protocol Family 2:
default      gateway     UG       53  1236620  tu2
localhost    localhost   UH        1        0  lo0
196.6.175    gateway     UG        5     8011  tu2
200.1.1      dec8200     U        85   355411  tu2
Note: Neither routed nor gated is running (not necessary; I think it
would just complicate matters).
A copy of /etc/routes is as follows:
default 200.1.1.254 #gateway
When that remote site loses its network connection the output of
netstat -r is as follows:
Routing tables
Destination  Gateway     Flags  Refs      Use  Interface
Netmasks:
Inet 255.255.255.0
Route Tree for Protocol Family 2:
default      gateway     UG       67   677265  tu2
localhost    localhost   UH        1        0  lo0
196.6.175    firewall    UGD       2     2805  tu2
# the above line is the weird entry suggesting that the traffic has
# been dynamically redirected through this firewall
COMroute     gateway     UGH       0       44  tu2
200.1.1      dec8200     U       101    92532  tu2
What I did after this was remove all traces and entries of the
firewall from the /etc/hosts and any other file that might have had
reference to it.
I must mention that this system is not on the internet and should not
use DNS at all. However, someone did run (before this problem
happened) #bindsetup on this system and attempted to set this system as a
client on the DNS. Since then management have decided otherwise, and all traces
of bind have been removed (I hope).
Anyway, we solve the problem on the fly by adding a route to that specific
network segment, i.e. # route add -net xxx.xxx.xxx gateway xxx.xxx.xxx.xxx
Sometimes we also have to restart the network, i.e. # rcinet restart,
for it to take effect.
The situation now is that the Windows NT DNS and Firewall
administrators point fingers at the 8200 and say that there is
probably still a switch which points to the DNS to do name resolving,
which I think is a load of B-S. They say this is what is causing the
redirection of the network traffic via the firewall instead of the
default router/gateway. I in turn point fingers at the NT DNS and I
asked them to remove all DNS records from their NT DNS and firewall
to do with the 8200. Whether this will solve the problem I don't know.
What are your views on this?
What factors could possibly be causing this redirection of traffic from the default
router to the Firewall?
Is it possible for DNS config. information to still be sitting on the
8200? Typically what files should I search for that could affect
this?
Here is another weird sample of netstat -r (after removing the
firewall 200.1.1.14 from /etc/hosts):
Routing tables
Destination  Gateway     Flags  Refs      Use  Interface
Netmasks:
Inet 255.0.0.0
Inet 255.255.255.0
Route Tree for Protocol Family 2:
default      gateway     UG       52  1147204  tu2
localhost    localhost   UH        1        0  lo0
196.6.175    200.1.1.14  UGM       5    45439  tu2
200.1.1      dec8200     U       128  4075124  tu2
Regards
Paulo
Frederik's reply, which was exactly the solution to my problem:
==========================================
On my network there is, like on yours, a default gateway and
a firewall. In the gateway there is/was a default route, with a very
high cost/metric/preference, pointing to the firewall. This means
that if the gateway gets a request for a connection to a network that
is not within its own routing tables, i.e. a network outside your
own LAN/WAN, it sends the traffic through the firewall (the network
is 'out there').
Every once in a while there is a glitch on the line to a remote site.
The gateway updates its routing tables, or gets a request for this
off-line network and finds it unreachable, and sends an ICMP redirect
to the server saying 'use the firewall' to get to this network.
I did a quick-and-dirty workaround to solve this problem, a shell
script that removes all routes not wanted. I let cron run this every
5 minutes. As I said, ugly but it works.
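The script itself was not posted, so the following is only a minimal sketch of the idea, not Frederik's actual script. It assumes that redirect-created or redirect-modified routes can be recognised by a D or M in the Flags column of netstat -rn output (as in the UGD and UGM entries shown earlier), and that route delete takes a -net or -host keyword followed by destination and gateway. That syntax is an assumption; check route(8) on your system. The function names are hypothetical, and the sketch only prints the delete commands rather than running them.

```shell
#!/bin/sh
# Sketch only: find routes that were created (D) or modified (M) by a
# redirect and print the commands that would remove them.

# Read `netstat -rn` style output on stdin and print
# "destination gateway flags" for each route whose Flags column
# contains D or M. Header and comment lines are skipped because
# their fourth field (Refs) is not a number.
list_redirected_routes() {
    awk '$4 ~ /^[0-9]+$/ && $3 ~ /[DM]/ { print $1, $2, $3 }'
}

# Dry run: turn each flagged route into a `route delete` command line.
# Host routes (H flag) get -host, everything else gets -net.
# The exact `route delete` syntax is an assumption, see route(8).
print_cleanup_commands() {
    list_redirected_routes | while read dest gw flags; do
        case "$flags" in
            *H*) echo "route delete -host $dest $gw" ;;
            *)   echo "route delete -net $dest $gw" ;;
        esac
    done
}

# Example use (prints commands only, changes nothing):
#   netstat -rn | print_cleanup_commands
```

Once the printed commands look right, the output could be piped to sh from a script that cron runs every 5 minutes. Note that this only treats the symptom; the redirects will keep arriving until the router configuration is fixed.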
Later on we did a redesign of our router network and removed the
default route within the gateway. Problem solved permanently.
So, what it all boiled down to was routing (OSPF) timers and
default routes in the WAN.
BTW, I don't think DNS has anything to do with it. Just to make
sure, check /etc/svc.conf and make sure the entry for hosts=
doesn't include bind and that /etc/resolv.conf is empty or that
the nameserver entry points to where you want it to. Best of all,
remove /etc/resolv.conf entirely.
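The two checks above can be sketched as a small shell function. This is an illustration rather than a complete audit: the function name check_dns_config is hypothetical, and it assumes the hosts= entry in /etc/svc.conf starts at the beginning of a line. It only reports; it changes nothing.

```shell
#!/bin/sh
# Sketch only: report leftover BIND/DNS client configuration.
# Usage: check_dns_config [svc.conf path] [resolv.conf path]
# Returns 0 if nothing suspicious was found, 1 otherwise.
check_dns_config() {
    svc_conf=${1:-/etc/svc.conf}
    resolv_conf=${2:-/etc/resolv.conf}
    status=0

    # The hosts= entry should not list bind as a lookup service.
    if [ -f "$svc_conf" ] && grep '^hosts=' "$svc_conf" | grep bind >/dev/null; then
        echo "$svc_conf: hosts= entry still lists bind"
        status=1
    fi

    # resolv.conf should be missing or empty; if it is not,
    # its nameserver entries need to be checked by hand.
    if [ -s "$resolv_conf" ]; then
        echo "$resolv_conf: exists and is not empty"
        status=1
    fi

    return $status
}

# Example use against the real files:
#   check_dns_config && echo "no DNS client configuration found"
```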
===========================================
To add to what Frederik said:
We managed to solve the problem by configuring the remote router (at
the client side) to direct traffic only via the router/gateway at
the 8200 (server) side.
Thanks once again.
Paulo
Received on Fri Feb 28 1997 - 17:25:22 NZDT