I've had named running on an Alpha 500 for 2 years without
problems. For the past 2 weeks it has been getting slower and
slower, and today it is unbearable.
A typical nslookup session shows that our DNS server
(duke.neuronet.com.my - 202.184.153.3) is timing out, whereas
our upstream provider's DNS server (relay1.jaring.my) can
resolve the very same domain names:
# nslookup
Default Server: duke.neuronet.com.my <--- my DNS server
Address: 202.184.153.3
> infoseek.com
Server: duke.neuronet.com.my
Address: 202.184.153.3
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 4 seconds.
DNS request timed out.
timeout was 8 seconds.
*** Request to duke.neuronet.com.my timed-out
> server relay1.jaring.my <---- change to upstream DNS
Default Server: relay1.jaring.my
Address: 192.228.128.11
> infoseek.com
Server: relay1.jaring.my
Address: 192.228.128.11
Non-authoritative answer:
Name: infoseek.com
Address: 204.162.96.2
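To repeat the comparison above outside nslookup, a minimal Python sketch might send the same A query to each server over UDP, retrying with the doubling 2, 4 and 8 second waits that nslookup printed. The query is built by hand following the RFC 1035 message format; the function names `build_query` and `query_with_backoff` are hypothetical helpers, not part of BIND or nslookup, and the fixed query ID is an arbitrary assumption.

```python
import socket
import struct
import time


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for the A record of `name` (RFC 1035 wire format)."""
    # Header: ID, flags (0x0100 = RD set, asking the server to recurse),
    # QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length byte, ending with a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack(">HH", 1, 1)


def query_with_backoff(server, name, timeouts=(2, 4, 8)):
    """Send the query to `server` on UDP port 53, doubling the wait on each retry.

    Returns the seconds taken by the answering attempt, or None if every
    attempt times out (the '*** Request ... timed-out' case above).
    """
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for timeout in timeouts:
            sock.settimeout(timeout)
            start = time.monotonic()
            sock.sendto(packet, (server, 53))
            try:
                reply, _ = sock.recvfrom(512)
            except socket.timeout:
                print(f"{server}: no reply within {timeout} s")
                continue
            # Accept only a reply echoing our query ID; otherwise retry
            if reply[:2] == packet[:2]:
                return time.monotonic() - start
    return None
```

Calling `query_with_backoff("202.184.153.3", "infoseek.com")` and then the same with `"192.228.128.11"` would mirror the two halves of the session; those addresses are taken from the post and may no longer answer.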
The same happens with most US sites. When resolving local
(xxx.com.my) domains, my own DNS server also sometimes
times out once.
----------------------------------------------
Notes:
1. Our DNS server is primary for only about 40 domains
and secondary for 1 domain.
2. In the past 2 weeks, we have added .com, .org and
.ec domains for the first time. Before that, all
domains were local to Malaysia (.com.my).
3. 'top' shows that named has grown from about 6 MB
to 12 MB over the past 2 weeks, even though only 5 new
domains were added during that period.
4. The international line from Malaysia to the US has
apparently been deteriorating over the past 3 weeks, but
that would not explain why my upstream provider's DNS
server can resolve names that mine times out on. It is
just one hop away.
5. I only have 50 users whose workstations use our name
server.
6. External traffic to our websites has not substantially
increased in the past 2 weeks.
7. The DEC 500 has 96 MB of RAM and runs only 2 webservers,
DNS and mail. 'top' claims that 30 MB are free. It has
been running DU3.2c and named for 2 years without problems.
8. Some US sites (e.g. oracle.com) resolve at a reasonable
speed, so my initial thought was that it was just a
bottleneck on some segment of the Internet. However, that
wouldn't explain why my DNS server times out when my
upstream can resolve, say, dejanews.com.
I cannot see why adding non-local domains would cause
this, so I presume points 2 and 3 are a coincidence.
I do hope that our name server is answering queries for the
domains that we host (e.g. tanjungrhu.com.my, mdc.com.my).
I intend to:
- set up a caching DNS server that queries our upstream
provider, and tell my internal users to use it.
- move the webservers to a different machine, leaving only
DNS and mail on the DEC500.
But I don't think this is going to cure things.
Does anyone have any ideas?
Is there any way to monitor how many queries named is
answering, and to track down the bottleneck?
A typical traceroute to infoseek.com looks like this:
# traceroute infoseek.com
Tracing route to infoseek.com [204.162.96.2]
over a maximum of 30 hops:
1 * * * Request timed out.
2 111 ms 110 ms 110 ms 161.142.32.25
3 100 ms 100 ms 110 ms e0.ttk7.jaring.my [161.142.219.8]
4 130 ms 110 ms 110 ms e0.ttk15.jaring.my [161.142.219.16]
5 170 ms 170 ms 60 ms h0-0.bkj15.jaring.my [161.142.0.81]
6 110 ms 110 ms 121 ms fe0-0.bkj16.jaring.my [161.142.78.16]
7 351 ms 290 ms 561 ms 205.174.74.205
8 381 ms 390 ms 381 ms Hssi4-1-0.GW2.SFO1.ALTER.NET [157.130.193.45]
9 * 400 ms * 113.ATM10-0-0.XR2.SCL1.ALTER.NET [146.188.145.62]
10 * 391 ms * 194.ATM2-0-0.GW2.PAO1.ALTER.NET [146.188.144.65]
11 * * * Request timed out.
12 * * * Request timed out.
13 391 ms 400 ms 381 ms corp-bbn.infoseek.com [204.162.96.2]
Trace complete.
Yes, the first hop (to my router) does look dodgy. I
don't know why it does that either, but it seems to pass
traffic fine, so I've never worried about it.
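As a rough way to read the trace above, a small Python sketch could average the probe times at each hop, skipping `*` probes, and then look for the biggest increase between successive hops that answered. The function names are hypothetical, and the regular expressions assume the exact `N ms` layout shown above; other traceroute implementations format their output differently.

```python
import re


def hop_averages(trace_text):
    """Return {hop number: mean RTT in ms}, skipping '*' probes and silent hops."""
    averages = {}
    for line in trace_text.splitlines():
        # A hop line starts with the hop number followed by the probe columns
        match = re.match(r"\s*(\d+)\s+(.*)", line)
        if not match:
            continue
        hop = int(match.group(1))
        # Collect only the "<number> ms" probe times; '*' entries are ignored
        rtts = [int(ms) for ms in re.findall(r"(\d+) ms", match.group(2))]
        if rtts:
            averages[hop] = sum(rtts) / len(rtts)
    return averages


def largest_jump(averages):
    """Return the (earlier, later) pair of answering hops with the biggest RTT rise."""
    hops = sorted(averages)
    pairs = zip(hops, hops[1:])
    return max(pairs, key=lambda p: averages[p[1]] - averages[p[0]])
```

Fed the trace above, `largest_jump(hop_averages(text))` returns `(6, 7)`: the average rises from roughly 114 ms at fe0-0.bkj16.jaring.my to roughly 401 ms at 205.174.74.205, where the path appears to leave the provider's network. That fits point 4 about the international line, but on its own it does not explain why only our server times out.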
Sorry for the long post, but I can't pinpoint the problem, and
the more evidence the better. Thank you for any ideas,
chas
Received on Thu Apr 30 1998 - 08:13:47 NZST