Summary: Strange Networker Hang could be DNS related or could there be another theory?

From: Kevin Criss <KCriss_at_dwd.state.in.us>
Date: Thu, 23 Aug 2001 11:46:29 -0500

[THANK YOU]

This was a hard one for me. I received only one response from the
mailing list, but his suggestions were helpful. I also opened a
communications channel with Brian Zust, Systems Administrator at the
State of Indiana Department of Information Technology (D.O.I.T.; we say
"Do IT"), and we logged a Compaq Support sequence number too.

William H. Magill
Brian Zust of Do IT
#C010813-934


[OVERVIEW]

These were two separate problems:

I was having a strange Networker stall or hang, which we eventually
resolved; see our explanation in the posting "Update: Legato 6.02 Build
251 Savepnpc and Oracle Databases":
http://www.ornl.gov/its/archives/mailing-lists/tru64-unix-managers/2001/08/msg00258.html

Plus we were having DNS issues when trying to mail out the SAVE GROUP
COMPLETIONS at the end of our Networker backups. The complete posting
for this problem can be read at
http://www.ornl.gov/its/archives/mailing-lists/tru64-unix-managers/2001/08/msg00169.html


The condensed problem description reads as follows:

I have a very simple DNS setup and it has never given me any problems
until now. The only thing I do to correct the cludged DNS name service
is to run #/sbin/init.d/named stop and then run #/sbin/init.d/named
start. That fixes the problem for me. You might ask how I know my DNS
name service is cludged. The Legato savegroup completions that usually
get mailed out are returned with a HOST unknown error. The host
allegedly unknown is "pine.isd.state.in.us". This host relates to an MX
record in my /etc/namedb/host.db file.

IN MX 20 pine.isd.state.in.us

When I come in in the morning to check the reasons for the returned
dead mail, I use the following command.

#nslookup pine.isd.state.in.us

The response from the command tells me the server can't be found.
However, when I stop and restart the /sbin/init.d/named service I can
find it again. So I guess it is my problem. I don't even have a theory
as to what could be cludging up my named services. I am behind a
firewall and my servers all use private IP addresses.

Should I hard code the IP address for pine in the MX record?


[SUMMARY]

Here is what we think happened. I only have one mail exchanger
configured. I guess two is preferable, so I am in the process of
selecting a second. It's already selected, but I have to ask permission
before I use it, and this requires a round-tuit which I'm working on.

My previous DNS configuration required the use of DNS to locate the
address for pine.isd.state.in.us, but [here's the theory] due to local
traffic conditions my DNS server, which was asking DoIT's DNS server
for the address of pine.isd.state.in.us, could not get a response back.
The response would not come back because of other network backup
operations taking place on the Campus Backbone. I was being zoned out.
Then my cache and TTL (time to live) settings would keep a persistent
memory of pine.isd.state.in.us not being found. When I came in in the
morning and did a #nslookup on pine.isd.state.in.us, it would continue
to remember that this server was not found. Really it wasn't "not
found"; the question was simply never answered. Stopping and starting
the named services would empty the cache, or so I suspect. After
stopping and starting the services, subsequent #nslookups on
pine.isd.state.in.us were resolved.

My Data Center confines its backup traffic to the local smart switch,
so we don't tax the Campus-wide Backbone with our backup traffic.
However, our in-house local area network/Novell team and DoIT both
conduct network backups over the entire extent of the Campus-wide
Backbone. Brian Zust and I suspect this other backup traffic was
causing our problems. Kinda like the other network backups were causing
a denial of service to my Data Center. Eeeek! It's just a theory; I
hope no one gets upset with it.

We plan to procure a Cabletron chassis switch with gigabit technology
for our Data Center. This chassis will have a gigabit connection to the
chassis in DoIT's data center, and we think the fatter pipe will reduce
or eliminate this problem. Our present connection to DoIT is through
the in-house LAN/Novell team and it is running at 10baseT. I'm going to
fix that problem by a factor of 100.

Our theory on the network congestion was sorta proved out when I
followed William H. Magill's suggestion. He kept saying I needed an
address record for DoIT's pine.isd.state.in.us box in my DNS system.
Instead of creating this address record for pine.isd.state.in.us in my
DNS system, I defined the address for pine.isd.state.in.us in my
/etc/hosts file. Then, when my Legato Networker wanted to relay the
Save Group Completions mail to my Novell GroupWise email account, the
IP address for pine.isd.state.in.us was in the /etc/hosts file. It
could then send the mail to pine.isd.state.in.us, but since the network
was temporarily overloaded this mail would queue up locally and then
generate the following informational message:

I'll just paraphrase the message (I have to go to lunch), but it would
say something like this: We can't deliver the message to
pine.isd.state.in.us because the host is unreachable. We will keep
trying; in fact we will keep trying for five days before we eventually
give up.
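
Going back to the hosts file entry mentioned above: a quick way to
confirm that the entry is really there is to look the name up in the
file directly. This is a sketch only. The function name hosts_lookup is
made up for illustration, and 192.0.2.10 is a documentation placeholder,
not pine's real address.

```shell
#!/bin/sh
# Sketch only: print the address that a hosts-format file maps to a name.
# A hosts entry has the form:  address  canonical-name  [aliases...]
# for example (placeholder address, NOT the real one):
#   192.0.2.10   pine.isd.state.in.us   pine

# $1 = hosts file, $2 = host name or alias to look for.
# Prints the first matching address, or nothing if the name is absent.
# Matching is exact and case-sensitive, which is stricter than a real
# resolver; good enough for a sanity check.
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # strip comments before matching
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}
```

For example, "hosts_lookup /etc/hosts pine.isd.state.in.us" should
print the address you entered. If it prints nothing, the mailer will
fall back to asking DNS and the original problem can come back.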

Well, that's a whole lot better than the "host not found"
dead-letter-mail problem I was trying to beat. The mail does get
through eventually, and we are pretty sure, unless someone says
otherwise, that the problem was induced by the network backup traffic
of other administration groups, who were causing me a
mini-denial-of-service. I use that term loosely. Certainly it was
unintentional and in the pursuit of their own job mission, but I think
I spent two weeks on the problem.

We just need to upgrade our infrastructure so it can handle the
technology we choose to deploy.

As usual, if you have a better theory I'd like to hear it.

Sincerely
Kevin Criss

  
Received on Thu Aug 23 2001 - 16:47:39 NZST