Strange NetWorker hang: could it be DNS-related, or is there another theory?

From: Kevin Criss <KCriss_at_dwd.state.in.us>
Date: Fri, 10 Aug 2001 11:33:50 -0500

Hi,

I always appreciate your help, though many times I don't take the time
to thank you personally.

We have been fighting a strange Legato NetWorker hang here that we
can't seem to beat. We run Tru64 UNIX 5.1 with patch kit-003 and Legato
NetWorker Power Edition 6.01 build 251.

The backup group is started manually by Operations at night. There are
three clients in this group. Each client is an Oracle database server,
and the databases are brought down and back up using Legato's savepnpc
facility so that they are backed up in a cold state. A cold backup is
the only way to go, and is mandatory, unless of course you use Legato's
more expensive hot (on-the-fly) backup technology, which requires
Oracle's RMAN facilities. We are licensed for that technology, but for
some reason my DBAs currently prefer cold backups. I'll try to minimize
my side notes and stick to defining the problem at hand.

Sometimes a backup client will hang; by "hang" I mean it won't start
streaming its files to the backup server. It just sits there for no
apparent reason, waiting on someone or something, and I can't figure
out what. I need a working theory.

The temporary fix is for Operations to stop the backup group with the
nwadmin GUI, reboot the hung backup client, and, once the client is back
up, use nwadmin to restart the backup group. Rebooting clients this way
is not a desirable practice, and we would like to remedy it.

I think I have applied all the latest and greatest patches. I am also
trying to clean up the little things, but apparently I haven't hit on
the right culprit, or it is right under my nose and I still can't see
it. I haven't yet asked the NetWorker mailing list about this problem,
though I am subscribed. I may just paste this text verbatim to that
list as well, even though I get good results here.
Eeeek! Another side note. :)

My backup server itself has been having DNS problems recently, and
sometimes mail does not get processed because the name service is
clogged. We have Legato mail its backup group completion notifications
to the File Manager and to I-O control so that they can verify the
quality and status of the "Save Group Completions" at end of job for
each backup group. This is configured in the nwadmin GUI: from the
"Customize" pull-down, select the "Notifications" menu item, and then
the "Save Group Completions" sub-item. Legato ships it configured to
mail root only; recently we added two more mail addresses.

The configuration line now reads like this:
   /usr/bin/mailx -s "alpha2.dwd.state.in.us's savegroup completions" root recipient1_at_domain-name recipient2_at_domain-name

Nothing too special there, I guess, unless I'm running into size or
space issues with a spool file, etc., but how would I diagnose that?
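One way to start on the spool-file question is to check how full the filesystems holding the mail queue and mailboxes are, in both space and inodes, and how many files are sitting in them. This is a minimal sketch, assuming a Python 3 interpreter is available on the backup server; the directory list (`/var/spool/mqueue`, `/var/spool/mail`), the 90% threshold and the `spool_report` helper are hypothetical and should be adjusted for the host.

```python
import os
import shutil
import sys

# Hypothetical locations to check; where sendmail queues mail and where
# local mailboxes live varies by system, so adjust these for your host.
SPOOL_DIRS = ["/var/spool/mqueue", "/var/spool/mail"]


def spool_report(path, warn_pct=90.0):
    """Summarize space, inode and file-count usage for a spool directory.

    Returns a dict with the percentage of disk space used, the percentage of
    inodes used (None if the filesystem does not report inodes), the number
    of regular files directly inside `path`, and a `warn` flag that is True
    when either percentage reaches `warn_pct`.
    """
    usage = shutil.disk_usage(path)          # bytes: total, used, free
    space_pct = 100.0 * usage.used / usage.total if usage.total else 0.0

    st = os.statvfs(path)                    # POSIX filesystem statistics
    inode_pct = None
    if st.f_files:                           # some filesystems report 0 here
        inode_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files

    files = sum(
        1 for name in os.listdir(path)
        if os.path.isfile(os.path.join(path, name))
    )
    warn = space_pct >= warn_pct or (inode_pct is not None and inode_pct >= warn_pct)
    return {"space_pct": space_pct, "inode_pct": inode_pct,
            "files": files, "warn": warn}


if __name__ == "__main__":
    for d in sys.argv[1:] or SPOOL_DIRS:
        try:
            r = spool_report(d)
        except OSError as exc:               # missing dir, no permission, ...
            print(f"{d}: cannot check ({exc})")
            continue
        inodes = "n/a" if r["inode_pct"] is None else f"{r['inode_pct']:.1f}%"
        flag = "  <-- check this" if r["warn"] else ""
        print(f"{d}: space {r['space_pct']:.1f}%, inodes {inodes}, "
              f"{r['files']} files{flag}")
```

Run as root with no arguments, it checks the default directories; directory names given on the command line are checked instead. A steadily growing file count in the queue directory would also point at mail that is not being delivered.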

I have a very simple DNS setup, and it had never given me any problems
until now. The only thing I do to fix the clogged name service is to run
the following, which fixes the problem for me:

   #/sbin/init.d/named stop
   #/sbin/init.d/named start

You might ask how I know the name service is clogged. The Legato
savegroup completion messages that normally get mailed out are returned
with a "HOST unknown" error. The supposedly unknown host is
"pine.isd.state.in.us", which is named in an MX record in my
/etc/namedb/host.db file:

   IN MX 20 pine.isd.state.in.us

When I come in the next morning to find out why the mail was returned,
I use the following command:

#nslookup pine.isd.state.in.us

The response tells me the server can't be found. However, after I stop
and restart named with /sbin/init.d/named, I can find it again. So I
guess the problem is on my end. I don't even have a theory as to what
could be clogging up named. I am behind a firewall, and my servers all
use private IP addresses.
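The morning lookup check and the stop/start workaround could be automated so that they run before Operations starts the nightly group. This is a minimal sketch, assuming a Python 3 interpreter on the backup server; `resolves`, `restart_named`, `check_and_repair` and the 5-second `settle` pause are hypothetical, and only the `/sbin/init.d/named stop|start` commands come from the post.

```python
import socket
import subprocess
import time

# Init script path taken from the post; the stop/start pair below mirrors
# the manual workaround rather than anything Legato provides.
NAMED_SCRIPT = "/sbin/init.d/named"


def resolves(name):
    """Return True if the system resolver can map `name` to an address.

    getaddrinfo() follows the host's configured lookup order (hosts file,
    DNS, ...), so this is not a pure DNS query the way nslookup is: a name
    listed in the hosts file would pass even if DNS is broken.
    """
    try:
        socket.getaddrinfo(name, None)
    except socket.gaierror:
        return False
    return True


def restart_named(script=NAMED_SCRIPT):
    """Stop and start named, as done by hand in the post (needs root)."""
    subprocess.run([script, "stop"], check=False)   # may fail if not running
    subprocess.run([script, "start"], check=True)


def check_and_repair(name, restart=restart_named, settle=5.0):
    """Return True if `name` resolves, restarting named once if it does not.

    `settle` is a guessed pause (seconds) to let named reload its zones
    before the second lookup.
    """
    if resolves(name):
        return True
    restart()
    time.sleep(settle)
    return resolves(name)
```

For example, `check_and_repair("pine.isd.state.in.us")` run as root from cron shortly before the group starts would restart named only when the lookup fails. Like the manual stop/start, this only papers over the symptom; it does not explain why named stops answering.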

Should I hard-code pine's IP address in the MX record?

I'm not a DNS guru, but I would like to beat my DNS problem so that it
can be ruled out as a possible cause of my Legato backup client hangs.
I realize it could be a symptom of the problem rather than the actual
problem, but I would still like to beat it.

I will also investigate any plausible new theories that you care to
post.

Thanks in advance.

Sincerely
Kevin Criss
Received on Fri Aug 10 2001 - 16:35:03 NZST
