Thanks again everyone! Got a few more responses from
John Venier
Jim Belonis
Steffi Laurentius
Joe Fletcher
Jim Lola
I also called Compaq, who gave the final response:
Apparently, with automount, only kill -15 should be used. Automount
needs to shut down gracefully if it is to be reused. Any other form
of kill will render automount useless, and tie up the resources until
reboot.
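As a rough sketch of what that advice looks like in practice, the helper below sends only signal 15 (SIGTERM) to the PIDs it is given. The name stop_gracefully is made up for illustration, and the ps/awk line in the comment assumes the daemon shows up as "automount" in ps output and that ps accepts the POSIX -o options, which may not hold on every Tru64 version.

```shell
#!/bin/sh
# Hypothetical helper: send SIGTERM (kill -15) to each PID given.
# It deliberately never escalates to kill -9.
stop_gracefully() {
    for pid in "$@"; do
        # Signal 15 asks the process to shut down cleanly.
        kill -15 "$pid" || return 1
    done
}

# Example (assumption: the daemon appears as "automount" in ps output):
#   stop_gracefully $(ps -e -o pid= -o comm= | awk '$2 ~ /automount$/ {print $1}')
```

The helper stops at the first PID it cannot signal, so a non-zero return means at least one process was left alone.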
Cheers
Paul
======================================================================
I don't have any further ideas for your exact problem, but for your info we
have had lots of problems with NFS and automounter between the V5.1 and V4
systems we run. Compaq have actually managed to reproduce them, and
engineering will hopefully come up with some sort of patch soon. Their first
response to us was "Can you not upgrade everything to V5?!" If you want, I
will let you know when we get the patch and how successful we are with it.
======================================================================
Did you have a look at arp -a? The error message server "`this machine
name` not responding" could mean that `this machine name` is still in your
arp table - but with the old MAC-address.
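A minimal sketch of that check, assuming the usual "arp -a" listing with one entry per line. arp_lines_for is a hypothetical helper and servername is a placeholder. The exact output format differs between systems, so the result still needs to be compared by eye against the MAC address of the new card.

```shell
#!/bin/sh
# Hypothetical helper: keep only the arp table lines that mention a host.
# Reads "arp -a" style output on stdin. The match is a plain,
# case-insensitive substring search.
arp_lines_for() {
    grep -i -F -- "$1"
}

# Example (servername is a placeholder for the NFS server's name):
#   arp -a | arp_lines_for servername
# If the MAC address shown belongs to the old card, the stale entry can
# usually be removed as root with:
#   arp -d servername
```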
======================================================================
On Solaris computers, when the amd automounter acts up, I sometimes rename
the automount point where the remote disk is actually mounted, for instance
/auto/bluemoon (where bluemoon is the remote computer name), to
/bluemoon_delete_me (to be deleted after the next reboot),
and also remove bluemoon from the /etc/mnttab file with a text editor.
This solves df hangs and allows amd to mount the disk again properly.
I don't remember doing that kind of thing on Tru64 Unix,
and I don't use the Tru64 automounter (we still use amd).
And the symptom doesn't sound amenable to this. It sounds more like a
communications problem between the two computers, or a permissions problem.
======================================================================
Sounds like you are already in trouble. When you say you bounced the NICs -
how and why? Have you tried rcinet restart? It's a bit drastic but it
might help.
If something has a lock on the nfs directory use fuser to find the
offending process and kill it. I'd start with this rather than the rcinet
restart.
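Something along these lines could list the offending processes before anything is killed. holders_of is a hypothetical helper and /nfs/locked is a placeholder path. It assumes fuser accepts -c (treat the argument as a mount point) and writes bare PIDs to stdout, so check the local fuser man page first. On a dead NFS mount, fuser itself may hang or return the same NFS error.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs that have files open on a mount point.
# Assumption: fuser accepts -c and writes the PIDs to stdout.
holders_of() {
    fuser -c "$1" 2>/dev/null |
        awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) print $i }'
}

# Example (/nfs/locked is a placeholder path):
#   for pid in $(holders_of /nfs/locked); do ps -p "$pid"; done
```

Listing the processes with ps first makes it easier to decide which ones are safe to kill.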
======================================================================
Check the server that is serving out the file systems and verify that the
file systems are available and in the /etc/exports.
On the client system, we use /etc/auto.direct and /etc/auto.master for
automount or autofs (in 5.1). If all this checks out, the only thing we've
found to clear this condition is to reboot the client system.
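The server-side check can be scripted roughly as follows. is_exported is a made-up helper that asks the server for its export list with showmount -e and looks for the path in the first column. The server name and path in the example are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the server's export list contains the path.
# Assumption: "showmount -e server" prints one export per line with the
# path in the first column, after a header line.
is_exported() {
    server=$1
    path=$2
    showmount -e "$server" 2>/dev/null |
        awk -v p="$path" '$1 == p { found = 1 } END { exit !found }'
}

# Example (fileserver and /export/data are placeholders):
#   is_exported fileserver /export/data && echo "exported" || echo "NOT exported"
```

A failure here can also mean the server did not answer at all, so it is worth running showmount -e by hand when the helper reports a missing export.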
It's for this very reason that when 5.1 came around, we abandoned automount
and went with autofs. autofs is like a parallelized amd and we have not
encountered any problems with it. When we used automount, we had weekly
issues with this program.
======================================================================
Second Post
Had a couple of quick replies, thanks to
Joe Fletcher
Jim Lola
Their replies are below.
After stopping and starting the network a few times, I have managed to free
the one important mount point - so the pressure is off.
I still have another directory locked though, and automount now will not
start. Both still get the error "NFS2: server `this machine name` not
responding still trying". Interestingly, fuser on the directory name also
gets the error.
I have looked through the output of
lsof
ps -ef
for any suspicious looking programs, but - again - to no avail.
Versions of the software
Machine with locked directory - 5.0
Machine containing the exported directory - 4.0g
Any further ideas? At the moment, it's looking like a scheduled reboot on
the weekend.
Cheers
Paul
p.s. The original problem came from enabling a second network card for
backups, then bouncing the network.
======================================================================
Original Post
hello
Need some quick help....
I bounced my network cards this morning to define the second card, and the
automounts between machines have died. I have rebooted one machine (the only
one possible) and that fixed that machine, but my production machine is
still in a bad way - I would prefer not to bounce it...
I cannot do an 'ls' on the directory that was automounted, it comes back with
NFS2 server `this machine name` not responding still trying
I have run
/sbin/rc3.d/S19nfs stop
/sbin/rc3.d/S19nfs start
/sbin/rc3.d/S20nfsmount stop
/sbin/rc3.d/S20nfsmount start
but to no avail...
I have searched the archives. This question has been asked before, but there
is no summary that works...
One of the automounted directories is now in a bad state. I cannot do an ls
on the directory name, and cannot unmount the directory - so it seems that
something is holding it open.
Received on Fri Feb 09 2001 - 20:11:18 NZDT