Thanks for all the responses to my query about NFS troubles. Several
people suggested checking /etc/mountdtab (and /var/lib/nfs/rmtab on the
Linux machines) to make sure that all exports had been cleared after
rebooting, and also looking for any entries persisting in /etc/sm and
/etc/sm.bak.
In the end it was replies that suggested checking out speed/duplex
mis-match problems between machines and switches that pointed me in the
right direction. The majority of the NFS trouble seemed to start soon after
I had to relocate two XP1000's due to some building work taking place.
The machines were moved and plugged in where we previously had a couple
of old DEC3000 boxes. I'd swapped over the cables at the switch so
the speed/duplex was correct, but the cables that had worked fine at
ethernet speeds for the DEC3000's obviously couldn't handle fast ethernet,
even though they're stamped 'CAT5'. As soon as I stuck in a new cable
to each machine (same sockets in the switch) everything came back to
life and there's been no sign of 'NFS server not responding' messages.
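For anyone chasing something similar, here is a rough sketch (not something
I actually ran at the time) of tallying the 'not responding' messages per
client/server pair from the logs, which can help show whether the trouble
is confined to a few hosts. It assumes only the message format quoted in
the original post below; the function name and the sample lines, including
the exact wording of the 'OK' line, are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   machine1 vmunix: NFS3 server machine2 not responding
# Only the format quoted in the original message is assumed. A real syslog
# line usually has a timestamp in front, which search() simply skips over.
PATTERN = re.compile(r"(\S+) vmunix: NFS(\d) server (\S+) not responding")


def tally_not_responding(lines):
    """Count 'not responding' messages per (client, server) pair."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            client, _version, server = match.groups()
            counts[(client, server)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, modelled on the messages in the original post.
    sample = [
        "machine1 vmunix: NFS3 server machine2 not responding",
        "machine1 vmunix: NFS2 server machine2 not responding",
        "machine1 vmunix: NFS2 server machine1 not responding",
        "machine1 vmunix: NFS3 server machine2 OK",
    ]
    for (client, server), n in tally_not_responding(sample).most_common():
        print(f"{client} -> {server}: {n}")
```

If the same one or two clients dominate the tally while everything else
talks to the same server happily, that points at the client's own network
path (cable, socket, speed/duplex) rather than at the server.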
Thanks again to all who replied,
john
---------- Original message ----------
Date: Thu, 6 Sep 2001 16:40:45 +0100 (BST)
From: John Deacon <jrd_at_star.ucl.ac.uk>
To: tru64-unix-managers_at_ornl.gov
Subject: NFS woes
Hi all -
I've recently started having real NFS problems. I've got a LAN of
18 Alphas from old DEC3000/400 boxes up to XP1000's all running
Tru64 v5.1 (upgraded to this about a month ago) and about 20
Red Hat Linux boxes (v6.2 and 7.1). The Linux boxes serve NFS discs with
no problems but on the Alphas I keep getting messages like
machine1 vmunix: NFS3 server machine2 not responding
or
machine1 vmunix: NFS2 server machine2 not responding
and machine1 will then hang - sometimes I get an 'NFS server OK'
message but then that's usually followed by another 'not
responding' one. The thing is that all other machines CAN see machine2
OK and don't give any errors.
Weirder than that is that I'm sometimes also getting
machine1 vmunix: NFS2 server machine1 not responding
so the machine can't see itself (!) and all users on the machine are stuck
but everything else can see the discs it's serving. When this happens I
can't log on to do anything; the only way out is to crash the machine and
reboot it - something that takes about 20 minutes on our main server
because it has to fsck all the discs and then check all the disk quotas...
And finally, one particular machine was working fine until I had to take
it down on Monday evening. It seemed to reboot OK but now insists 'NFS3
server blah not responding' and users can't log in. It's been rebooted
several times and still comes back with the same error for the same
'non-responding' server. The server in this case is our NIS master and
since users can log in to all other machines I'm tempted to think that
it IS responding... I've even gone as far as to shut down all machines and
reboot them starting with the NIS master, then the slaves, then the
clients but, straight away, the same machine says that the NIS master
isn't responding.
Any NFS gurus out there?
john
----------------------------------------------------------------------------
John Deacon | UCL Starlink Site Manager
Dept Physics and Astronomy | Email: jrd_at_star.ucl.ac.uk
University College London | Tel: 020 7679 7147
Gower Street London WC1E 6BT | Fax: 020 7679 7145
----------------------------------------------------------------------------
Received on Mon Sep 17 2001 - 12:47:08 NZST