Thanks for the help, especially to Seldon Ball.
On the NFS client I switched from UDP to TCP.
I also added retrans=10 to the fstab entry on the client.
This fixed the problem. I don't know which of the two changes had the bigger impact, but with both in place, the SCSI bus resets no longer cause any problems on the NFS client.
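For anyone wanting to try the same thing, the client-side change might look roughly like the fstab line below. This is only a sketch: the server name netvan is taken from the logs further down, but the export path /export and the mount point /mnt/netvan are placeholders, not my real paths, and any other mount options you already use would stay alongside these two.

```
# /etc/fstab on the Linux NFS client (illustrative entry only)
# /export and /mnt/netvan are hypothetical paths; substitute your own.
# tcp       : use TCP instead of the default UDP transport
# retrans=10: retransmission count, as described above
netvan:/export  /mnt/netvan  nfs  tcp,retrans=10  0 0
```

The file system has to be unmounted and mounted again before the new options take effect.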
Original Problem below:
I have 2 Tru64 DS10s (V5.0a) and a Linux machine on the same subnet. They are the only machines on the subnet.
I am getting some strange NFS behavior on the subnet when I reboot one of the Tru64 machines.
Configuration:
Tru64 V5.0a Server A is configured to have an NFS export.
Tru64 V5.0a Server B does not have an /etc/exports file yet, nor is it an NFS client, so it should have ZERO impact on NFS.
The Linux machine is an NFS client of Server A.
Server A and B are on a common SCSI bus, thus when either server is rebooted, the other experiences some SCSI resets. (I am using LSM to mount a shared disk shelf to whichever server is primary. Basically a poor man's cluster.)
Problem:
I rebooted Server B a few days ago and got NFS errors in the Linux machine's log file at the same time. (No NFS logs at all on either of the Tru64 machines.)
Server B is not a production server, so I have rebooted it several times since then to see if the Linux machine always gets errors.
It does not, but I do get error messages on the Linux machine about half the time.
Logs:
The errors look like the following and are causing problems for my application. (helo is the name of my Linux machine.)
Original reboot:
Apr 16 17:13:00 helo kernel: nfs_statfs: statfs error = 116
Apr 16 17:13:00 helo kernel: nfs_statfs: statfs error = 116
And on a different reboot:
Apr 19 12:04:41 helo kernel: nfs: server netvan not responding, timed out
Apr 19 12:05:02 helo kernel: nfs: server netvan not responding, timed out
Apr 19 12:05:09 helo kernel: nfs: server netvan not responding, timed out
Apr 19 12:05:09 helo kernel: nfs_statfs: statfs error = 5
Apr 19 12:05:23 helo kernel: nfs: server netvan not responding, timed out
Apr 19 12:05:39 helo last message repeated 2 times
Apr 19 12:05:39 helo kernel: nfs_statfs: statfs error = 5
Apr 19 12:05:44 helo kernel: nfs: server netvan not responding, timed out
Apr 19 12:06:11 helo last message repeated 3 times
Apr 19 12:06:11 helo kernel: nfs_statfs: statfs error = 5
Apr 19 12:06:15 helo kernel: nfs: server netvan not responding, timed out
Apr 19 12:06:26 helo kernel: nfs: server netvan not responding, timed out
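For reference, the numbers after "statfs error =" can be decoded with a small lookup like the sketch below. It assumes they are standard Linux errno values as defined on x86 (an assumption, since the client's architecture is not stated above); under that assumption 116 is ESTALE and 5 is EIO. The table is hardcoded rather than read from the running system so that the result does not depend on where the script is run, and the describe helper is just an illustrative name.

```python
# Assumption: the kernel log reports standard Linux (x86) errno values.
# Only the two codes that appear in the logs above are listed.
LINUX_ERRNO = {
    5: ("EIO", "I/O error"),
    116: ("ESTALE", "Stale NFS file handle"),
}


def describe(code):
    """Return a readable description for an errno value from the log."""
    name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in this table"))
    return f"{code}: {name} ({text})"


for code in (116, 5):
    print(describe(code))
```

Read this way, the first reboot left the client holding a file handle the server no longer recognized, while the later one shows plain I/O failures following the timeouts.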
Any ideas as to why rebooting Server B would affect NFS communication between Server A and its client would be much appreciated.
My one guess is that the shared SCSI bus is somehow causing Server A to briefly go haywire from NFS's perspective.
If so, are there any configuration changes I should make?
Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com
Received on Fri Apr 27 2001 - 16:58:37 NZST