Summary: Strange problem serving NFS from Greg Freemyer on 2001-04-28 (tru64-unix-managers)

From: Greg Freemyer <freemyer-ml_at_NorcrossGroup.com>
Date: Fri, 27 Apr 2001 12:56:18 -0400

Thanks for the help, especially from Seldon Ball.

On the NFS client, I switched the mount from UDP to TCP.

I also added the retrans=10 option to the client's fstab entry.

This fixed the problem. I don't know which of the two changes had the bigger impact, but together they stop the SCSI bus resets from causing any problems on the NFS client.
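For reference, a Linux /etc/fstab entry carrying both changes might look roughly like the sketch below. Only the tcp and retrans=10 options come from this message; the export path /export and mount point /mnt/netvan are hypothetical placeholders, and netvan is assumed to be Server A's hostname because it is the server named in the client's log messages. On Linux, retrans sets how many times the client retransmits a request before it reports a major timeout ("server not responding").

```text
# Hypothetical paths; tcp and retrans=10 are the two client-side changes described above.
netvan:/export   /mnt/netvan   nfs   tcp,retrans=10   0   0
```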

Original problem below:

I have two Tru64 DS10s (V5.0a) and a Linux machine on the same subnet; they are the only machines on it.

I am getting some strange NFS behavior on the subnet when I reboot one of the Tru64 machines.

Configuration:
    Server A (Tru64 V5.0a) exports a file system over NFS.

    Server B (Tru64 V5.0a) has no /etc/exports file yet and is not an NFS client, so it should have ZERO impact on NFS.

    The Linux machine is an NFS client of Server A.

    Servers A and B share a common SCSI bus, so when either server is rebooted, the other sees SCSI bus resets. (I use LSM to mount a shared disk shelf on whichever server is primary; basically a poor man's cluster.)

Problem:
    I rebooted Server B a few days ago and got NFS errors in the Linux machine's log file at the same time. (Neither Tru64 machine logged anything about NFS.)

    Server B is not a production server, so I have rebooted it several times since then to see whether the Linux machine always gets errors.

    It does not, but the Linux machine logs errors about half the time.

Logs:
    The errors look like the following and are causing problems for my application (helo is the name of my Linux machine):

    Original reboot:

Apr 16 17:13:00 helo kernel: nfs_statfs: statfs error = 116
Apr 16 17:13:00 helo kernel: nfs_statfs: statfs error = 116

    On a different reboot:

Apr 19 12:04:41 helo kernel: nfs: server netvan not responding, timed out
Apr 19 12:05:02 helo kernel: nfs: server netvan not responding, timed out
Apr 19 12:05:09 helo kernel: nfs: server netvan not responding, timed out
Apr 19 12:05:09 helo kernel: nfs_statfs: statfs error = 5
Apr 19 12:05:23 helo kernel: nfs: server netvan not responding, timed out
Apr 19 12:05:39 helo last message repeated 2 times
Apr 19 12:05:39 helo kernel: nfs_statfs: statfs error = 5
Apr 19 12:05:44 helo kernel: nfs: server netvan not responding, timed out
Apr 19 12:06:11 helo last message repeated 3 times
Apr 19 12:06:11 helo kernel: nfs_statfs: statfs error = 5
Apr 19 12:06:15 helo kernel: nfs: server netvan not responding, timed out
Apr 19 12:06:26 helo kernel: nfs: server netvan not responding, timed out
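The two statfs error numbers in these logs are Linux errno values. The small Python sketch below decodes them, assuming it runs on a Linux host (errno numbering differs on other systems). The sample lines are copied from the logs above; the helper name decode_statfs_errors and the regular expression are hypothetical illustrations, not part of any NFS tool.

```python
import errno
import os
import re

# Assumption: run on a Linux host, whose errno numbering matches the
# numbers printed by the Linux NFS client in the logs above.
LOG_LINES = [
    "Apr 16 17:13:00 helo kernel: nfs_statfs: statfs error = 116",
    "Apr 19 12:05:09 helo kernel: nfs_statfs: statfs error = 5",
]

# Hypothetical pattern matching the "statfs error = N" messages.
STATFS_ERR = re.compile(r"nfs_statfs: statfs error = (\d+)")


def decode_statfs_errors(lines):
    """Return (number, symbolic name, message) for each statfs error line."""
    results = []
    for line in lines:
        match = STATFS_ERR.search(line)
        if match:
            code = int(match.group(1))
            name = errno.errorcode.get(code, "UNKNOWN")
            results.append((code, name, os.strerror(code)))
    return results


for code, name, message in decode_statfs_errors(LOG_LINES):
    print(f"{code}: {name} ({message})")
```

On Linux this labels 116 as ESTALE (stale file handle) and 5 as EIO (input/output error); the exact message text from os.strerror varies between C library versions.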

I would much appreciate any ideas as to why rebooting Server B would disrupt NFS communication between Server A and its client.

My one guess is that the resets on the shared SCSI bus somehow make Server A briefly unresponsive from NFS's perspective.

If so, are there any configuration changes I should make?

Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com
Received on Fri Apr 27 2001 - 16:58:37 NZST
