SUMMARY: NFS3 RFS3_READ failed and NFS2 server luxor not responding after upgrade to 5.1

From: Gronblom, Ernest <Ernest.Gronblom_at_Brooks.com>
Date: Thu, 03 May 2001 18:29:36 -0400

OK... I am WAY overdue here, but I didn't forget.

Thank you to those who responded. We had been having problems with
duplex settings between the network cards and the switches that they
are connected to due to autonegotiation, but I had resolved that. The
problem turned out to be a flaky network switch, but I am posting the
replies I got here for others who may find them useful.

Thanks again to: Uwe Richter, Werner Rost, Derk Tegeler, Richard Jackson,
David Hull, and of course the answer man Dr. Thomas Blinn

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Some time ago we had the same problem and messages, but NFS2/NFS3 errors
can have many causes. One thing you can check is whether the status and
lock daemons are responding properly:

- do a "rpcinfo -u your-nfs-server status" from another machine,
  this should *immediately* return the string
  "program 100024 version 1 ready and waiting"
- If "rpcinfo" gave the answer mentioned above, ignore the rest;
  I guessed wrong and can't help you ;-(

- If the above "rpcinfo" gives the string "cannot contact" after a while,
  then on your NFS server:
  - kill the /usr/sbin/rpc.statd and /usr/sbin/rpc.lockd processes
  - delete all files in /etc/sm.bak
  - restart the two daemons by hand

- If this works, statd was tilting at windmills, trying to work through
  /etc/sm.bak/* files for machines that are no longer online and/or
  locks left over after crashes.
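
The check described in this reply can be sketched as a small shell helper.
This is only an illustration and not part of the original replies: the
function names classify_rpcinfo and check_statd are made up here, and the
only external command assumed is the "rpcinfo -u <server> status" call
quoted above.

```shell
#!/bin/sh
# Sketch only: classify_rpcinfo and check_statd are hypothetical helper names.

# Classify the text printed by "rpcinfo -u <server> status".
# Prints "ok" when the status daemon answered, "stuck" for anything else
# (for example a "cannot contact" message after a timeout).
classify_rpcinfo() {
    case "$1" in
        *"ready and waiting"*) echo "ok" ;;
        *)                     echo "stuck" ;;
    esac
}

# Run the check against an NFS server; "$1" is the server's hostname.
check_statd() {
    out=$(rpcinfo -u "$1" status 2>&1)
    classify_rpcinfo "$out"
}
```

Run from another machine, "check_statd luxor" would print "ok" if the status
daemon answers. A result of "stuck" would suggest trying the kill, clean-up
of /etc/sm.bak, and restart steps listed above on the server.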
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
If ping works: Configure the NIC and the corresponding switch without
autonegotiation, because very often it leads to problems.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Every single time I've had NFS time-outs like yours it was because of
network errors or NFS server down time. In particular, I experienced these
time-outs because of an ARP storm caused by an NT box.

I assume that the discrepancy between NFS2 and NFS3 in the kernel messages
is due to fallbacks or hard-coded versions in fstab or auto.master?
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Each version of Digital/Tru64 UNIX has a Software Product Description (SPD)
document. For Tru64 UNIX 5.1 it is on the OS CDROM in the /DOCUMENTATION
directory. In the SPD you will find the DE435 mentioned as being supported
for Remote Installation Service (RIS). That is not a solid yes but at least
it is support for RIS in Tru64 UNIX 5.1.

The file you want to review is
/DOCUMENTATION/TEXT/Tru64_UNIX_Operating_System_SPD.txt. It is a great
document and always handy to have around.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
I don't have a lot of information on this, but I do know that there is a bug
with respect to V3 -> V2 NFS mounts. We have this in our production SAP
environment. It has been escalated to engineering, but Compaq has yet to
provide a working fix for it. If you run both sides as V2, it should work
fine, I'm told.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
The DE435 is just another instance of a "tulip" card. The driver that
supports it is in the file "/sys/BINARY/tu.mod". Look at the date on
that file, and check the patch kit inventory to see if the file has been
replaced.

The problems you are seeing are probably due to a bad NFS patch. Looks
like some of the NFS operations are timing out. Even on new systems with
other Ethernet options, the "NFS2 server" stuff happens, and the NFS3 read
failure is also a network timeout problem.

Of course, that's a really old 10Mbit/second half-duplex adapter, and if
you've got it connected to some new switch or you've got it misconfigured
in the console firmware, that COULD be causing the problems; but I doubt
very much it's the Ethernet adapter or the driver for that adapter. The
problems are most likely in NFS or other network layers.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Original question:
> Hello managers,
>
> I have an old 2100 system that has a DE435 network card that we upgraded
> to Tru64 5.1 (P.K.2) and now we get the following errors (the system is
> called luxor):
>
> Mar 16 04:25:21 luxor vmunix: NFS2 server luxor not responding still trying
> Mar 16 04:25:29 luxor vmunix: NFS2 server luxor ok
>
> and when copying files over NFS from a remote server:
> Mar 16 18:19:06 luxor vmunix: NFS3 RFS3_READ failed for server omega: RPC: Timed out
>
> I have checked the archives and tried a bunch of things including making
> adjustments that sys_check recommends, but I am still getting the errors.
>
> My suspicion is that perhaps my hardware is old enough to be unsupported,
> but I can't find any documentation to confirm/discount this. Is the DE435
> still supported with 5.1 (patch kit 2)?
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> = Ernest Gronblom Brooks Automation -
> - Unix Administrator 15 Elizabeth Drive =
> = Phone: (978) 262-5812 Chelmsford, MA -
> - Fax: (978) 262-2500 01824 =
> = E-Mail: Ernest.Gronblom_at_brooks.com -
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Received on Thu May 03 2001 - 22:37:35 NZST
