NFS problems on TruCluster 5.1

From: Karl Jakobsson <kajak_at_orange.dk>
Date: Fri, 05 Apr 2002 16:27:15 +0200

Ok, I have some very strange problems with the "highly available" NFS service
on TruCluster 5.1.
Here is a short overview of the system.

A 5.1 cluster called "Hermes"; on this I have a mount point called /nfs:

nfs-dom#fset00 on /nfs type advfs (rw)

This is a 400 GB LSM volume. It looks something like this:
v  nfs-vol   fsgen     ENABLED   ACTIVE 853300562 ROUND     -
pl nfs-san21 nfs-vol   ENABLED   ACTIVE 853300562 CONCAT    -     RW
sd nfs-s21s1 nfs-san21 nfs_s21s1 0      426650281 0         dsk18 ENA
sd nfs-s21s2 nfs-san21 nfs_s21s2 0      426650281 426650281 dsk33 ENA
pl nfs-san29 nfs-vol   ENABLED   ACTIVE 853300562 CONCAT    -     RW
sd nfs-s29s1 nfs-san29 nfs_s29s1 0      426650281 0         dsk25 ENA
sd nfs-s29s2 nfs-san29 nfs_s29s2 0      426650281 426650281 dsk26 ENA
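
As a quick sanity check on the sizes in that listing, here is a minimal Python sketch of the arithmetic. It assumes the LSM lengths are counted in 512-byte sectors (an assumption, not stated above); the function name is only illustrative.

```python
# Assumption: LSM lengths in the listing are in 512-byte sectors.
SECTOR_BYTES = 512


def sectors_to_gib(sectors: int, sector_bytes: int = SECTOR_BYTES) -> float:
    """Convert a sector count to GiB (2**30 bytes)."""
    return sectors * sector_bytes / 2**30


# Subdisk lengths taken from the listing (nfs-s21s1 and nfs-s21s2).
subdisk_lengths = [426650281, 426650281]

# A CONCAT plex is as long as the sum of its subdisks.
plex_length = sum(subdisk_lengths)

# Both plexes (nfs-san21, nfs-san29) have the same length as the volume,
# so the usable size is one plex, not the sum of both.
volume_gib = sectors_to_gib(plex_length)

print(f"plex length: {plex_length} sectors")
print(f"volume size: {volume_gib:.1f} GiB")
```

Under that assumption each plex adds up to the 853300562 sectors shown for the volume, which is roughly 407 GiB, consistent with the "400Gig" figure.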


All good so far.. I'm exporting this with:
/nfs -root=0 multix-giga.intra.orange.dk mystix-giga.intra.orange.dk

I'm running nfsd on both members in the cluster, and rpc.lockd -c on one
member only (of course).

On a different system (multix) I mount this with..
hermes:/nfs/app /app nfs rw,soft 0 0
hermes:/nfs/mobilix /mobilix nfs rw,soft 0 0

Now we have 2 different mount points, pointing at 2 different dirs in the
same filesystem.
Normally, after a clean reboot, everything works: I can read/write both dirs
and create/delete files.
/app has a subdirectory called "mbx51". Sometimes, when I try to access
/app/mbx51, I get

NFS3 RFS3_READDIRPLUS failed for server hermes : RPC: Timed out

But at the same time I can still read/write/delete/create files in /app ..
this, I must say, more than puzzles me.
A cluster reboot of both nodes clears this problem, but it quickly returns.
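
To pin down when the listing fails while file operations still work, something like the following could be run on the client. This is only a sketch: the `probe` and `timed` helpers are hypothetical names, and it assumes Python is available on the client. It times a directory listing (which on an NFSv3 mount is typically served by READDIR/READDIRPLUS calls) separately from a create/delete of a scratch file.

```python
import os
import tempfile
import time


def timed(action):
    """Run action(); return (ok, elapsed_seconds, error_text_or_None)."""
    start = time.monotonic()
    try:
        action()
        return (True, time.monotonic() - start, None)
    except OSError as exc:
        # On a soft mount an RPC timeout surfaces as an OSError.
        return (False, time.monotonic() - start, str(exc))


def probe(path):
    """Check a directory listing and a create/delete in path, separately."""

    def create_and_delete():
        fd, name = tempfile.mkstemp(dir=path, prefix="nfsprobe.")
        os.close(fd)
        os.unlink(name)

    return {
        "listdir": timed(lambda: os.listdir(path)),
        "write": timed(create_and_delete),
    }


if __name__ == "__main__":
    # Paths are the client mount points described in this post.
    for path in ("/app", "/app/mbx51", "/mobilix"):
        for name, (ok, seconds, err) in probe(path).items():
            status = "ok" if ok else f"FAILED ({err})"
            print(f"{path:12s} {name:8s} {seconds:6.2f}s {status}")
```

Run in a loop from cron or a shell, the timestamps of the first failure could then be compared against which cluster member was serving the request.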

Now I have tried several things to solve this. First I changed the exports
to separate the two dirs /app and /mobilix:
/nfs/app -root=0 multix-giga.intra.orange.dk mystix-giga.intra.orange.dk
/nfs/mobilix -root=0 multix-giga.intra.orange.dk mystix-giga.intra.orange.dk

The symptoms were similar, but now I could not access /app at all, /mobilix
worked perfectly.

The next thing I did was to create a new fset in the nfs domain, to split the
dirs into 2 different filesystems on the NFS server:
nfs-dom#fset00 on /nfs type advfs (rw)
nfs-dom#fset01 on /hej type advfs (rw)
and remount..
hermes:/nfs/app /app nfs rw,soft 0 0
hermes:/hej/mobilix /mobilix nfs rw,soft 0 0

Still I get no traffic to /app, but /mobilix works fine..

It doesn't matter which member I relocate cluster_lockd (rpc.lockd) to.
What does help is if I only run nfsd on ONE member in the cluster.
Then everything seems ok, but of course I would like to run nfsd on both
and have some sort of balancing/failover.

Hope someone can help me on this..

Med venlig hilsen / Kind regards
Karl Jakobsson
Digital UNIX Administrator
TID/Backoffice/UNIX-Team
Direct phone: (+45) 82 33 62 54
Mobile phone: (+45) 268 00 254
E-mail: kajak_at_Orange.dk

Prags Boulevard 80
DK - 2300 København S
Received on Fri Apr 05 2002 - 14:27:57 NZST
