NFS file-locking problem

From: T. S. Horsnell <tsh_at_mrc-lmb.cam.ac.uk>
Date: Sun, 27 Sep 1998 12:36:05 +0100 (BST)

Hi all,
I'm having problems with NFS file locking.
I have a directory on an AS2100 which contains a queue of jobs
serviced by a bunch of Alpha PC164's. The jobs are submitted
by users on the AS2100 and picked off the queue by idle 164's.
The queue lives on the AS2100 and is NFS-mounted by the 164's.
To synchronise queue manipulations, there is a lock file
in the queue directory and each machine takes out a (NFS) lock on this file
(lockf with F_LOCK) before manipulating the queue, and releases the lock after.
This mostly works fine. Now and again, however (like once a week or so)
the whole lot seizes up. If I wade through the 164's 1 by 1, killing
the program which takes out the lock, things are eventually freed up.
It's never the same one causing the problem. I also notice that
rpc.lockd coredumps on these machines from time-to-time, but
not coincident with the seize-ups. (It somehow restarts itself...)
Anyone know of any current problems with NFS file-locking?
Race conditions etc.
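
For reference, the locking step described above (lockf with F_LOCK on a
lock file in the queue directory) can be sketched roughly as follows.
This is a minimal sketch and not the actual program: the function names
queue_lock and queue_unlock are hypothetical, and the caller supplies
whatever lock-file path it uses.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open the lock file and block until an exclusive
 * lock on the whole file is granted. Returns the locked descriptor,
 * or -1 on error. */
int queue_lock(const char *lock_path)
{
    /* lockf() needs a descriptor that is open for writing. */
    int fd = open(lock_path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;

    /* A length of 0 locks from the current offset (0) to end of file.
     * F_LOCK blocks until the lock is available. */
    if (lockf(fd, F_LOCK, 0) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: release the lock and close the descriptor.
 * Returns 0 on success, -1 if either step failed. */
int queue_unlock(int fd)
{
    int rc = lockf(fd, F_ULOCK, 0);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Each machine would call queue_lock() on the lock file before touching
the queue and queue_unlock() afterwards. Over an NFS mount the lock
request is passed to the lock daemons, so a blocked F_LOCK call waits on
them rather than on the local kernel alone.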

I should add that the first version of the locking prog used to just
take out the lock regardless. Things were *much* worse then. Now, it takes
a look at the lock status and if the file is already locked, it sleeps
for a random (few) seconds and retries up to a max of 5 times, whereupon
it takes out the lock regardless.
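
The retry scheme in the paragraph above might look something like the
sketch below. It is an illustration under assumptions, not the real
code: the name queue_lock_retry, the MAX_TRIES constant and the 1 to 3
second sleep range are hypothetical. It also uses F_TLOCK (try to lock,
fail at once if busy) instead of a separate status check followed by
F_LOCK. That is a substitution on my part, made because a check and a
lock done as two calls leave a window in which another machine can take
the lock.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_TRIES 5   /* assumed from "retries up to a max of 5 times" */

/* Hypothetical helper: try a non-blocking lock up to MAX_TRIES times,
 * sleeping a random few seconds between attempts, then fall back to a
 * blocking lock. Returns the locked descriptor, or -1 on error. */
int queue_lock_retry(const char *lock_path)
{
    int fd = open(lock_path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;

    for (int attempt = 0; attempt < MAX_TRIES; attempt++) {
        if (lockf(fd, F_TLOCK, 0) == 0)
            return fd;                    /* got the lock */

        /* EACCES or EAGAIN means someone else holds the lock;
         * anything else is a real error. */
        if (errno != EACCES && errno != EAGAIN) {
            close(fd);
            return -1;
        }
        sleep(1 + (unsigned)(rand() % 3)); /* back off 1 to 3 seconds */
    }

    /* Out of retries: block until the lock is granted. */
    if (lockf(fd, F_LOCK, 0) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The caller releases the lock with lockf(fd, F_ULOCK, 0) and then closes
the descriptor. Seeding rand() differently on each machine (for example
from the process ID) would keep the hosts from backing off in step.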

All machines are running DUNIX 4.0D patch kit 2. Network is copper FDDI.

Any words of wisdom/consolation much appreciated, which I'll gladly summarise.

Cheers,
Terry.

Terry Horsnell
Laboratory of Molecular Biology
Hills Road
Cambridge UK.

tsh_at_mrc-lmb.cam.ac.uk
Received on Sun Sep 27 1998 - 11:36:59 NZST
