File corruption (_maybe_ NFS related)

From: John G Dobnick <jgd_at_csd.uwm.edu>
Date: Tue, 27 Oct 1998 16:56:59 -0600 (CST)

 We've recently run across a strange file corruption problem, and were
 wondering if anyone else has seen anything similar.

 We have a number of Alphas with filesystems crossmounted via NFS. In
 particular, we have a 2100 with /etc mounted on a number of other
 Alphas.

 We run a locally built passwd program that updates the passwd files on
 multiple machines, with appropriate (and successful) locking. This
 program updates the password file on the machine it is called on,
 _and_ on the 2100 in question.

 Intermittently, and without any apparent rhyme or reason, the
 password file "loses" a block of records from the middle. The
 missing block always starts at a 512-byte boundary, but does not
 always seem to be an integral number of blocks long. The
 only system showing this problem is the 2100, where updates are being
 made through NFS. The system where the passwd program is being run --
 another Alpha -- has _its_ passwd file (which is local to itself)
 updated correctly.

 [I hope that made sense.]
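
 One way to check the "starts at a 512 byte boundary" observation is
 to compare the intact local copy with the damaged NFS copy. The
 sketch below is in Python rather than Perl, and its function names
 (deletion_window, aligned_offsets) are hypothetical. It assumes the
 damage is a single contiguous deletion and that nothing else in the
 file changed. Because repeated data can make the exact start
 ambiguous, it reports the whole range of offsets consistent with the
 two copies.

```python
BLOCK = 512  # boundary size reported in the post


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def deletion_window(good: bytes, bad: bytes):
    """Return (lo, hi, length) if bad equals good with one contiguous span
    of `length` bytes removed at some offset in lo..hi; otherwise None."""
    length = len(good) - len(bad)
    if length <= 0:
        return None
    prefix = common_prefix_len(good, bad)
    suffix = common_prefix_len(good[::-1], bad[::-1])
    lo = max(0, len(bad) - suffix)   # earliest start the shared tail allows
    hi = min(prefix, len(bad))       # latest start the shared head allows
    if lo > hi:
        return None                  # not explainable as a single deletion
    return lo, hi, length


def aligned_offsets(lo: int, hi: int, block: int = BLOCK):
    """Multiples of `block` that fall inside the inclusive range lo..hi."""
    first = -(-lo // block) * block  # round lo up to the next multiple
    return list(range(first, hi + 1, block))
```

 Reading both copies as bytes and passing them to deletion_window
 gives the candidate start offsets and the number of missing bytes.
 A non-empty result from aligned_offsets means a block-aligned start
 is possible, and length % BLOCK shows whether a whole number of
 blocks went missing.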

 The passwd program happens to be written in Perl, although that
 doesn't seem to be particularly relevant.

 Another user has, just today, reported a similar situation (large
 block of data missing from the middle of a file) on another system.
 This was just a plain text file -- no passwd file involved here. He
 had a (Perl) program that writes records to two different files -- one
 local to his machine, the other NFS mounted. Again, it was the NFS
 copy that showed the corruption.
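
 A minimal sketch of how such a dual-write program could detect the
 divergence as soon as it happens, again in Python and with
 hypothetical names (append_records, fingerprint,
 write_both_and_compare). It assumes both files held identical
 contents before the write. A read issued on the writing client may
 be served from that client's cache, so a match here does not prove
 the server's copy is intact; running the fingerprint step on the
 server itself would be the stronger check.

```python
import hashlib
import os


def append_records(path, records):
    """Append newline-terminated records and ask the OS to commit them."""
    with open(path, "ab") as fh:
        for rec in records:
            fh.write(rec.encode("utf-8") + b"\n")
        fh.flush()             # drain Python's userspace buffer
        os.fsync(fh.fileno())  # request that the data reach stable storage


def fingerprint(path):
    """Return (size_in_bytes, sha256_hex) for the file at path."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def write_both_and_compare(local_path, nfs_path, records):
    """Write the same records to both files; True if the copies match."""
    append_records(local_path, records)
    append_records(nfs_path, records)
    return fingerprint(local_path) == fingerprint(nfs_path)
```

 Logging the time of every False result would also make it possible
 to line up corruption events against load on the machines involved.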

 This problem is intermittent. We can run for weeks without being
 "hit". Both of the machines that "hosted" a corrupted NFS file
 occasionally become extremely busy, although we cannot confirm that
 such busy peaks coincided with the corruption.

 We are less willing to suspect Perl in this case, now that a second
 instance has shown up, along with a third that does not involve Perl
 at all. We are currently leaning toward either a deep-seated I/O bug
 (which we tend to consider unlikely) or an NFS problem of some sort.

 I apologize for the speculative nature of this message, but we're
 kind of grasping here. Any and all information, suggestions,
 hints, etc. will be welcome.

--
John G Dobnick                          "Knowing how things work is the basis
Information & Media Technologies         for appreciation, and is thus a
University of Wisconsin - Milwaukee      source of civilized delight."
jgd_at_csd.uwm.edu   ATTnet: (414) 229-5727                    -- William Safire
Received on Tue Oct 27 1998 - 22:57:58 NZDT
