[Fwd: broken flock on Tru64 5.X (fwd)]

From: Yogesh Bhanu <yogesh_at_gsf.de>
Date: Wed, 24 Apr 2002 16:09:56 +0200

Hi all,
        That's the mail Matt Goebel <goebel_at_emunix.emich.edu> was talking about.
Thanks in advance,
yogesh



Martin MOKREJŠ wrote:
>
> Yogesh,
> can you please forward this to tru64-unix-managers_at_ornl.gov? Thanks.
>
> --
> Martin Mokrejs <mmokrejs_at_natur.cuni.cz>
> PGP5.0i key is at http://www.natur.cuni.cz/~mmokrejs
> MIPS / Institute for Bioinformatics <http://mips.gsf.de>
> GSF - National Research Center for Environment and Health
> Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany
> tel.: +49-89-3187 3616 , fax: +49-89-3187 3585
>
> ---------- Forwarded message ----------
> From: Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>
> To: readers_comment_at_zk3.dec.com
> Cc: tru64-unix-managers_at_ornl.gov
> Date: Wed, 24 Apr 2002 11:58:46 +0200 (CEST)
> Subject: broken flock on Tru64 5.X
>
> Hi,
> it seems that flock() behaviour has changed on Tru64 since 5.X. As it
> turned out when inspecting problems with sendmail, the manpages are
> possibly outdated and describe the call incorrectly, and the current
> implementation no longer works the way it did on 4.0 systems.
>
> I tried to find some information on the official Compaq website, searched
> through the official docs, and looked into the Technical Updates for OS
> releases, but nowhere have I found a list of technical details. Can you
> please tell me where to find such information?
>
> Can you please forward this mail to the developers so that I could get a
> reply from them? I'm CC'ing this mail to the tru64-unix-managers e-mail
> group in the hope that someone will forward it to the right person
> (thanks!). The links to "Contact us" and "Support - Contact" are pretty
> useless on your web site, sorry!
>
> TIA
> --
> Martin Mokrejs <mmokrejs_at_natur.cuni.cz>
> PGP5.0i key is at http://www.natur.cuni.cz/~mmokrejs
> MIPS / Institute for Bioinformatics <http://mips.gsf.de>
> GSF - National Research Center for Environment and Health
> Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany
> tel.: +49-89-3187 3616 , fax: +49-89-3187 3585
>
> ------------------------------------------------------------------------
>
> Subject: Re: trouble with HOST
> Date: Fri, 19 Apr 2002 14:24:52 -0500
> From: Neil W Rickert <sendmail+rickert_at_sendmail.org>
> Reply-To: sendmail-questions_at_sendmail.org
> To: David Komanek <David.Komanek_at_natur.cuni.cz>
> CC: sendmail-questions_at_sendmail.org,Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>,sendmail+rickert_at_sendmail.org
> References: <3CBE6478.8030704_at_natur.cuni.cz> <Pine.OSF.4.21.0204181020410.32059-100000_at_tao.natur.cuni.cz> <13999.1019160590_at_euclid.cs.niu.edu> <3CBFB4A3.6070705_at_natur.cuni.cz>
>
> David Komanek <David.Komanek_at_natur.cuni.cz> wrote:
>
> >>I would like to also see the output from
>
> >> grep g3GBg1wG138777 debug.log
>
> >>If you have rotated logs since then, substitute the appropriate
> >>name.
>
> >I have the last 14 days of logs online; older ones are on the backups, so
> >there is no problem getting the information. Here it is. Quite interesting,
> >but I don't think I understand it well :-)
>
> Here is my assessment. I could be mistaken. Hopefully Greg and
> Claus (at sendmail.org) will also be reviewing this.
>
> Your sendmail was compiled with HASFLOCK . It looks as if that
> is a mistake.
>
> I am mainly using solaris, with a little linux on the side. There is
> an flock() library function for solaris. But it is emulated using
> FCNTL/LOCKF, which gives an incomplete emulation. Sendmail is compiled
> without HASFLOCK on solaris.
>
> Your system is behaving as I would expect solaris to behave if I made
> the mistake of compiling with HASFLOCK.
>
> I'm not sure what is the simplest way to change your system to not
> use FLOCK. Perhaps Greg and/or Claus will suggest something.
>
> Here is an excerpt from your logs, with comments.
>
> >Apr 16 13:42:03 prfdec sendmail[138777]: NOQUEUE: --- 250 2.0.0 g3GBg1wG138777 Message accepted for delivery
>
> That's the log showing that the message had been received.
>
> At this stage, sendmail would normally fork(), and the child would
> attempt to deliver.
>
> >Apr 16 13:42:03 prfdec sendmail[138935]: g3GBg1wG138777: --- 050 <tomas_at_bodye.cz>... Connecting to mail.bodye.cz. via esmtp...
>
> Here we have a process picking up the message for sending.
>
> >Apr 16 13:42:03 prfdec sendmail[138935]: g3GBg1wG138777: SMTP outgoing connect on tao-eth.natur.cuni.cz
>
> That's the same process.
>
> >Apr 16 13:42:16 prfdec sendmail[138883]: g3GBg1wG138777: SMTP outgoing connect on tao-eth.natur.cuni.cz
>
> Here we have a second process picking up the message for
> sending. Now both 138935 and 138883 are working on the same
> message. That is not supposed to happen.
>
> I'm guessing, but I think 138935 is the forked child of the receiving
> process, while 138883 is a queue runner.
>
> flock() is supposed to hold the lock through fork(). So 138935, as
> the child, assumes that it has the queue file locked. However if
> flock is emulated (badly) with fcntl, then the lock is lost during
> the fork(). Thus 138935 works on the assumption that it has this
> queue file locked. But the lock was actually lost, and that allowed
> 138883 to pick up this message.
>
> >Apr 16 13:44:15 prfdec sendmail[140157]: g3GBg1wG138777: locked
>
> There are several of these. They show that locking does work,
> at least in some form, on your system. It is consistent with
> my diagnosis above.
>
> >Apr 16 13:51:17 prfdec sendmail[138935]: g3GBg1wG138777: --- 050 <tomas_at_bodye.cz>... Sent (Data received OK.)
>
> Now the message has been sent by 138935, which deletes the queue
> file (both qfg3GBg1wG138777 and dfg3GBg1wG138777).
>
> >Apr 16 13:51:17 prfdec sendmail[138935]: g3GBg1wG138777: to=<tomas_at_bodye.cz>, ctladdr=<uamvt_at_natur.cuni.cz> (415/15), delay=00:09:16, xdelay=00:09:14, mailer=esmtp, pri=30415, relay=mail.bodye.cz. [212.71.156.38], dsn=2.0.0, stat=Sent (Data received OK.)
> >Apr 16 13:51:17 prfdec sendmail[138935]: g3GBg1wG138777: done; delay=00:09:16, ntries=1
>
> And it logs the successful sending.
>
> >Apr 16 13:52:48 prfdec sendmail[138883]: g3GBg1wG138777: to=<tomas_at_bodye.cz>, ctladdr=<uamvt_at_natur.cuni.cz> (415/15), delay=00:10:47, xdelay=00:10:33, mailer=esmtp, pri=120415, relay=mail.bodye.cz. [212.71.156.38], dsn=4.0.0, stat=I/O error
>
> I'm not sure what the I/O error was there. It may be unrelated. In
> any case, this results in a temp failure. Process 138883 rewrites
> the qfg3GBg1wG138777 file from its buffered information. From there,
> later processes can again pick it up but will run into the missing
> dfg3GBg1wG138777 file.
>
> ------------
>
> Here is a temporary work around.
>
> Configure DeliveryMode=queue . You seem to be running frequent
> queue runs, so this will not delay the message for too long.
> With that change, there is no longer any dependence on a lock
> being inherited by a child after fork().
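
For reference, the workaround above comes down to a single option line. The first form uses the sendmail.cf option syntax already quoted in this thread (compare "O SuperSafe=True"). The second form assumes the configuration is generated from an .mc file with the standard m4 macros, which the thread does not confirm for this site.

```
# sendmail.cf
O DeliveryMode=queue

dnl sendmail.mc (assumption: the .cf file is generated with m4)
define(`confDELIVERY_MODE', `queue')dnl
```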
>
> Let us know whether that helps.
>
> Also, send us your man pages for flock() -- maybe that will tell us
> whether my diagnosis is correct.
>
> -NWR
>
> ------------------------------------------------------------------------
>
> Subject: Re: trouble with HOST
> Date: Mon, 22 Apr 2002 14:14:38 -0700
> From: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>
> To: David Komanek <David.Komanek_at_natur.cuni.cz>
> CC: Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>, sendmail-questions_at_sendmail.org,
> sendmail+rickert_at_sendmail.org
> References: <Pine.OSF.4.21.0204221022180.351010-103000_at_tao.natur.cuni.cz> <3CC3FB3D.7070906_at_natur.cuni.cz>
>
> David.Komanek> I changed the DeliveryMode to the advised "queue" instead
> David.Komanek> of "background" in /etc/mail/sendmail.cf just now. I
> David.Komanek> thought SuperSafe=True tells sendmail to enqueue every
> David.Komanek> message. If not, I am not sure what SuperSafe does, but
> David.Komanek> that is another thread, probably not related to sendmail
> David.Komanek> locking strategies.
>
> It does, and sendmail is queuing it up (otherwise, the second queue
> running process wouldn't be finding it). The problem is the lock is
> dropped while the first sendmail is still operating on it.
>
> David.Komanek> Well, this problem concerns only situations in which the
> David.Komanek> child unlocks the file while the parent process still
> David.Komanek> tries to work on it. But why should this be our case?
>
> See my other message -- sendmail forks a child to work on it and then
> the parent closes the file descriptor. On a fcntl() based system, that
> means the child loses the lock as well. That is why sendmail has
> different code for flock() and fcntl() locking systems. Somehow, this
> machine was misconfigured to use the flock() code.
>
> David.Komanek> If I understand this, the process which wants to read
> David.Komanek> from the locked file waits in sleep mode until it
> David.Komanek> times out or the kernel wakes it up after the lock is
> David.Komanek> released. What happens between the time the kernel
> David.Komanek> signals the waiting process and the time another new
> David.Komanek> process tries to lock the file? I hope this is a matter
> David.Komanek> for the kernel and can be considered o.k., can't it?
>
> I don't think there is an issue to worry about here.
>
> David.Komanek> Yes, I would like to treat this type of locking as a
> David.Komanek> non-standard way of coding and may tell sendmail and
> David.Komanek> procmail not to use it at all. But still, if this
> David.Komanek> method is so commonly used, what should I expect from
> David.Komanek> the various mail, pop3 and imap clients operating on
> David.Komanek> the mailboxes, probably still using flock()? Is it safe
> David.Komanek> to tell sendmail not to use flock() when mail clients
> David.Komanek> still do?
>
> Since flock() is implemented as fcntl() on this system, it wouldn't be a
> problem if other programs used flock() while sendmail used fcntl(), since
> both end up in the same underlying mechanism. However, sendmail's method
> of forking and closing file descriptors presents a problem for flock()
> implementations that don't follow the expected flock() behavior. That is
> why sendmail acts differently for fcntl() locking systems.
>
> ------------------------------------------------------------------------
>
> Subject: Re: trouble with HOST
> Date: Mon, 22 Apr 2002 14:09:31 -0700
> From: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>
> To: Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>
> CC: sendmail-questions_at_sendmail.org,
> David Komanek <David.Komanek_at_natur.cuni.cz>,
> sendmail+rickert_at_sendmail.org
> References: <17446.1019244292_at_euclid.cs.niu.edu> <Pine.OSF.4.21.0204221022180.351010-103000_at_tao.natur.cuni.cz>
>
> rickert> Your sendmail was compiled with HASFLOCK . It looks as if that
> rickert> is a mistake.
>
> I agree with Neil, this is the cause of the problem.
>
> mmokrejs> Hmm, but there was already configured:
> mmokrejs>
> mmokrejs> # queue up everything before forking?
> mmokrejs> O SuperSafe=True
>
> Yes, the item is queued before forking, but due to the broken locking
> choice, the lock on the queued file was dropped so both the sendmail
> which queued the job and a queue runner tried to operate on the queued
> job at the same time.
>
> mmokrejs> I hope DeliveryMode was only renamed to SuperSafe. ;(
>
> DeliveryMode was not renamed.
>
> mmokrejs> I'm curious why procmail, when compiled here, detects
> mmokrejs> flock() as fully operational. Before the real binary is
> mmokrejs> compiled, several ways of locking are tested, and as you
> mmokrejs> can see, it found flock() to be working:
> mmokrejs>
> mmokrejs> Locking strategies: dotlocking, fcntl(), lockf(), flock()
>
> Since it uses all of the methods, it is safe. If it only used flock()
> and it forked a child to do the work and the parent closed the file, then
> it would have the same locking issue sendmail has.
>
> ------------------------------------------------------------------------
>
> Subject: Re: trouble with HOST
> Date: Tue, 23 Apr 2002 07:33:24 -0500
> From: Neil W Rickert <sendmail+rickert_at_sendmail.org>
> Reply-To: sendmail-questions_at_sendmail.org
> To: Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>
> CC: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>,
> sendmail-questions_at_sendmail.org,
> David Komanek <David.Komanek_at_natur.cuni.cz>,
> sendmail+rickert_at_sendmail.org
> References: <20020422210931.GI9539_at_scooter.smi.sendmail.com> <Pine.OSF.4.21.0204231038000.41843-100000_at_tao.natur.cuni.cz>
>
> <mmokrejs_at_natur.cuni.cz> wrote:
>
> >On Mon, 22 Apr 2002, Gregory Neil Shapiro wrote:
>
> >Hi,
>
> >Gregory> mmokrejs> # queue up everything before forking?
> >Gregory> mmokrejs> O SuperSafe=True
> >Gregory>
> >Gregory> Yes, the item is queued before forking, but due to the broken locking
> >Gregory> choice, the lock on the queued file was dropped so both the sendmail
> >Gregory> which queued the job and a queue runner tried to operate on the queued
> >Gregory> job at the same time.
> >Gregory>
> >Gregory> mmokrejs> I hope DeliveryMode was only renamed to SuperSafe. ;(
> >Gregory>
> >Gregory> DeliveryMode was not renamed.
>
> >Sorry, so what's the difference between SuperSafe and DeliveryMode?
>
> SuperSafe: Make sure that the message is safely written to the queue
> before acknowledging to the sending client.
>
> DeliveryMode: could be "interactive" or "background" or "queue"
>
> queue Don't attempt to deliver the mail after it is
> received, but only queue. The next queue run will
> attempt delivery.
>
> background (the default) fork() a child process to do delivery
> in the background
>
> interactive deliver immediately, and don't acknowledge
> acceptance to the connecting client until the
> completion of the first attempt to deliver.
>
> Note that interactive delivery mode is not useful for SMTP, but is
> sometimes useful for command line mail.
>
> >Gregory> mmokrejs> I'm curious why procmail, when compiled here, detects
> >Gregory> mmokrejs> flock() as fully operational. Before the real binary is
> >Gregory> mmokrejs> compiled, several ways of locking are tested, and as you
> >Gregory> mmokrejs> can see, it found flock() to be working:
> >Gregory> mmokrejs>
> >Gregory> mmokrejs> Locking strategies: dotlocking, fcntl(), lockf(), flock()
> >Gregory>
> >Gregory> Since it uses all of the methods, it is safe. If it only used flock()
> >Gregory> and it forked a child to do the work and the parent closed the file, then
> >Gregory> it would have the same locking issue sendmail has.
>
> >Yes, but in that case, why does the test program test locking at all before
> >compiling? Why doesn't the binary try to use all the kinds of locks which
> >exist in the world? That would be the safest way. ;)
>
> procmail and other delivery agents do not attempt to fork() while
> holding a lock. They may fork() before attempting delivery, then
> the child from fork() locks and delivers.
>
> procmail and other delivery agents have to protect a user mailbox
> against concurrent access by other programs, such as an MUA or
> a POP3 or IMAP daemon.
>
> By contrast, sendmail is only locking its own queue files, and only
> for protection against other instantiations of sendmail. It's a
> different locking problem.
>
> -NWR
>
> ------------------------------------------------------------------------
>
> Subject: Re: trouble with HOST
> Date: Tue, 23 Apr 2002 12:15:33 -0500
> From: Neil W Rickert <sendmail+rickert_at_sendmail.org>
> Reply-To: sendmail-questions_at_sendmail.org
> To: David Komanek <David.Komanek_at_natur.cuni.cz>
> CC: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>,
> Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>,
> sendmail-questions_at_sendmail.org, sendmail+rickert_at_sendmail.org
> References: <17446.1019244292_at_euclid.cs.niu.edu> <Pine.OSF.4.21.0204221022180.351010-103000_at_tao.natur.cuni.cz> <20020422210931.GI9539_at_scooter.smi.sendmail.com> <3CC4F4E7.1020700_at_natur.cuni.cz>
>
> David Komanek <David.Komanek_at_natur.cuni.cz> wrote:
>
> >to get the sendmail without flock() support. This is in the section
> >related to the __osf__ macro, which is marked "tested for 3.2 and
> >4.0". Apparently, the manpage on a Tru64 UNIX 4.0d box has the same info
> >about flock() limitations as the 5.1a version has. So I suggest changing
> >the default to "do not use flock" on Tru64 UNIX boxes for future
> >releases of sendmail.
>
> Hopefully Claus or Greg will make those changes in the sources.
>
> Your man pages are a little non-specific about what flock() does.
> Judging from your problems, it seems that your latest version has
> changed the support for flock() from bsd semantics to emulation based
> on fcntl. Apparently the man pages were not properly updated to show
> the change.
>
> >It is strange, because the same limitation is also described in the
> >RedHat 7.2 manpages:
>
> As far as I know, flock() works fine on linux, and on bsd.
>
> >The same for Irix 6.5.14:
>
> But it is unsatisfactory on Irix and on Solaris.
>
> > BUGS
> > Unlike BSD, child processes created by fork(2) do not inherit references
> > to locks acquired by their parents through flock(3B) calls. This bug
> > results from flock's implementation atop System V file and record locks.
>
> That's the problem.
>
> >So, I wonder if there is some platform where flock() is expected to work
> >properly on duplicated file handles. Probably I don't understand this
> >well, but it seems flock() is not designed for the type of use sendmail
> >needs. Or are the manpages obsolete, and on some platforms the
> >implementation of flock() is better than the manpages say? Yes, now I
> >should write some tests to get an answer to this question, but I am not
> >sure if I am able to do it well :-)
>
> I think your man pages are obsolete. They seem to imply the correct
> semantics, although they don't quite say so. The Irix
> man pages correctly identify the problem with flock() on that
> system.
>
> Sendmail has two alternative strategies:
>
> If flock() works properly -
>
> fork a child to do the delivery. The child retains the lock,
> and the parent closes the queue file without an explicit unlock.
>
> If flock() does not work properly -
>
> queue the message for a future queue run. Then start an
> explicit queue run for that particular message. There is a
> possibility that a regular queue run will get to the message
> before this explicit queue run, and in that case the explicit
> queue run silently exits.
>
> This strategy is more costly. The address parsing that was
> done when receiving the mail must be repeated in the explicit
> queue runner. Where flock works properly, we prefer to save
> those additional costs.
>
> I hope that clarifies the situation.
>
> -NWR
>
> ------------------------------------------------------------------------
>
> Subject: Re: trouble with HOST
> Date: Tue, 23 Apr 2002 10:34:48 -0700
> From: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>
> To: David Komanek <David.Komanek_at_natur.cuni.cz>
> CC: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>,
> Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>,
> sendmail-questions_at_sendmail.org, sendmail+rickert_at_sendmail.org
> References: <17446.1019244292_at_euclid.cs.niu.edu> <Pine.OSF.4.21.0204221022180.351010-103000_at_tao.natur.cuni.cz> <20020422210931.GI9539_at_scooter.smi.sendmail.com> <3CC4F4E7.1020700_at_natur.cuni.cz>
>
> The man page for flock() on Digital UNIX 4.0G Rev 1530 states:
>
> NOTES
>
> Locks are on files, not file descriptors. That is, file descriptors
> duplicated using the dup() or fork() functions do not result in multiple
> instances of a lock, but rather multiple references to a single lock. If a
> process holding a lock on a file forks and the child explicitly unlocks the
> file, the parent will lose its lock.
>
> This is the behavior we want from flock. The behavior we *don't* want
> is for the lock to be lost if one process closes a file descriptor. This
> isn't the case here since locks are on files, not descriptors as the
> first sentence states.
>
> Processes blocked awaiting a lock may be awakened by signals.
>
> The file locks set by the flock() function do not interact in any way with
> the file locks set by the fcntl() and lockf() functions. If a process sets
> an exclusive lock on a file using the flock() function, the lock will not
> affect any process that is setting or clearing locks on the same file using
> the fcntl() or lockf() functions. It is therefore possible for an
> inconsistency to arise if a file is locked by different processes using
> flock() and fcntl(). (The fcntl() and lockf() functions use the same
> mechanism for record locking.)
>
> That also proves that flock() on 4.0G is *not* based on fcntl() since
> the locks are distinct.
>
> flock() is the right choice for Digital UNIX 4.X. Now I have to find a 5.X
> system to see if they changed things on us.
>
> ------------------------------------------------------------------------
>
> Subject: Re: trouble with HOST
> Date: Tue, 23 Apr 2002 10:39:55 -0700
> From: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>
> To: David Komanek <David.Komanek_at_natur.cuni.cz>
> CC: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>,
> Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>,
> sendmail-questions_at_sendmail.org, sendmail+rickert_at_sendmail.org
> References: <17446.1019244292_at_euclid.cs.niu.edu> <Pine.OSF.4.21.0204221022180.351010-103000_at_tao.natur.cuni.cz> <20020422210931.GI9539_at_scooter.smi.sendmail.com> <3CC4F4E7.1020700_at_natur.cuni.cz>
>
> Interesting.. On 4.0, the man page says:
>
> The flock() function operates on the local system only. It does not make
> any attempt to coordinate a file's lock status with other systems. In a
> distributed environment, use the fcntl() or lockf() interfaces to place
> advisory locks on files, as they provide a superset of flock() features.
>
> On 5.0 the man page says:
>
> You can use the flock() function to coordinate a file's lock status on
> local, CFS, and NFS file systems.
>
> The NOTES section was also only partly changed: the "Locks are on
> files, not file descriptors" paragraph was left in place, but the last
> paragraph was changed to:
>
> The flock() interface is not part of any UNIX standard. Therefore, if you
> are designing and writing applications to be portable across platforms, you
> should use the fcntl() file locking interface instead of flock().
>
> I think they left the first NOTES paragraph behind in error as the locks
> are on the file descriptor. So we will need some magic in conf.h to
> disable flock for 5.X but leave it in place for 4.X and earlier.
>
> I'm actually saddened to see that DEC has chosen to go with a broken
> (IMHO) lock implementation.
>
> ------------------------------------------------------------------------
>
> Subject: Re: trouble with HOST
> Date: Tue, 23 Apr 2002 11:12:19 -0700
> From: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>
> To: David Komanek <David.Komanek_at_natur.cuni.cz>
> CC: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>,
> Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>,
> sendmail-questions_at_sendmail.org, sendmail+rickert_at_sendmail.org
> References: <17446.1019244292_at_euclid.cs.niu.edu> <Pine.OSF.4.21.0204221022180.351010-103000_at_tao.natur.cuni.cz> <20020422210931.GI9539_at_scooter.smi.sendmail.com> <3CC4F4E7.1020700_at_natur.cuni.cz>
>
> Can you back out your change to conf.h and try this patch instead? Make
> sure HASFLOCK is not shown in the debug output on a 5.X machine but it
> is on a 4.X machine (if you have access to one).
>
> Index: conf.h
> ===================================================================
> RCS file: /cvs/include/sm/conf.h,v
> retrieving revision 1.87
> diff -u -u -r1.87 conf.h
> --- conf.h 2002/04/02 08:11:52 1.87
> +++ conf.h 2002/04/23 18:06:21
> @@ -612,7 +612,12 @@
> # define GIDSET_T gid_t
> # define SM_INT32 int /* 32bit integer */
> # ifndef HASFLOCK
> -# define HASFLOCK 1 /* has flock(2) call */
> +# include <standards.h>
> +# if _XOPEN_SOURCE+0 >= 400
> +# define HASFLOCK 0 /* 5.0 and later has bad flock(2) call */
> +# else /* _XOPEN_SOURCE+0 >= 400 */
> +# define HASFLOCK 1 /* has flock(2) call */
> +# endif /* _XOPEN_SOURCE+0 >= 400 */
> # endif /* ! HASFLOCK */
> # define LA_TYPE LA_ALPHAOSF
> # define SFS_TYPE SFS_STATVFS /* use <sys/statvfs.h> statfs() impl */


Content-id: <Pine.OSF.4.21.0204241147241.278821_at_tao.natur.cuni.cz>
Content-type: message/rfc822; CHARSET=US-ASCII
Content-description: Re: trouble with HOST (fwd)

Return-path: <gshapiro_at_sendmail.com>
Received: from spork.sendmail.com (spork.Sendmail.COM [209.246.26.39])
        by natur.cuni.cz (a.b.c/a.b.c) with ESMTP id g3MLEgek462538; Mon,
 22 Apr 2002 23:14:43 +0200 (MDT)
Received: from foon.sendmail.com (smtp.sendmail.com [209.246.26.40])
        by spork.sendmail.com (Switch-2.2.2/Switch-2.2.0) with ESMTP id g3MLEks14855
        (using TLSv1/SSLv3 with cipher EDH-RSA-DES-CBC3-SHA (168 bits) verified OK)
 ; Mon, 22 Apr 2002 14:14:47 -0700 (PDT)
Received: from scooter.smi.sendmail.com (natted.Sendmail.COM [63.211.143.38])
        by foon.sendmail.com (Switch-2.2.2/Switch-2.2.0) with ESMTP id g3MLEd931593
        (using TLSv1/SSLv3 with cipher EDH-RSA-DES-CBC3-SHA (168 bits) verified FAIL)
 ; Mon, 22 Apr 2002 14:14:39 -0700
Received: from scooter.smi.sendmail.com (localhost [IPv6:::1])
        by scooter.smi.sendmail.com (8.12.2/8.12.2) with ESMTP id g3MLEcAN012859
        (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Mon,
 22 Apr 2002 14:14:38 -0700
Received: (from gshapiro_at_localhost) by scooter.smi.sendmail.com
 (8.12.2/8.12.2/Submit) id g3MLEcSa012858; Mon, 22 Apr 2002 14:14:38 -0700 (PDT)
Date: Mon, 22 Apr 2002 14:14:38 -0700
From: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>
Subject: Re: trouble with HOST
In-reply-to: <3CC3FB3D.7070906_at_natur.cuni.cz>
To: David Komanek <David.Komanek_at_natur.cuni.cz>
Cc: Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>, sendmail-questions_at_sendmail.org,
 sendmail+rickert_at_sendmail.org
Message-id: <20020422211438.GJ9539_at_scooter.smi.sendmail.com>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
Content-disposition: inline
User-Agent: Mutt/1.3.28i
X-Obalka-From: gshapiro_at_sendmail.com
X-Filtered: Sendmail MIME Filter v1.0.8 foon.sendmail.com g3MLEd931593
References: <Pine.OSF.4.21.0204221022180.351010-103000_at_tao.natur.cuni.cz>
 <3CC3FB3D.7070906_at_natur.cuni.cz>


David.Komanek> I changed the DeliveryMode to the advised "queue" instead
David.Komanek> of "background" in /etc/mail/sendmail.cf just now. I
David.Komanek> thought, SuperSafe=True tells sendmail to enqueue every
David.Komanek> message. If not, I am not sure what SuperSafe does, but
David.Komanek> it is another thread, probably not related to sendmail
David.Komanek> locking strategies.

It does, and sendmail is queuing it up (otherwise, the second queue
running process wouldn't be finding it). The problem is the lock is
dropped while the first sendmail is still operating on it.

David.Komanek> Well, this problem concerns only situations in which the
David.Komanek> child unlocks the file while the parent process still
David.Komanek> tries to work on it. But why should this be our case ?

See my other message -- sendmail forks a child to work on it and then
the parent closes the file descriptor. On a fcntl() based system, that
means the child loses the lock as well. That is why sendmail has
different code for flock() and fcntl() locking systems. Somehow, this
machine was misconfigured to use the flock() code.

David.Komanek> If I understand this, the process which wants to read
David.Komanek> from the locked file waits in sleep mode until
David.Komanek> it times out or the kernel wakes it up after the lock was
David.Komanek> released. What happens between the time kernel sends
David.Komanek> the signal for waiting process and the time another new
David.Komanek> process tries to lock the file ? I hope, this is matter
David.Komanek> of kernel and it can be considered to be o.k., can't be ?

I don't think there is an issue to worry about here.

David.Komanek> Yes, I would like to consider the use of this type of
David.Komanek> locking for non-standard way of coding and may tell
David.Komanek> sendmail and procmail not to use it at all. But still,
David.Komanek> if this method is so commonly used, what should I expect
David.Komanek> from various mail, pop3 and imap clients operating on the
David.Komanek> mailboxes, probably still using flock() ? Is it safe to
David.Komanek> tell sendmail not to use flock() and expect mail clients
David.Komanek> do ?

Since flock() is implemented as fcntl(), it wouldn't be a problem if other
programs used flock() while sendmail used fcntl() since they result in
the same underlying mechanism. However, sendmail's method of forking
and closing file descriptors presents a problem for flock() implementations
that don't follow the expected flock() behavior. That is why sendmail acts
differently for fcntl() locking systems.
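
To make the fcntl() side of this concrete, here is a minimal sketch (not taken from sendmail; the function name is made up for illustration) that checks whether a forked child owns a lock the parent took with fcntl(). POSIX record locks are tied to the process that set them, so the child is expected to see the lock as belonging to someone else. That is why handing a locked queue file to a child cannot work on an fcntl()-based system.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a forked child owns the parent's fcntl() lock, 0 if the
 * child sees the lock as belonging to another process, -1 on error.
 */
int fcntl_lock_inherited_by_child(const char *path)
{
    struct flock fl = {0};
    int fd, status = 0;
    pid_t pid;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    fl.l_type = F_WRLCK;      /* exclusive lock ... */
    fl.l_whence = SEEK_SET;   /* ... on the whole file (l_start = l_len = 0) */
    if (fcntl(fd, F_SETLK, &fl) != 0) {
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: ask whether an exclusive lock would conflict with anything. */
        struct flock query = {0};
        query.l_type = F_WRLCK;
        query.l_whence = SEEK_SET;
        if (fcntl(fd, F_GETLK, &query) != 0)
            _exit(2);
        /* F_UNLCK means "no conflicting lock", i.e. the child owns it. */
        _exit(query.l_type == F_UNLCK ? 1 : 0);
    }

    if (waitpid(pid, &status, 0) != pid) {
        close(fd);
        return -1;
    }
    close(fd);
    if (!WIFEXITED(status) || WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}
```

Calling it with the path of a scratch file should return 0 on a system with POSIX record locks.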


Content-id: <Pine.OSF.4.21.0204241147242.278821_at_tao.natur.cuni.cz>
Content-type: message/rfc822; CHARSET=US-ASCII
Content-description: Re: trouble with HOST (fwd)

Return-path: <gshapiro_at_sendmail.com>
Received: from spork.sendmail.com (spork.Sendmail.COM [209.246.26.39])
        by natur.cuni.cz (a.b.c/a.b.c) with ESMTP id g3ML9fek462048; Mon,
 22 Apr 2002 23:09:46 +0200 (MDT)
Received: from foon.sendmail.com (smtp.sendmail.com [209.246.26.40])
        by spork.sendmail.com (Switch-2.2.2/Switch-2.2.0) with ESMTP id g3ML9es14332
        (using TLSv1/SSLv3 with cipher EDH-RSA-DES-CBC3-SHA (168 bits) verified OK)
 ; Mon, 22 Apr 2002 14:09:41 -0700 (PDT)
Received: from scooter.smi.sendmail.com (natted.Sendmail.COM [63.211.143.38])
        by foon.sendmail.com (Switch-2.2.2/Switch-2.2.0) with ESMTP id g3ML9W930798
        (using TLSv1/SSLv3 with cipher EDH-RSA-DES-CBC3-SHA (168 bits) verified FAIL)
 ; Mon, 22 Apr 2002 14:09:33 -0700
Received: from scooter.smi.sendmail.com (localhost [IPv6:::1])
        by scooter.smi.sendmail.com (8.12.2/8.12.2) with ESMTP id g3ML9WAN012830
        (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Mon,
 22 Apr 2002 14:09:32 -0700
Received: (from gshapiro_at_localhost) by scooter.smi.sendmail.com
 (8.12.2/8.12.2/Submit) id g3ML9V4j012829; Mon, 22 Apr 2002 14:09:31 -0700 (PDT)
Date: Mon, 22 Apr 2002 14:09:31 -0700
From: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>
Subject: Re: trouble with HOST
In-reply-to: <Pine.OSF.4.21.0204221022180.351010-103000_at_tao.natur.cuni.cz>
To: Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>
Cc: sendmail-questions_at_sendmail.org,
 David Komanek <David.Komanek_at_natur.cuni.cz>, sendmail+rickert_at_sendmail.org
Message-id: <20020422210931.GI9539_at_scooter.smi.sendmail.com>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
Content-disposition: inline
User-Agent: Mutt/1.3.28i
X-Obalka-From: gshapiro_at_sendmail.com
X-Filtered: Sendmail MIME Filter v1.0.8 foon.sendmail.com g3ML9W930798
References: <17446.1019244292_at_euclid.cs.niu.edu>
 <Pine.OSF.4.21.0204221022180.351010-103000_at_tao.natur.cuni.cz>


rickert> Your sendmail was compiled with HASFLOCK . It looks as if that
rickert> is a mistake.

I agree with Neil, this is the cause of the problem.

mmokrejs> Hmm, but there was already configured:
mmokrejs>
mmokrejs> # queue up everything before forking?
mmokrejs> O SuperSafe=True

Yes, the item is queued before forking, but due to the broken locking
choice, the lock on the queued file was dropped so both the sendmail
which queued the job and a queue runner tried to operate on the queued
job at the same time.

mmokrejs> I hope DeliveryMode was only renamed to SuperSafe. ;(

DeliveryMode was not renamed.

mmokrejs> I'm curious why procmail, when being compiled here, resolves
mmokrejs> flock() as fully operational. Before compiling the real
mmokrejs> binary, there're tested several ways of locking, and as you
mmokrejs> see, it found flock() quite working:
mmokrejs>
mmokrejs> Locking strategies: dotlocking, fcntl(), lockf(), flock()

Since it uses all of the methods, it is safe. If it only used flock()
and it forked a child to do the work and the parent closed the file, then
it would have the same locking issue sendmail has.


Content-id: <Pine.OSF.4.21.0204241147243.278821_at_tao.natur.cuni.cz>
Content-type: message/rfc822; CHARSET=US-ASCII
Content-description: Re: trouble with HOST (fwd)

Return-path: <sendmail+rickert_at_sendmail.org>
Received: from euclid.cs.niu.edu (root_at_euclid.cs.niu.edu [131.156.145.14])
        by natur.cuni.cz (a.b.c/a.b.c) with ESMTP id g3NCXT6r088576; Tue,
 23 Apr 2002 14:33:30 +0200 (MDT)
Received: from localhost (rickert_at_localhost [127.0.0.1])
        by euclid.cs.niu.edu (8.12.3/8.12.3) with ESMTP id g3NCXOgv029564; Tue,
 23 Apr 2002 07:33:24 -0500 (CDT)
Date: Tue, 23 Apr 2002 07:33:24 -0500
From: Neil W Rickert <sendmail+rickert_at_sendmail.org>
Subject: Re: trouble with HOST
In-reply-to: Message from =?iso-8859-2?Q?Martin_MOKREJ=A9?=
 <mmokrejs_at_natur.cuni.cz> "of Tue, 23 Apr 2002 10:41:31 +0200."
 <Pine.OSF.4.21.0204231038000.41843-100000_at_tao.natur.cuni.cz>
To: =?iso-8859-2?Q?Martin_MOKREJ=A9?= <mmokrejs_at_natur.cuni.cz>
Cc: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>,
 sendmail-questions_at_sendmail.org, David Komanek <David.Komanek_at_natur.cuni.cz>,
 sendmail+rickert_at_sendmail.org
Reply-to: sendmail-questions_at_sendmail.org
Message-id: <29561.1019565204_at_euclid.cs.niu.edu>
MIME-version: 1.0
X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
X-Obalka-From: sendmail+rickert_at_sendmail.org
References: <20020422210931.GI9539_at_scooter.smi.sendmail.com>
 <Pine.OSF.4.21.0204231038000.41843-100000_at_tao.natur.cuni.cz>


<mmokrejs_at_natur.cuni.cz> wrote:

>On Mon, 22 Apr 2002, Gregory Neil Shapiro wrote:

>Hi,

>Gregory> mmokrejs> # queue up everything before forking?
>Gregory> mmokrejs> O SuperSafe=True
>Gregory>
>Gregory> Yes, the item is queued before forking, but due to the broken locking
>Gregory> choice, the lock on the queued file was dropped so both the sendmail
>Gregory> which queued the job and a queue runner tried to operate on the queued
>Gregory> job at the same time.
>Gregory>
>Gregory> mmokrejs> I hope DeliveryMode was only renamed to SuperSafe. ;(
>Gregory>
>Gregory> DeliveryMode was not renamed.

>Sorry, so what's the difference between SuperSafe and DeliveryMode?

SuperSafe: Make sure that the message is safely written to the queue
            before acknowledging to the sending client.

DeliveryMode: could be "interactive" or "background" or "queue"

        queue           Don't attempt to deliver the mail after it is
                        received, but only queue. The next queue run will
                        attempt delivery.

        background      (the default) fork() a child process to do delivery
                        in the background

        interactive     deliver immediately, and don't acknowledge
                        acceptance to the connecting client until the
                        completion of the first attempt to deliver.

Note that interactive delivery mode is not useful for SMTP, but is
sometimes useful for command line mail.

>Gregory> mmokrejs> I'm curious why procmail, when being compiled here, resolves
>Gregory> mmokrejs> flock() as fully operational. Before compiling the real
>Gregory> mmokrejs> binary, there're tested several ways of locking, and as you
>Gregory> mmokrejs> see, it found flock() quite working:
>Gregory> mmokrejs>
>Gregory> mmokrejs> Locking strategies: dotlocking, fcntl(), lockf(), flock()
>Gregory>
>Gregory> Since it uses all of the methods, it is safe. If it only used flock()
>Gregory> and it forked a child to do the work and the parent closed the file, then
>Gregory> it would have the same locking issue sendmail has.

>Yes, but in that case, why does the testing program test locking at all
>before compiling? Why doesn't the binary try to use all kinds of locks
>which exist in the world? That would be the safest way. ;)

procmail and other delivery agents do not attempt to fork() while
holding a lock. They may fork() before attempting delivery, then
the child from fork() locks and delivers.

procmail and other delivery agents have to protect a user mailbox
against concurrent access by other programs, such as an MUA or
a POP3 or IMAP daemon.

By contrast, sendmail is only locking its own queue files, and only
for protection against other instantiations of sendmail. It's a
different locking problem.

 -NWR



Content-id: <Pine.OSF.4.21.0204241147244.278821_at_tao.natur.cuni.cz>
Content-type: message/rfc822; CHARSET=US-ASCII
Content-description: Re: trouble with HOST (fwd)

Return-path: <sendmail+rickert_at_sendmail.org>
Received: from euclid.cs.niu.edu (root_at_euclid.cs.niu.edu [131.156.145.14])
        by natur.cuni.cz (a.b.c/a.b.c) with ESMTP id g3NHFd6r128144; Tue,
 23 Apr 2002 19:15:44 +0200 (MDT)
Received: from localhost (rickert_at_localhost [127.0.0.1])
        by euclid.cs.niu.edu (8.12.3/8.12.3) with ESMTP id g3NHFXgv000459; Tue,
 23 Apr 2002 12:15:33 -0500 (CDT)
Date: Tue, 23 Apr 2002 12:15:33 -0500
From: Neil W Rickert <sendmail+rickert_at_sendmail.org>
Subject: Re: trouble with HOST
In-reply-to: Message from David Komanek <David.Komanek_at_natur.cuni.cz>
 "of Tue, 23 Apr 2002 07:45:11 +0200." <3CC4F4E7.1020700_at_natur.cuni.cz>
To: David Komanek <David.Komanek_at_natur.cuni.cz>
Cc: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>,
 Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>, sendmail-questions_at_sendmail.org,
 sendmail+rickert_at_sendmail.org
Reply-to: sendmail-questions_at_sendmail.org
Message-id: <456.1019582133_at_euclid.cs.niu.edu>
MIME-version: 1.0
X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
X-Obalka-From: sendmail+rickert_at_sendmail.org
References: <17446.1019244292_at_euclid.cs.niu.edu>
 <Pine.OSF.4.21.0204221022180.351010-103000_at_tao.natur.cuni.cz>
 <20020422210931.GI9539_at_scooter.smi.sendmail.com>
 <3CC4F4E7.1020700_at_natur.cuni.cz>


David Komanek <David.Komanek_at_natur.cuni.cz> wrote:

>to get the sendmail without flock() support. This is in the section
>related to __osf__ macro and there is mentioned "tested for 3.2 and
>4.0". Apparently, the manpage on Tru64Unix 4.0d box has the same info
>about flock() limitations as 5.1a version has. So I suggest to change
>the default state to "not to use flock" on Tru64Unix boxes for future
>releases of sendmail.

Hopefully Claus or Greg will make those changes in the sources.

Your man pages are a little non-specific about what flock() does.
Judging from your problems, it seems that your latest version has
changed the support for flock() from bsd semantics to emulation based
on fcntl. Apparently the man pages were not properly updated to show
the change.

>It is strange, because also in RedHat 7.2 manpages is the same
>limitation described:

As far as I know, flock() works fine on linux, and on bsd.

>The same for Irix 6.5.14:

But it is unsatisfactory on Irix and on Solaris.

> BUGS
> Unlike BSD, child processes created by fork(2) do not inherit references
> to locks acquired by their parents through flock(3B) calls. This bug
> results from flock's implementation atop System V file and record locks.

That's the problem.

>So, I wonder if there is some platform, where the proper function of
>flock() on duplicated filehandles is expected. Probably I don't
>understand this well, but it seems flock() is not designed for the type
>of use sendmail needs. Or the manpages are obsolete and on some
>platforms the implementation of flock() is better than defined in
>manpages ? Yes, now I should write some tests to get an answer to this
>question, but I am not sure if I am able to do it well :-)

I think your man pages are obsolete. They seem to imply the correct
semantics, although they don't quite say so. The Irix
man pages correctly identify the problem with flock() on that
system.
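
For anyone who wants to write the test mentioned above, the following is one possible sketch, not anything from the sendmail sources. It takes a flock() lock, forks, lets the parent close its descriptor without unlocking, and then probes from a fresh open() whether the file is still locked. The function name and the pipe used to keep the child alive are illustrative choices.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the child still holds the flock() lock after the parent
 * closes its descriptor (BSD semantics), 0 if the lock was lost,
 * -1 on error.
 */
int flock_survives_parent_close(const char *path)
{
    int gate[2];
    int fd, probe, result = -1;
    pid_t pid;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0 || pipe(gate) != 0) {
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fd);
        close(gate[0]);
        close(gate[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: keep the inherited descriptor open until the parent
         * closes the write end of the pipe, then exit. */
        char c;
        close(gate[1]);
        if (read(gate[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }

    close(gate[0]);
    close(fd);                     /* parent closes WITHOUT unlocking */

    probe = open(path, O_RDWR);    /* independent open of the same file */
    if (probe >= 0) {
        if (flock(probe, LOCK_EX | LOCK_NB) == 0)
            result = 0;            /* got the lock: the child lost it */
        else if (errno == EWOULDBLOCK)
            result = 1;            /* still locked: the child kept it */
        close(probe);
    }

    close(gate[1]);                /* release the child */
    waitpid(pid, NULL, 0);
    return result;
}
```

A result of 1 corresponds to the behavior sendmail expects when it is built with HASFLOCK. A result of 0 would match the symptoms described in this thread.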

Sendmail has two alternative strategies:

  If flock() works properly -

        fork a child to do the delivery. The child retains the lock,
        and the parent closes the queue file without an explicit unlock.

  If flock() does not work properly -

        queue the message for a future queue run. Then start an
        explicit queue run for that particular message. There is a
        possibility that a regular queue run will get to the message
        before this explicit queue run, and in that case the explicit
        queue run silently exits.

        This strategy is more costly. The address parsing that was
        done when receiving the mail must be repeated in the explicit
        queue runner. Where flock works properly, we prefer to save
        those additional costs.
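
The "silently exits" step in the second strategy amounts to a non-blocking lock attempt. The sketch below is only an illustration of that pattern using fcntl(). It is not sendmail's code, and the names try_claim_queue_file, second_runner_backs_off, CLAIM_BUSY and CLAIM_ERROR are invented here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

#define CLAIM_ERROR (-1)
#define CLAIM_BUSY  (-2)

/*
 * Try to take an exclusive fcntl() lock without blocking. Returns the
 * locked descriptor on success, CLAIM_BUSY if another process already
 * holds the lock, CLAIM_ERROR on any other failure.
 */
int try_claim_queue_file(const char *path)
{
    struct flock fl = {0};
    int fd, saved;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return CLAIM_ERROR;

    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;        /* l_start = l_len = 0: whole file */
    if (fcntl(fd, F_SETLK, &fl) == 0)
        return fd;

    saved = errno;
    close(fd);
    return (saved == EACCES || saved == EAGAIN) ? CLAIM_BUSY : CLAIM_ERROR;
}

/*
 * Demo: the first runner claims the file, then a second process tries.
 * Returns 1 if the second process backed off, 0 if it also got the
 * lock, -1 on error.
 */
int second_runner_backs_off(const char *path)
{
    int status = 0;
    int fd = try_claim_queue_file(path);
    pid_t pid;

    if (fd < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        int r = try_claim_queue_file(path);
        _exit(r == CLAIM_BUSY ? 1 : (r >= 0 ? 0 : 2));
    }
    if (waitpid(pid, &status, 0) != pid) {
        close(fd);
        return -1;
    }
    close(fd);
    if (!WIFEXITED(status) || WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}
```

A queue runner built this way would simply skip the message when it sees CLAIM_BUSY, which is the silent exit described above.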

I hope that clarifies the situation.

 -NWR



Content-id: <Pine.OSF.4.21.0204241147245.278821_at_tao.natur.cuni.cz>
Content-type: message/rfc822; CHARSET=US-ASCII
Content-description: Re: trouble with HOST (fwd)

Return-path: <gshapiro_at_gshapiro.net>
Received: from horsey.gshapiro.net (root_at_horsey.gshapiro.net [209.220.147.178])
        by natur.cuni.cz (a.b.c/a.b.c) with ESMTP id g3NHYo6r131024; Tue,
 23 Apr 2002 19:34:51 +0200 (MDT)
Received: from horsey.gshapiro.net (gshapiro_at_localhost [IPv6:::1])
        by horsey.gshapiro.net (8.12.3/8.12.3) with ESMTP id g3NHYnOE042742
        (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Tue,
 23 Apr 2002 10:34:49 -0700 (PDT)
Received: (from gshapiro_at_localhost)
        by horsey.gshapiro.net (8.12.3/8.12.3/Submit) id g3NHYmSe042741; Tue,
 23 Apr 2002 10:34:48 -0700 (PDT)
Date: Tue, 23 Apr 2002 10:34:48 -0700
From: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>
Subject: Re: trouble with HOST
In-reply-to: <3CC4F4E7.1020700_at_natur.cuni.cz>
To: David Komanek <David.Komanek_at_natur.cuni.cz>
Cc: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>,
 Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>, sendmail-questions_at_sendmail.org,
 sendmail+rickert_at_sendmail.org
Message-id: <20020423173448.GP24326_at_horsey.gshapiro.net>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
Content-disposition: inline
User-Agent: Mutt/1.3.28i
X-Obalka-From: gshapiro_at_gshapiro.net
References: <17446.1019244292_at_euclid.cs.niu.edu>
 <Pine.OSF.4.21.0204221022180.351010-103000_at_tao.natur.cuni.cz>
 <20020422210931.GI9539_at_scooter.smi.sendmail.com>
 <3CC4F4E7.1020700_at_natur.cuni.cz>


The man page for flock() on Digital UNIX 4.0G Rev 1530 states:

  NOTES

  Locks are on files, not file descriptors. That is, file descriptors
  duplicated using the dup() or fork() functions do not result in multiple
  instances of a lock, but rather multiple references to a single lock. If a
  process holding a lock on a file forks and the child explicitly unlocks the
  file, the parent will lose its lock.

This is the behavior we want from flock. The behavior we *don't* want
is for the lock to be lost if one process closes a file descriptor. This
isn't the case here since locks are on files, not descriptors as the
first sentence states.
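
The last sentence of that NOTES paragraph can be checked directly. The sketch below is an illustration only, with made-up function names. The parent takes a flock() lock, a child explicitly unlocks the inherited descriptor, and a separate process then tests whether the file can be locked. The probe runs in its own process so that the answer is also meaningful on a system where flock() is emulated with per-process fcntl() locks.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run fn-like work in a child and return its exit code (0 or 1), or -1. */
static int child_result(pid_t pid)
{
    int status = 0;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}

/* Returns 1 if an unrelated process can lock the file now, 0 if it is
 * still locked, -1 on error. */
static int other_process_can_lock(const char *path)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        int probe = open(path, O_RDWR);
        if (probe < 0)
            _exit(2);
        if (flock(probe, LOCK_EX | LOCK_NB) == 0)
            _exit(1);
        _exit(errno == EWOULDBLOCK ? 0 : 2);
    }
    return child_result(pid);
}

/*
 * Returns 1 if an explicit LOCK_UN in the child released the parent's
 * lock (the behavior the man page describes), 0 if the parent still
 * holds it, -1 on error.
 */
int child_unlock_releases_parent_lock(const char *path)
{
    int fd, result;
    pid_t pid;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0)
        _exit(flock(fd, LOCK_UN) == 0 ? 0 : 2);   /* explicit unlock */

    if (child_result(pid) != 0) {
        close(fd);
        return -1;
    }
    result = other_process_can_lock(path);  /* parent never unlocked */
    close(fd);
    return result;
}
```

With BSD-style flock() the expected result is 1, since parent and child share a single lock.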

  Processes blocked awaiting a lock may be awakened by signals.

  The file locks set by the flock() function do not interact in any way with
  the file locks set by the fcntl() and lockf() functions. If a process sets
  an exclusive lock on a file using the flock() function, the lock will not
  affect any process that is setting or clearing locks on the same file using
  the fcntl() or lockf() functions. It is therefore possible for an
  inconsistency to arise if a file is locked by different processes using
  flock() and fcntl(). (The fcntl() and lockf() functions use the same
  mechanism for record locking.)

That also proves that flock() on 4.0G is *not* based on fcntl() since
the locks are distinct.
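
Whether the two kinds of lock are distinct on a given system can also be probed at run time. This is a rough sketch under the assumption that a non-blocking fcntl() attempt from a second process is a fair test. The function name is invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if another process can take an fcntl() write lock while
 * this process holds a flock() exclusive lock (the two mechanisms are
 * independent), 0 if the fcntl() attempt is refused (they share one
 * mechanism), -1 on error.
 */
int flock_and_fcntl_are_independent(const char *path)
{
    int fd, status = 0;
    pid_t pid;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: try a whole-file fcntl() write lock on its own open(). */
        struct flock fl = {0};
        int probe = open(path, O_RDWR);
        if (probe < 0)
            _exit(2);
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        if (fcntl(probe, F_SETLK, &fl) == 0)
            _exit(1);
        _exit((errno == EACCES || errno == EAGAIN) ? 0 : 2);
    }

    if (waitpid(pid, &status, 0) != pid) {
        close(fd);
        return -1;
    }
    close(fd);
    if (!WIFEXITED(status) || WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}
```

A result of 1 matches what the 4.0G man page describes. A result of 0 would suggest flock() is layered on fcntl().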

flock() is the right choice for Digital UNIX 4.X. Now I have to find a 5.X
system to see if they changed things on us.


Content-id: <Pine.OSF.4.21.0204241147246.278821_at_tao.natur.cuni.cz>
Content-type: message/rfc822; CHARSET=US-ASCII
Content-description: Re: trouble with HOST (fwd)

Return-path: <gshapiro_at_gshapiro.net>
Received: from horsey.gshapiro.net (root_at_horsey.gshapiro.net [209.220.147.178])
        by natur.cuni.cz (a.b.c/a.b.c) with ESMTP id g3NHdu6r131587; Tue,
 23 Apr 2002 19:39:57 +0200 (MDT)
Received: from horsey.gshapiro.net (gshapiro_at_localhost [IPv6:::1])
        by horsey.gshapiro.net (8.12.3/8.12.3) with ESMTP id g3NHdtOE042850
        (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Tue,
 23 Apr 2002 10:39:55 -0700 (PDT)
Received: (from gshapiro_at_localhost)
        by horsey.gshapiro.net (8.12.3/8.12.3/Submit) id g3NHdtKC042849; Tue,
 23 Apr 2002 10:39:55 -0700 (PDT)
Date: Tue, 23 Apr 2002 10:39:55 -0700
From: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>
Subject: Re: trouble with HOST
In-reply-to: <3CC4F4E7.1020700_at_natur.cuni.cz>
To: David Komanek <David.Komanek_at_natur.cuni.cz>
Cc: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>,
 Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>, sendmail-questions_at_sendmail.org,
 sendmail+rickert_at_sendmail.org
Message-id: <20020423173954.GQ24326_at_horsey.gshapiro.net>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
Content-disposition: inline
User-Agent: Mutt/1.3.28i
X-Obalka-From: gshapiro_at_gshapiro.net
References: <17446.1019244292_at_euclid.cs.niu.edu>
 <Pine.OSF.4.21.0204221022180.351010-103000_at_tao.natur.cuni.cz>
 <20020422210931.GI9539_at_scooter.smi.sendmail.com>
 <3CC4F4E7.1020700_at_natur.cuni.cz>


Interesting.. On 4.0, the man page says:

  The flock() function operates on the local system only. It does not make
  any attempt to coordinate a file's lock status with other systems. In a
  distributed environment, use the fcntl() or lockf() interfaces to place
  advisory locks on files, as they provide a superset of flock() features.

On 5.0 the man page says:

  You can use the flock() function to coordinate a file's lock status on
  local, CFS, and NFS file systems.

The NOTES section also changed, but only in part (leaving the
"Locks are on files, not file descriptors" paragraph in place but changing
the last paragraph), which now reads:

  The flock() interface is not part of any UNIX standard. Therefore, if you
  are designing and writing applications to be portable across platforms, you
  should use the fcntl() file locking interface instead of flock().

I think they left the first NOTES paragraph behind in error as the locks
are on the file descriptor. So we will need some magic in conf.h to
disable flock for 5.X but leave it in place for 4.X and earlier.

I'm actually saddened to see that DEC has chosen to go with a broken
(IMHO) lock implementation.


Content-id: <Pine.OSF.4.21.0204241147247.278821_at_tao.natur.cuni.cz>
Content-type: message/rfc822; CHARSET=US-ASCII
Content-description: Re: trouble with HOST (fwd)

Return-path: <gshapiro_at_gshapiro.net>
Received: from horsey.gshapiro.net (root_at_horsey.gshapiro.net [209.220.147.178])
        by natur.cuni.cz (a.b.c/a.b.c) with ESMTP id g3NICK6r135991; Tue,
 23 Apr 2002 20:12:24 +0200 (MDT)
Received: from horsey.gshapiro.net (gshapiro_at_localhost [IPv6:::1])
        by horsey.gshapiro.net (8.12.3/8.12.3) with ESMTP id g3NICJOE043165
        (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Tue,
 23 Apr 2002 11:12:19 -0700 (PDT)
Received: (from gshapiro_at_localhost)
        by horsey.gshapiro.net (8.12.3/8.12.3/Submit) id g3NICJaa043164; Tue,
 23 Apr 2002 11:12:19 -0700 (PDT)
Date: Tue, 23 Apr 2002 11:12:19 -0700
From: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>
Subject: Re: trouble with HOST
In-reply-to: <3CC4F4E7.1020700_at_natur.cuni.cz>
To: David Komanek <David.Komanek_at_natur.cuni.cz>
Cc: Gregory Neil Shapiro <sendmail+gshapiro_at_sendmail.org>,
 Martin MOKREJŠ <mmokrejs_at_natur.cuni.cz>, sendmail-questions_at_sendmail.org,
 sendmail+rickert_at_sendmail.org
Message-id: <20020423181219.GT24326_at_horsey.gshapiro.net>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
Content-disposition: inline
User-Agent: Mutt/1.3.28i
X-Obalka-From: gshapiro_at_gshapiro.net
References: <17446.1019244292_at_euclid.cs.niu.edu>
 <Pine.OSF.4.21.0204221022180.351010-103000_at_tao.natur.cuni.cz>
 <20020422210931.GI9539_at_scooter.smi.sendmail.com>
 <3CC4F4E7.1020700_at_natur.cuni.cz>


Can you back out your change to conf.h and try this patch instead? Make
sure HASFLOCK is not shown in the debug output on a 5.X machine but it
is on a 4.X machine (if you have access to one).

Index: conf.h
===================================================================
RCS file: /cvs/include/sm/conf.h,v
retrieving revision 1.87
diff -u -u -r1.87 conf.h
--- conf.h 2002/04/02 08:11:52 1.87
+++ conf.h 2002/04/23 18:06:21
@@ -612,7 +612,12 @@
 # define GIDSET_T gid_t
 # define SM_INT32 int /* 32bit integer */
 # ifndef HASFLOCK
-# define HASFLOCK 1 /* has flock(2) call */
+# include <standards.h>
+# if _XOPEN_SOURCE+0 >= 400
+# define HASFLOCK 0 /* 5.0 and later has bad flock(2) call */
+# else /* _XOPEN_SOURCE+0 >= 400 */
+# define HASFLOCK 1 /* has flock(2) call */
+# endif /* _XOPEN_SOURCE+0 >= 400 */
 # endif /* ! HASFLOCK */
 # define LA_TYPE LA_ALPHAOSF
 # define SFS_TYPE SFS_STATVFS /* use <sys/statvfs.h> statfs() impl */
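
A note on the `_XOPEN_SOURCE+0 >= 400` test in the patch above: the `+0` keeps the `#if` well-formed whether the macro is undefined, defined with no value, or defined as a number. The small sketch below shows the three cases with stand-in macros. The DEMO_ and CASE_ names are made up for illustration, and the thread does not say what value the 5.X headers actually give `_XOPEN_SOURCE`.

```c
#include <assert.h>

#define DEMO_EMPTY              /* defined, but expands to nothing */
#define DEMO_NEW 500            /* hypothetical "new enough" value */
/* DEMO_UNDEFINED is deliberately never defined. */

#if DEMO_UNDEFINED+0 >= 400     /* evaluates as 0+0 >= 400 */
# define HASFLOCK_UNDEFINED 0
#else
# define HASFLOCK_UNDEFINED 1
#endif

#if DEMO_EMPTY+0 >= 400         /* evaluates as +0 >= 400; without the
                                 * +0 this line would be a syntax error */
# define HASFLOCK_EMPTY 0
#else
# define HASFLOCK_EMPTY 1
#endif

#if DEMO_NEW+0 >= 400           /* evaluates as 500+0 >= 400 */
# define HASFLOCK_NEW 0
#else
# define HASFLOCK_NEW 1
#endif

enum { CASE_UNDEFINED, CASE_EMPTY, CASE_NEW };

/* Returns the HASFLOCK value the preprocessor chose for each case. */
int demo_hasflock(int which)
{
    assert(which == CASE_UNDEFINED || which == CASE_EMPTY || which == CASE_NEW);
    switch (which) {
    case CASE_UNDEFINED:
        return HASFLOCK_UNDEFINED;
    case CASE_EMPTY:
        return HASFLOCK_EMPTY;
    default:
        return HASFLOCK_NEW;
    }
}
```

Only the third case turns flock() off, which is the selection the patch is aiming for on 5.X while leaving 4.X and earlier alone.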
Received on Wed Apr 24 2002 - 14:10:20 NZST