QUERY: Hanging Sendmail

From: Bob Wier <wier_at_bobcat.etsu.edu>
Date: Fri, 10 Mar 1995 17:11:14 -0600

We've got a problem with Sendmail on our 3000/600s box running OSF/1 V2.0.

There are about 5 relatively low-volume mailing lists running, and roughly
every 12 hours sendmail hangs. It starts taking up more than 90% of the CPU
cycles and never lets go until cancelled. We have the sendmail configuration
file set up for a 1 hour max delivery time and a 2 day mqueue expiration.

In one really bizarre case, we were trying to send mail to uenf.br, which
wasn't reachable at the time according to NSLOOKUP. The process hung, but
it also did the proper thing and requeued the message for later delivery.
When the next delivery attempt was made, THAT process hung too, and requeued.
Eventually we had about 12 processes hung trying to deliver to that one
address.

This is an intermittent problem, and so far we haven't been able to find
anything in particular that causes it. It *perhaps* might happen in cases
where NAMED is a little slow in returning an answer, but it also happened
once between two machines here on campus. We've looked at the sendmail.cf
file and also tried using different NAMED pointers, one to the machine
itself and, alternately, one to a different machine on campus. Same results
both ways, so it looks like it's not a NAMED problem. We suspected the
listprocessor (6.0c) but also saw sendmail "hang" once on a non-listserv
mail exchange.

I've run out of ideas short of using a different version of Sendmail. I've
been in correspondence with B. Costales (author of the Sendmail "Bat Book")
and he suggests switching over to the latest version of V8 (the sendmail
DEC supplies appears to be the KCS/Paul Vixie version). Before we make this
fairly major change, I thought I'd ask here to see whether anyone else has
seen this problem and knows of possible fixes.

Here's sample PS output containing a "hung" sendmail process:

server  516    1  0.0 Mar 03   ?? 0:03.37  /usr/users/server/serverd
ftp    7336  352  0.0 Mar 09   ?? 0:00.11  -ip216.e (ftpd)
root   7786    1  0.0 Mar 08   ?? 0:12.31  SCREEN
server 9099  516  0.0 09:02:27 ?? 0:00.01  sh -c /usr/users/server/list -1 -L MC68HC11 -Z -e -m 2
server 9100 9099  0.0 09:02:27 ?? 0:00.87  /usr/users/server/list -1 -L MC68HC11 -Z -e -m 2
server 9420 9100  0.0 09:05:10 ?? 0:00.01  sh -c /bin/mail > /dev/null 2>&1 gmp_at_unx.dec.com rhandelm_at_geometric.
root   9421 9420 95.0 09:05:10 ?? 06:27:49 -sendmail gmp_at_unx.dec.com rhandelm_at_geometric.com (sendmail)

Note the last entry (pid 9421): it shows 95% CPU usage and has now been
running for about 9 hours.
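
For anyone who wants to catch these automatically, here is a rough Python
sketch of a check that scans PS output laid out like the sample above and
flags sendmail processes at or above a CPU threshold. The column order
(user, pid, ppid, %cpu, start, tty, cpu time, command) is assumed from the
listing, and the names parse_ps_line and find_runaway as well as the 90%
threshold are made up for illustration; none of this is part of sendmail or
OSF/1.

```python
from typing import Iterable, List, NamedTuple


class PsRow(NamedTuple):
    user: str
    pid: int
    ppid: int
    cpu: float
    command: str


def parse_ps_line(line: str) -> PsRow:
    """Parse one line in the assumed layout:
    USER PID PPID %CPU STARTED TT TIME COMMAND...
    """
    tokens = line.split()
    user = tokens[0]
    pid, ppid, cpu = int(tokens[1]), int(tokens[2]), float(tokens[3])
    # STARTED is one token ("09:05:10") or two ("Mar 09").
    start_width = 1 if ":" in tokens[4] else 2
    # Skip STARTED, then TT and TIME; everything after is the command.
    command = " ".join(tokens[4 + start_width + 2:])
    return PsRow(user, pid, ppid, cpu, command)


def find_runaway(lines: Iterable[str], name: str = "sendmail",
                 cpu_threshold: float = 90.0) -> List[PsRow]:
    """Return rows whose command mentions `name` and whose %CPU is at or
    above `cpu_threshold` (both defaults are illustrative)."""
    rows = (parse_ps_line(l) for l in lines if l.strip())
    return [r for r in rows if name in r.command and r.cpu >= cpu_threshold]


# Three lines taken from the listing above.
SAMPLE = [
    "server 516 1 0.0 Mar 03 ?? 0:03.37 /usr/users/server/serverd",
    "root 7786 1 0.0 Mar 08 ?? 0:12.31 SCREEN",
    "root 9421 9420 95.0 09:05:10 ?? 06:27:49 -sendmail "
    "gmp_at_unx.dec.com rhandelm_at_geometric.com (sendmail)",
]

if __name__ == "__main__":
    for row in find_runaway(SAMPLE):
        print(f"pid {row.pid} at {row.cpu}% CPU: {row.command}")
```

A check like this only reports the symptom. It says nothing about why the
process is spinning, and a single %CPU reading can be misleading, so it
would probably make sense to require several consecutive hits before acting
on a process.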

THANKS!

 -- Round Up the Usual Disclaimers! --
Bob Wier, CS Dept., East Texas State University
   wier_at_bobcat.etsu.edu - keeper of the
    Motorola MC68HC11, Photo-3D, SD3D,
Icom Radio and Overland Trails mailing lists
Received on Fri Mar 10 1995 - 18:10:45 NZDT
