Date: Thu, 09 May 91 23:26:50 -0700
From: “Erik E. Fair” (Your Friendly Postmaster)
To: tcp-ip@nic.ddn.mil, unicode@sun.com, [...]
Subject: Case of the Replicated Errors: An Internet Postmaster’s Horror Story

This Is The Network: The Apple Engineering Network.

The Apple Engineering Network has about 100 IP subnets, 224 AppleTalk zones, and over 600 AppleTalk networks. It stretches from Tokyo, Japan, to Paris, France, with half a dozen locations in the U.S., and 40 buildings in the Silicon Valley. It is interconnected with the Internet in three places: two in the Silicon Valley, and one in Boston. It supports almost 10,000 users every day.

When things go wrong with e-mail on this network, it’s my problem. My name is Fair. I carry a badge.

[insert theme from Dragnet]

The story you are about to read is true. The names have not been changed so as to finger the guilty.

It was early evening, on a Monday. I was working the swing shift out of Engineering Computer Operations under the command of Richard Herndon. I don’t have a partner.

While I was reading my e-mail that evening, I noticed that the load average on apple.com, our VAX-8650, had climbed way out of its normal range to just over 72.

Upon investigation, I found that thousands of Internet hosts were trying to send us an error message. I also found 2,000+ copies of this error message already in our queue.

I immediately shut down the sendmail daemon which was offering SMTP service on our VAX.

I examined the error message, and reconstructed the following sequence of events:

We have a large community of users who use QuickMail, a popular Macintosh based e-mail system from CE Software. In order to make it possible for these users to communicate with other users who have chosen to use other e-mail systems, ECO supports a QuickMail to Internet e-mail gateway.
We use RFC822 Internet mail format, and RFC821 SMTP as our common intermediate e-mail standard, and we gateway everything that we can to that standard, to promote interoperability.

The gateway that we installed for this purpose is MAIL*LINK SMTP from Starnine Systems. This product is also known as GatorMail-Q from Cayman Systems. It does gateway duty for all of the 3,500 QuickMail users on the Apple Engineering Network.

Many of our users subscribe, from QuickMail, to Internet mailing lists which are delivered to them through this gateway. One such user, Mark E. Davis, is on the unicode@sun.com mailing list, to discuss some alternatives to ASCII with the other members of that list.

Sometime on Monday, he replied to a message that he received from the mailing list. He composed a one paragraph comment on the original message, and hit the “send” button.

Somewhere in the process of that reply, either QuickMail or MAIL*LINK SMTP mangled the “To:” field of the message.

The important part is that the “To:” field contained exactly one “<” character, without a matching “>” character. This minor point caused the massive devastation, because it interacted with a bug in sendmail.

Note that this syntax error in the “To:” field has nothing whatsoever to do with the actual recipient list, which is handled separately, and which, in this case, was perfectly correct.

The message made it out of the Apple Engineering Network, and over to Sun Microsystems, where it was exploded out to all the recipients of the unicode@sun.com mailing list.

Sendmail, arguably the standard SMTP daemon and mailer for UNIX, doesn’t like “To:” fields which are constructed as described. What it does about this is the real problem: it sends an error message back to the sender of the message, AND delivers the original message onward to whatever specified destinations are listed in the recipient list.

This is deadly.
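The failure mode described above can be sketched as a toy model of a single mail hop. This is not sendmail's code: the function names, the simplified bracket check (it ignores quoted strings and comments), and the example.com addresses are all assumptions made for illustration. The point it shows is that the header check and the envelope recipient list are independent, so a hop that complains and still forwards lets every later hop complain as well.

```python
def has_balanced_angles(header_value: str) -> bool:
    """Return True if every '<' is closed by a later '>' with no nesting.

    Simplified on purpose: real header parsing also has to handle
    quoted strings and parenthesised comments.
    """
    inside = False
    for ch in header_value:
        if ch == "<":
            if inside:          # a second '<' before any '>'
                return False
            inside = True
        elif ch == ">":
            if not inside:      # a '>' with no opening '<'
                return False
            inside = False
    return not inside           # an unclosed '<' leaves inside == True


def relay(to_header, envelope_recipients, bounce_and_forward=True):
    """Toy model of one hop. Returns (send_error_to_sender, forwarded_to).

    bounce_and_forward=True models the behavior described in the text:
    report the bad header AND keep delivering. False is a hypothetical
    alternative that reports the error and forwards nothing.
    """
    if has_balanced_angles(to_header):
        return False, list(envelope_recipients)
    if bounce_and_forward:
        return True, list(envelope_recipients)
    return True, []


# Hypothetical header with one unmatched '<'; the envelope is fine.
print(relay("list@example.com <", ["a@example.org", "b@example.net"]))
```

With `bounce_and_forward=True` the bad header travels on intact, so each downstream hop that runs the same check sends its own error back to the original sender.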
The effect was that every sendmail daemon on every host which touched the bad message sent an error message back to us about it. I have often dreaded the possibility that one day, every host on the Internet (all 400,000 of them) would try to send us a message, all at once.

On Monday, we got a taste of what that must be like.

I don’t know how many people are on the unicode@sun.com mailing list, but I’ve heard from Postmasters in Sweden, Japan, Korea, Australia, Britain, France, and all over the U.S. I speculate that the list has at least 200 recipients, and about 25% of them are actually UUCP sites that are MX’d on the Internet.

I destroyed about 4,000 copies of the error message in our queues here at Apple Computer.

After I turned off our SMTP daemon, our secondary MX sites got whacked. We have a secondary MX site so that when we’re down, someone else will collect our mail in one place, and deliver it to us in an orderly fashion, rather than have every host which has a message for us jump on us the very second that we come back up.

Our secondary MX is the CSNET Relay (relay.cs.net and relay2.cs.net). They eventually destroyed over 11,000 copies of the error message in the queues on the two relay machines. Their postmistress was at wit’s end when I spoke to her. She wanted to know what had hit her machines.

It seems that for every one machine that had successfully contacted apple.com and delivered a copy of that error message, there were three hosts which couldn’t get ahold of apple.com because we were overloaded from all the mail, and so they contacted the CSNET Relay instead.

I also heard from CSNET that UUNET, a major MX site for many other hosts, had destroyed 2,000 copies of the error message. I presume that their modems were very busy delivering copies of the error message from outlying UUCP sites back to us at Apple Computer.
This instantiation of this problem has abated for the moment, but I’m still spending a lot of time answering e-mail queries from postmasters all over the world.

The next day, I replaced the current release of MAIL*LINK SMTP with a beta test version of their next release. It has not shown the header mangling bug, yet.

The final chapter of this horror story has yet to be written.

The versions of sendmail with this behavior are still out there on hundreds of thousands of computers, waiting for another chance to bury some unlucky site in error messages.

Are you next?

[insert theme from “The Twilight Zone”]

just the vax, ma’am,

Erik E. Fair
fair@apple.com