Dear all,
I have a tentative summary. Most people suggested that the box had
been hacked; sadly, this appears not to be the case, as there is a
pretty solid air-gap between the box and the Internet. Although you never
know what your users have been installing on the system, I am
pretty confident that this is not the issue. None of my IDSs show
anything untoward, and, apart from this issue, the box is untouched
according to Tripwire.
The machine was deployed with 4.0G at the beginning of March, replacing
an identical system running 4.0F. Since deployment it has had this fancy
"named replication" issue, but I had kept it at bay by killing off
the mutant strain and restarting named.
Unfortunately, I then left it for a week while dealing with something
more serious, and when I checked it today I found:
1) 584 copies of named running,
2) a message in my mail telling me that syslog.dated was no more.
Now, as the famous saying goes: "never explain by conspiracy what can
easily be explained by stupidity". So, after the obligatory "oh my
God, we've been hacked", happily shouted by myself and a bunch of
people on the list, I looked at the problem more carefully:
1) the box runs in lazy swap mode, and under OSF/1, when a box in that
mode runs out of memory, it starts killing processes on a first-come,
first-nuked basis;
2) named had been replicating like mad, suffocating the system and
filling up the 192MB of available RAM.
Now, what if syslogd had died before I went off on holiday and then got
busy with something different? Then the following line in root's
crontab:
40 4 * * * find /var/adm/syslog.dated -depth -type d -ctime +5 -exec rm -rf {} \;
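would have nuked the top-level syslog.dated directory itself (the '.'
of the search): find tests its starting point like any other file, that
directory matches -type d, and its ctime was well past the 5-day limit
because it hadn't changed for over 10 days, since nothing in there moved
(syslogd dead, remember?). Hence the message this morning about
syslog.dated being dead.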
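The trap is easy to reproduce. What follows is a minimal sketch, assuming a POSIX sh and find(1). The helper names, the scratch directory and the day1/day2 subdirectories are made up for this illustration, and the -ctime test is dropped so the demo matches freshly created directories. The first helper shows that the original find expression prints the starting directory itself. The second is one possible safer variant (not tested on Tru64) that uses the POSIX -prune primary so that only the entries inside the directory are ever candidates for rm -rf.

```shell
#!/bin/sh
# Demo only: a throwaway directory standing in for /var/adm/syslog.dated,
# holding two hypothetical dated subdirectories.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir "$demo/day1" "$demo/day2"

# What the original cron line would hand to "rm -rf" (minus -ctime +5).
# Because of -depth, the starting directory itself is printed last.
original_candidates() {
    find "$1" -depth -type d
}

# Possible safer variant: the shell glob passes find only the entries
# inside the directory, and -prune stops find descending into them, so the
# top-level directory can never match. (If the directory is empty, the glob
# is passed through literally and find complains.)
safer_candidates() {
    find "$1"/* -prune -type d
}

echo "original:"; original_candidates "$demo"
echo "safer:";    safer_candidates "$demo"

# The matching crontab entry would then be (hypothetical, untested on Tru64):
# 40 4 * * * find /var/adm/syslog.dated/* -prune -type d -ctime +5 -exec rm -rf {} \;
```

Dropping -depth in the safer variant is deliberate: with -prune, find never descends into a directory it has just removed, so the post-order traversal is no longer needed.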
Now, what about the runaway named? Well, for reasons unknown to me,
two different named binaries are shipped, one in /sbin and one in
/usr/sbin (before you shout "hack, hack", please note that, as verified
by myself and independently by <sysadmin_at_astro.su.se>, they are
different on the 4.0G _CD_ itself).
Re-installing the 4.0G named SSRT patch, this time _reading_ the
instructions (the little README file says to replace _both_ /sbin/named
and /usr/sbin/named), and restarting named seems to have fixed the
runaway named issue. If this fix lasts (it has for the past 6 hours),
then we are in business. Classic case of RTFM, I guess.
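Until the fix has proven itself, a crude watchdog can at least flag the next outbreak before the box runs out of memory. This is a minimal sketch, assuming a POSIX ps that supports -e and -o comm=. The helper names count_procs and check_named, and the threshold of one named process, are illustrative choices of mine, not anything from the SSRT kit. Some ps implementations report comm as a full path, so only the last path component is compared.

```shell
#!/bin/sh
# Count running processes whose command name is exactly $1.
# "ps -e" lists every process; "-o comm=" prints only the command name,
# without a header. awk trims surrounding blanks, then compares the last
# "/"-separated field, so "named" and "/usr/sbin/named" both count.
count_procs() {
    ps -e -o comm= |
        awk -F/ -v name="$1" '
            { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }
            $NF == name { n++ }
            END { print n + 0 }'
}

# Hypothetical check: complain on stderr and return non-zero if more than
# one named is running. The threshold of 1 is an assumption; adjust to taste.
check_named() {
    n=$(count_procs named)
    if [ "$n" -gt 1 ]; then
        echo "warning: $n copies of named running" >&2
        return 1
    fi
    return 0
}

check_named
```

Run from root's crontab, any warning would be mailed to root, because cron mails a job's output to its owner. That should surface a runaway named long before a few hundred copies pile up.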
Re-creating /var/adm/syslog.dated is simply a matter of
"/sbin/init.d/syslog start".
Thanks are due to:
Mark Menkhus
Derk Tegeler
Allan E Johannesen
Ron Parker
MIKE WHOLF
J Bacher
and, in particular, to
sysadmin_at_astro.su.se
who discussed the cron issue in detail with me and offered some
excellent insights.
Ciao,
Arrigo
--
Arrigo Triulzi <arrigo_at_albourne.com>
Albourne Partners Ltd. - London, UK
Received on Thu Mar 22 2001 - 18:22:10 NZST