I posted a query (appended at end) about a month ago asking for
help dealing with unkillable, but idle, ntalkd processes. I got
responses from Todd B. Acheson <acheson_at_ohiou.edu>, who said
he had seen the same phenomenon often with earlier Digital UNIX
versions, but thought it had gone away in 3.2c (possibly with patches),
and from Chris Teakle <ccteakle_at_nargun.cc.uq.edu.au>, who had
seen a similar problem where "write" and "ntalkd" processes on the
same tty port were unkillable, except for one "master" write process;
killing that one killed all the others. My problem is not
associated with any "write" processes.

Meanwhile, I searched the alpha-osf-managers archives, but found
nothing that seemed the same as my problem. Then I looked at
Digital's anonymous ftp patch server (ftp.service.digital.com).
There, I found a consolidated patch kit for Digital UNIX v3.2d-1
(there are other kits for other versions) that included a patch
for hung ntalkd daemons connected to a LAT terminal. We are not
using LAT terminals, but perhaps are triggering the same bug.
I will try installing this patch (plus many others in the kit
which look relevant to my system) during the Christmas break
when the students are gone.

-Phil Farrell, Computer Systems Manager
Stanford University School of Earth Sciences
farrell_at_pangea.stanford.edu
---------original query:
From farrell Thu Oct 30 10:36:22 1997
To: alpha-osf-managers_at_ornl.gov
Subject: Unkillable ntalkd processes in DUNIX 3.2d-1

Hi all,
I have a problem on my AlphaServer 1000 system, running DUNIX 3.2d-1,
with unkillable ntalkd processes. The command
    ps -A -j | grep ntalkd
shows that I currently have 83 of these running on my system.
Here is a sample of the output:
    USER    PID  PPID  PGID  SESS  JOBC  S  TTY     TIME  COMMAND
    root   1799     1  1010  1010     0  U  ??   0:00.00  ntalkd
    root   2164     1  1771  1771     0  U  ??   0:00.00  ntalkd
    root   2924     1   930   930     0  U  ??   0:00.00  ntalkd
    root  10473     1  1319  1319     0  U  ??   0:00.00  ntalkd
    root  12871     1   992   992     0  U  ??   0:00.00  ntalkd
    root  18209     1  1307  1307     0  U  ??   0:00.00  ntalkd
    root  20459     1   234   234     0  U  ??   0:00.00  ntalkd
    root  25710     1   948   948     0  U  ??   0:00.00  ntalkd
    root  31092     1  1000  1000     0  U  ??   0:00.00  ntalkd
    root  32007     1   753   753     0  U  ??   0:00.00  ntalkd
All 83 share parent process id #1 (init), which implies that they were
orphaned by their original parent, inetd (#448 in my case). Other ps
options that display the start date show that these ntalkd processes
were started at various times during the last 4 weeks. The system was
last rebooted about 8 weeks ago.
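The check described above (ntalkd, parent PID 1, state "U") can be
sketched as a small filter. This is only an illustration: the function
name stuck_ntalkd_pids is made up here, and it assumes the ps column
order shown in the sample output (PID in column 2, PPID in column 3,
state in column 7, command last).

```shell
# Hypothetical helper: read "ps -A -j" style output on stdin and print
# the PIDs of ntalkd processes that have PPID 1 and are in state U.
# Assumed column order (from the sample listing):
#   USER PID PPID PGID SESS JOBC S TTY TIME COMMAND
stuck_ntalkd_pids() {
    # The header line is skipped automatically: "PPID" is not equal to 1.
    awk '$3 == 1 && $7 == "U" && $NF == "ntalkd" { print $2 }'
}

# Possible use on a live system with the same ps layout:
#   ps -A -j | stuck_ntalkd_pids | wc -l
```

Counting the lines it prints gives the number of stuck daemons without
having to read the whole listing by eye.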
I have attempted to kill these from the root account with commands like
    kill 20459
    kill -9 20459
    kill -15 20459
The kill command doesn't complain, but the process persists! Well,
checking the "ps" man page, I see that this is not surprising, because
the "U" state shared by all of these means "Uninterruptible sleeping
process". The signal can be sent, but a process in that state does not
act on it (not even SIGKILL) until it wakes up.
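The behaviour above (kill reports no error, yet the process stays) can
be checked with a short sketch like the one below. The function name
try_kill and the default two-second wait are made up for illustration.
It relies only on the standard behaviour of "kill -0", which sends no
signal and just tests whether the PID can still be signalled.

```shell
# Hypothetical helper: send SIGTERM, then SIGKILL, and report whether
# the process is still present after each one.
try_kill() {
    pid=$1
    wait_secs=${2:-2}    # assumed grace period between signals
    for sig in TERM KILL; do
        # Ignore errors here; the existence check below decides the result.
        kill -"$sig" "$pid" 2>/dev/null || true
        sleep "$wait_secs"
        # kill -0 delivers nothing; it fails once the PID no longer exists.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "gone after SIG$sig"
            return 0
        fi
    done
    echo "still present after SIGKILL"
    return 1
}

# Possible use:
#   try_kill 20459
```

For a process stuck in the "U" state, the expectation is that it
survives both signals, which matches what I saw.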
So, my problem is twofold:
1) How do I kill these "U" state processes?
2) How do I stop more of them from ending up in this state?
The second problem is the more worrisome. During the summer, I had
to reboot the system after it had been up for five months (a record
around here!) because it was out of process table slots. Checking
the crash dump, I saw hundreds of these ntalkd processes. Now
the problem appears to be coming back. I have temporarily turned
off the "talk" service in /etc/inetd.conf, but that is a crude
fix. My users would really like to have a functioning talk program.
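For reference, the temporary workaround amounts to commenting out the
service entry and telling inetd to reread its configuration. The sketch
below is an assumption-laden illustration: comment_out_service is a
made-up name, the service label (talk or ntalk) depends on how the
entry appears in your own inetd.conf, and the function only prints the
edited file rather than changing anything in place.

```shell
# Hypothetical helper: print an inetd.conf-style file with every active
# entry for the named service commented out. The first field of each
# entry is taken to be the service name.
comment_out_service() {
    svc=$1
    conf=$2
    awk -v svc="$svc" '$1 == svc { print "#" $0; next } { print }' "$conf"
}

# Possible use (review the result before installing it):
#   comment_out_service ntalk /etc/inetd.conf > /tmp/inetd.conf.new
# Then signal inetd (PID 448 on my system) to reread its configuration:
#   kill -HUP 448
```

Lines that are already commented out are left alone, because their
first field starts with "#" and so does not match the service name.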
Is anyone aware of a known bug (and patch) in either the ntalkd
program or the kernel that causes these unkillable ntalkd daemon
processes to be left around?
Thanks for any ideas.
-Phil Farrell, Computer Systems Manager
Stanford University School of Earth Sciences
farrell_at_pangea.stanford.edu
Received on Fri Dec 05 1997 - 22:55:18 NZDT