'rsh' command hangs with defunct process

From: Peter Greinig <pjg_at_butlins.co.uk>
Date: Fri, 29 Nov 1996 10:27:14 +0000 (GMT)

Hi managers...

I'm not sure whether this post is absolutely relevant to Digital Unix,
but it does involve an OSF box, so here goes...

We have a constantly running process on our main reservations system
(running DEC Unix 3.2D-1) which processes credit card authorisations.
Each morning at 07:00 this process forks a child process to transmit a payment
file to the bank. The child process in turn performs a number of operations
using 'rsh' to a PC running Linux.

The problem is that sometimes the 'rsh' command on the OSF box never completes.
The command _does_ run on the Linux box, but the 'rsh' process on the OSF
machine is left owning a defunct child process & then hangs forever.

Here is a process 'tree' to illustrate this (from OSF machine, node regis10):

-> 11474:resdev :/disk2/solve/bin/solvese_reslive
   -> 2086 :resdev :sh payment
      -> 17471:resdev :bash -x ../bin/butlins_pfg
         -> 3974 :resdev :rsh regis8 -l root rm -f /dos/metape/fj/trans/tran4.dat
            -> 24597:resdev :<defunct>
 
11474 is the constantly running process, which forks 2086, which in turn
forks 17471, a bash script which contains (among other things)
various 'rsh' commands. 3974 is the 'rsh' that hangs & 24597 is its
defunct child.
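
Parent/child pairs like 3974 and 24597 in the tree above can be picked out
of a process listing. What follows is only a minimal sketch: it assumes a 'ps'
that accepts the POSIX '-o' option (not checked on DEC Unix 3.2D-1) and that
defunct entries show '<defunct>' in the command column, as in the tree above.
The name find_defunct is a hypothetical helper, not part of any script
mentioned in this post.

```shell
# Hypothetical helper: read "PID PPID COMMAND..." lines on stdin and print
# the PID and parent PID of every entry whose command column contains
# the string <defunct>. The header line is skipped because it does not match.
find_defunct() {
    awk '/<defunct>/ { print $1, $2 }'
}

# Assumed invocation (needs a ps that supports the POSIX -o option):
#   ps -e -o pid,ppid,args | find_defunct
```

Each output line is a defunct PID followed by its parent's PID, so the
parent that has not yet reaped its child can be looked up directly.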

In the above example the 'rsh' command is an 'rm' (sometimes the hang happens
on an 'ls' command as well). Nothing appears in /var/adm/messages or syslog
on the OSF box, but the following is in the /var/adm/messages file on
the Linux box (regis8):

Nov 29 07:02:51 regis8 in.rshd[26026]: connect from regis10
Nov 29 07:02:52 regis8 rshd[26027]: resdev@regis10 as root: cmd='rm -f /dos/metape/fj/trans/tran4.dat'

The file (tran4.dat) is not there, so the command _did_ work. There are
no 'odd' processes left on the Linux box, just the processes on the
OSF box.

This does not happen every time: some mornings the process works without any
problems. Also, when I run the command at the bash prompt I've never managed
to make it fail [don't you just _hate_ problems like that!].

I suppose my question is really... Has anyone ever seen the case where an rsh
command correctly executes the remote command, but then hangs owning a
defunct process?

Help!

Many thanks & sorry for such a long message (I'll summarise),

Peter Greinig <pjg_at_butlins.co.uk>

Butlin's Holidays Ltd., UK.
Received on Fri Nov 29 1996 - 12:47:08 NZDT
