runaway "rshd"s

From: Mohan <mkannapa_at_ford.com>
Date: Tue, 01 Dec 1998 11:25:54 -0500 (EST)

Greetings Managers,
        I have a unique problem which I don't know how to
        solve. We are currently running DU V4.0D and we have a
        monitor program that runs on a remote host and uses the
        "rexec" system call to do a general health check of
        our Alpha system.

        The following is the scenario I run into:
        
        The monitor program has a timeout built into it where the
        system call (rexec) is interrupted if there is no response
        within a certain time period. What this does to our DU V4.0D
        AlphaServer is leave an "rshd" that uses 100% CPU (as shown by
        top) and does not die! This happens ONLY on the Digital Unix
        server; the monitor program does the same thing on Suns,
        Crays etc., but they do not show this symptom at all.
        It usually happens when our Digital Unix system is running
        low on memory and takes a long time to execute the "rexec"
        calls, which in turn prompts the monitor program to time out,
        leaving these runaway "rshd"s.
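
        To make the scenario concrete, here is a minimal sketch of a
        monitor-side timeout. It is not the actual monitor program:
        it assumes the check is run as a child process rather than
        through an interrupted rexec call, and the function name
        run_with_timeout is hypothetical. It only shows the pattern
        of giving up on a slow remote check; whether a cleaner client
        shutdown avoids the runaway "rshd" on the server is not
        established here.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd as a child process and give up after timeout_s seconds.

    Returns (True, returncode) if the command finished in time,
    or (False, None) if it had to be stopped.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=timeout_s)
        return True, proc.returncode
    except subprocess.TimeoutExpired:
        # Ask the client to exit first so it can close its connection.
        proc.terminate()
        try:
            proc.wait(timeout=2)
        except subprocess.TimeoutExpired:
            # Still running: force it and reap the child.
            proc.kill()
            proc.wait()
        return False, None
```

        A call might look like run_with_timeout(["rsh", "alpha1",
        "uptime"], 30), where the host name "alpha1" and the 30
        second limit are placeholders.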

        Apart from a "cron" job solution, is this behaviour normal?
        Or is it a bug in "rshd" on Digital Unix systems?
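
        For reference, the "cron" job workaround could be sketched
        roughly as below. This is an assumption about what such a job
        would do, not something we run: the function name
        find_runaways, the 90% threshold, and the three-column input
        (as produced by something like "ps -e -o pid,pcpu,comm") are
        all illustrative. It only picks out candidate PIDs; a single
        CPU sample could also match a legitimately busy "rshd".

```python
import os


def find_runaways(ps_output, name="rshd", cpu_threshold=90.0):
    """Return PIDs of processes called `name` at or above cpu_threshold.

    ps_output is text with three whitespace-separated columns per line:
    PID, percent CPU, and command. Lines that do not parse (such as the
    header) are skipped.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid_text, cpu_text, command = fields
        try:
            pid = int(pid_text)
            cpu = float(cpu_text)
        except ValueError:
            continue  # header line or unexpected format
        # The command column may be a bare name or a full path.
        if os.path.basename(command.strip()) == name and cpu >= cpu_threshold:
            pids.append(pid)
    return pids
```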

        Any suggestions or ideas to solve the problem are welcome.
        We have tried increasing the timeout value in the monitor
        program, but every now and then we still run into this!

        Thanks for any help
        Mohan
 
Received on Tue Dec 01 1998 - 16:26:44 NZDT
