SUMM: watching a SPECIFIC process

From: Guy Dallaire <dallaire_at_total.net>
Date: Fri, 18 Oct 1996 13:23:34 -0400

Thanks to all who replied.

My original post:

One of our users has a process that's running in the background and exits
from time to time for no specific reason. He gets NO core dump when that
happens. We just can't find out under what circumstances his process dies.
     
Is there a (home-made or free) tool that I could use to "watch" that process
and tell me what the system was doing when the process exited or died?
Maybe that could put us on the right track to investigate.
     
FYI:
     
That program is looping ad infinitum connecting to a database, scanning for
things to do, disconnecting, etc... It is started in background with a nohup
from a korn shell. We've included a sleep 10 command after the process
startup to make sure it has time to initialize before the korn shell exits.
--------------------------------------------------------------------------


The suggestions:

1) From the system call point of view:

     "trace" (public domain)

     or "truss" from the System V extension for DU; it needs an "SVID" licence
     (this kit is on the first ALPHA DU CD-ROM, not on the layered products one)

2) Use dbx to examine the core dump:

    In my particular case, the process does not dump core, so this does not
    help.

3) Use proc_info, a utility written by Randy M. Hayman
(haymanr_at_icefog.sois.alaska.edu).

    That utility will show you the most pertinent things about the process
(which signals it is catching, which ones it is ignoring, its resource
usage, its resource limits, etc.).

    Randy suggested that the process may be receiving a CPU time limit
signal (SIGXCPU) and exiting.
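
    The CPU time limit idea can be checked from the shell that starts the
program. What follows is only a sketch, not something from the replies: the
function names show_cpu_limit and cpu_limit_demo are made up for
illustration, and it assumes a shell whose ulimit builtin accepts -S, -t and
-c. The first function prints the soft CPU limit a child would inherit; the
second runs a busy loop under a 1-second soft limit so you can see what the
exit status looks like when the kernel enforces the limit.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original thread.

# Print the soft CPU-time limit (seconds, or "unlimited") that a program
# started from this shell would inherit.
show_cpu_limit() {
    ulimit -S -t
}

# Run a busy loop in a subshell under a 1-second soft CPU limit and
# return the subshell's exit status. The limits only affect the subshell.
cpu_limit_demo() {
    (
        ulimit -S -c 0       # avoid leaving a core file from the demo
        ulimit -S -t 1       # soft limit: 1 second of CPU time
        while :; do :; done  # burn CPU until the kernel sends the signal
    )
    return $?
}
```

    If show_cpu_limit prints a number rather than "unlimited", a long-running
background program started from that shell could plausibly hit the limit.
An exit status from cpu_limit_demo that is greater than 128 means the
subshell was terminated by a signal.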

4) You can use C2 security auditing, which can be configured without enabling
the whole C2 setup (see the audit_setup(8) man page). This lets you see all
the system calls made by a process.

    It's a bit complex to put in place for a single process, but it is
supposed to work very well. It is also a disk hog (it needs lots of space).

5) Sounds like a bug. You may want to run the program with

    nohup my.program >out.log 2>err.log </dev/null &

    To find out what is going wrong, putting lots of logging output (printf
    or syslog) into the program and seeing how far it got is often the best
    bet. Use a library like Electric Fence to test for gradual memory
    corruption. You may also want to leave it running for several days under
    a debugger from a direct login.
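
    In the same spirit as the logging advice, the shell that starts the
    program can record how it ended. This is only a sketch under stated
    assumptions: describe_status and run_and_report are hypothetical names,
    the log path is whatever you pass in, and it relies on the convention
    that a shell reports an exit status greater than 128 for a child killed
    by a signal, and that "kill -l" can translate such a status into a
    signal name.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original thread.

# Turn a shell exit status into a human-readable line.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # A status above 128 conventionally means "killed by a signal";
        # kill -l maps such a status back to the signal name.
        name=$(kill -l "$status" 2>/dev/null)
        echo "killed by signal $name (status $status)"
    else
        echo "exited with status $status"
    fi
}

# Usage: run_and_report LOGFILE COMMAND [ARGS...]
# Runs COMMAND in the foreground, then appends a timestamped line saying
# how it ended. Returns the command's own exit status.
run_and_report() {
    log=$1
    shift
    echo "$(date) started: $*" >>"$log"
    "$@"
    status=$?
    echo "$(date) $(describe_status "$status")" >>"$log"
    return "$status"
}
```

    To use it, save the functions in a small script that ends with a line
    such as: run_and_report watch.log my.program
    and start that script with nohup in place of my.program. The last line
    of watch.log then shows when the program ended and whether it exited on
    its own or was killed by a signal.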

                                        Thanks again !


Guy Dallaire
dallaire_at_megatoon.com

"God only knows if god exists"
Received on Fri Oct 18 1996 - 20:49:34 NZDT
