Linux: Writing a watchdog to monitor multiple processes

linux monitoring process

A few years ago, a coworker came up with an elegant solution for a watchdog program. The program ran on Windows and used Windows Event objects to monitor the process handles (PIDs) of several applications. If any one of the processes terminated unexpectedly, its process handle would no longer exist and his watchdog would immediately be signaled. The watchdog would then take an appropriate action to “heal” the system.

My question is, how would you implement such a watchdog on Linux? Is there a way for a single program to monitor the PIDs of many others?

Best Answer

The traditional, portable, commonly-used way is that the parent process watches over its children.

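The basic primitives are the wait and waitpid system calls. When a child process dies, the parent process receives a SIGCHLD signal, which tells it to call wait to learn which child died and what its exit status was. Alternatively, the parent can leave SIGCHLD at its default disposition (which is to ignore the signal) and call waitpid(-1, &status, WNOHANG) at its convenience. It should not explicitly set SIGCHLD to SIG_IGN, though: on Linux that makes the kernel reap children automatically, so waitpid has nothing left to report.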
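
The polling variant can be sketched in Python, whose os module exposes the same fork, exec, and waitpid calls. This is only an illustration, not a finished watchdog: the helper names spawn, reap_exited, and watch are made up for the example, and os.waitstatus_to_exitcode needs Python 3.9 or later.

```python
import os
import time


def spawn(argv):
    """Fork a child and exec argv in it; return the child's PID to the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the target program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed; never fall back into parent code.
            os._exit(127)
    return pid


def reap_exited(children):
    """Non-blocking sweep: return (name, pid, exit_code) for each dead child.

    `children` maps PID -> name and is updated in place.
    """
    exited = []
    while children:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        name = children.pop(pid, "unknown")
        # A negative code means "killed by that signal number".
        exited.append((name, pid, os.waitstatus_to_exitcode(status)))
    return exited


def watch(commands, poll_interval=0.1):
    """Start every command, then report each termination as it is noticed."""
    children = {spawn(argv): name for name, argv in commands.items()}
    report = []
    while children:
        for name, pid, code in reap_exited(children):
            print(f"{name} (pid {pid}) exited with {code}")
            report.append((name, code))
            # A real watchdog would "heal" here, e.g. respawn the command.
        time.sleep(poll_interval)
    return report
```

Calling watch({"web": ["sleep", "2"], "db": ["sh", "-c", "exit 3"]}) would start both commands and print one line per termination. A production watchdog would more likely block on SIGCHLD than sleep in a loop, but the bookkeeping stays the same.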

To monitor many processes, either spawn them all from the same parent, or invoke each one through a small monitoring process that just runs the desired program, waits for it to terminate, and reports the termination (in shell syntax: myprogram; echo myprogram $? >>/var/run/monitor-collector-pipe). If you're coming from the Windows world, note that small programs that each do one specialized task are a common design in the Unix world; the OS is designed to make processes cheap.

There are many process monitoring (also called supervisor) programs that can report when a process dies, optionally restart it, and do far more besides: Monit, Supervise, Upstart, …
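
The per-program wrapper from the shell one-liner could look roughly like this in Python. The names run_and_report and parse_reports are hypothetical, and the report path is whatever file or named pipe the collector reads; nothing here is prescribed by the answer itself.

```python
import os
import subprocess


def run_and_report(argv, report_path):
    """Run argv to completion, then append "<program> <status>" to report_path."""
    completed = subprocess.run(argv)  # blocks until the program terminates
    line = f"{argv[0]} {completed.returncode}\n".encode()
    # Open in append mode and emit the line with a single short write, so that
    # lines from several wrappers are unlikely to interleave.
    fd = os.open(report_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, line)
    finally:
        os.close(fd)
    return completed.returncode


def parse_reports(lines):
    """Turn report lines back into (program, status) pairs for the collector."""
    reports = []
    for line in lines:
        program, _, status = line.strip().rpartition(" ")
        if program:
            reports.append((program, int(status)))
    return reports
```

One difference from the shell version: subprocess reports a program killed by signal N as the return code -N, whereas the shell's $? would be 128+N.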

Related Solutions

Linux – Process Monitor Equivalent for Linux

The console standby for this is top, but there are alternatives like my favorite htop that give you a little more display flexibility and allow you a few more operations on the processes.

A less interactive view that is better for use in scripts would be the ps program and all its relatives.

Edit: Based on your clarified question, you might note that strace handles watching the system calls made by a given process, including all read-write operations and OS function calls. You can start it on the command line in front of the program you want to trace, or attach it to a running process by pressing s on a process selected in htop.

Process Monitoring – How to Monitor Multiple PIDs with Top

How about something like

pids=( $(pgrep 'postgres|unicorn|nginx') )

to put the PIDs in an array, and then

top "${pids[@]/#/-p }"

to spit them back out into top, prepending each with -p
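
For comparison, the same idea can be scripted in Python. This sketch relies on the procps version of top, which accepts -p with a comma-separated list of up to 20 PIDs; the function names matching_pids and top_command are invented for the example.

```python
import subprocess


def matching_pids(pattern):
    """Return the PIDs whose process name matches the pgrep pattern."""
    result = subprocess.run(["pgrep", pattern], capture_output=True, text=True)
    # pgrep prints one PID per line and prints nothing when there is no match.
    return [int(line) for line in result.stdout.split()]


def top_command(pids, limit=20):
    """Build a `top` argv that restricts the display to the given PIDs."""
    if not pids:
        raise ValueError("no matching processes")
    # Keep at most `limit` PIDs, since top only monitors 20 at a time.
    selected = [str(int(pid)) for pid in pids[:limit]]
    return ["top", "-p", ",".join(selected)]
```

Running subprocess.run(top_command(matching_pids("postgres|unicorn|nginx"))) from a terminal would then open top on just those processes.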
