I have a hypothetical situation:
- Let us say we have two strace processes, S1 & S2, which are simply monitoring each other.
  How can this be possible?
  Well, among the command-line options for strace, -p PID is the way to pass the required PID, which (in our case) is not yet known when we issue the strace command. We could change the strace source code such that -P 0 means "ask the user for the PID", e.g., read() from STDIN. Once we run two strace processes in two shell sessions and find their PIDs in a third shell, we can provide that input to S1 & S2 and let them monitor each other.
  Would S1 & S2 get stuck? Or go into infinite loops, or crash immediately, or…?
- Again, let us say we have another strace process S3, with -p -1, which, by modifying the source code, we use to tell S3 to monitor itself, e.g., by using getpid() without using STDIN.
  Would S3 crash? Or would it hang with no further processing possible? Would it wait for some event to happen, but, because it is waiting, no event would happen?
The strace man page says that we cannot monitor the init process. Is there any other limitation enforced by strace, or by the kernel, to avoid a circular dependency or loop?
Some special cases:
- S4 monitors S5, S5 monitors S6, S6 monitors S4.
- S7 & S8 monitor each other, where S7 is the parent of S8.
More special cases are possible.
EDIT (after comments by @Ralph Rönnquist & @pfnuesel):
https://github.com/bnoordhuis/strace/blob/master/strace.c#L941
    if (pid <= 0) {
        error_msg_and_die("Invalid process id: '%s'", opt);
    }
    if (pid == strace_tracer_pid) {
        error_msg_and_die("I'm sorry, I can't let you do that, Dave.");
    }
Specifically, what will happen if strace.c does not check for pid == strace_tracer_pid or any other special cases? Is there any technical limitation (in the kernel) on one process monitoring itself? How about a group of 2 (or 3 or more) processes monitoring one another? Will the system crash or hang?
Best Answer
I will answer for Linux only.
Surprisingly, in newer kernels, the ptrace system call, which strace uses to actually perform the tracing, is allowed to trace the init process. The manual page says:
implying that starting in version 2.6.26, you can trace init, although of course you must still be root in order to do so. The strace binary on my system allows me to trace init, and in fact I can even use gdb to attach to init and kill it. (When I did this, the system immediately came to a halt.)
ptrace cannot be used by a process to trace itself, so if strace did not check, it would nevertheless fail at tracing itself. The following program:
prints Operation not permitted (i.e., the result is EPERM). The kernel performs this check in ptrace.c:
Now, it is possible for two strace processes to trace each other; the kernel will not prevent this, and you can observe the result yourself. For me, the last thing that the first strace process (PID = 5882) prints is:
whereas the second strace process (PID = 5890) prints nothing at all. ps shows both processes in the state t, which, according to the proc(5) manual page, means trace-stopped.
This occurs because a tracee stops whenever it enters or exits a system call and whenever a signal is about to be delivered to it (other than SIGKILL).
Assume process 5882 is already tracing process 5890. Then, we can deduce the following sequence of events:
1. Process 5890 invokes the ptrace system call, attempting to trace process 5882. Process 5890 enters trace-stop.
2. Process 5882 receives SIGCHLD to inform it that its tracee, process 5890, has stopped. (A trace-stopped process appears as though it received the SIGTRAP signal.)
3. Process 5882 calls ptrace(PTRACE_SYSCALL, 5890, ...) to allow process 5890 to continue.
4. Process 5890 resumes and executes ptrace(PTRACE_SEIZE, 5882, ...). When the latter returns, process 5890 enters trace-stop.
5. Process 5882 is sent SIGCHLD since its tracee has just stopped again. Since it is being traced, the receipt of the signal causes it to enter trace-stop.
Now both processes are stopped. The end.
As you can see from this example, the situation of two processes tracing each other does not create any inherent logical difficulties for the kernel, which is probably why the kernel code does not contain a check to prevent this situation from happening. It just happens to not be very useful for two processes to trace each other.