Linux – Are Threads Implemented as Processes on Linux?


I'm going through this book, Advanced Linux Programming by Mark Mitchell, Jeffrey Oldham, and Alex Samuel. It's from 2001, so a bit old. But I find it quite good anyhow.

However, I got to a point where the book diverges from what my Linux system prints in the shell. On page 92 (116 in the viewer), section 4.5, GNU/Linux Thread Implementation, begins with a paragraph containing this statement:

The implementation of POSIX threads on GNU/Linux differs from the
thread implementation on many other UNIX-like systems in an important
way: on GNU/Linux, threads are implemented as processes.

This seems like a key point, and it is later illustrated with some C code. The output in the book is:

main thread pid is 14608
child thread pid is 14610

And in my Ubuntu 16.04 it is:

main thread pid is 3615
child thread pid is 3615

The ps output supports this.

I guess something must have changed between 2001 and now.

The next subsection on the following page, 4.5.1 Signal Handling, builds on the previous statement:

The behavior of the interaction between signals and threads varies
from one UNIX-like system to another. In GNU/Linux, the behavior is
dictated by the fact that threads are implemented as processes.

And it looks like this will be even more important later on in the book. Could someone explain what's going on here?

I've seen the question "Are Linux kernel threads really kernel processes?", but it doesn't help much. I'm confused.

This is the C code:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void* thread_function (void* arg)
{
    fprintf (stderr, "child thread pid is %d\n", (int) getpid ());
    /* Spin forever. */
    while (1);
    return NULL;
}

int main ()
{
    pthread_t thread;
    fprintf (stderr, "main thread pid is %d\n", (int) getpid ());
    pthread_create (&thread, NULL, &thread_function, NULL);
    /* Spin forever. */
    while (1);
    return 0;
}

Best Answer

I think this part of the clone(2) man page may clear up the difference regarding the PID:

CLONE_THREAD (since Linux 2.4.0-test8)
If CLONE_THREAD is set, the child is placed in the same thread group as the calling process.
Thread groups were a feature added in Linux 2.4 to support the POSIX threads notion of a set of threads that share a single PID. Internally, this shared PID is the so-called thread group identifier (TGID) for the thread group. Since Linux 2.4, calls to getpid(2) return the TGID of the caller.
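The getpid(2) behaviour described in this excerpt can be checked directly. The following is a minimal sketch, not code from the book: the struct thread_ids type and the function names are made up for illustration, and the per-thread ID is read with syscall(SYS_gettid) on the assumption that the glibc gettid() wrapper may not be available on older systems.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record for this sketch: the IDs seen by one thread. */
struct thread_ids {
    pid_t pid;  /* getpid(): the thread group ID (TGID) */
    pid_t tid;  /* SYS_gettid: the kernel's per-thread ID */
};

/* Record the calling thread's PID and thread ID. */
static void read_ids(struct thread_ids *ids)
{
    ids->pid = getpid();
    ids->tid = (pid_t) syscall(SYS_gettid);
}

/* Thread entry point: store the new thread's IDs in *arg. */
static void *thread_function(void *arg)
{
    read_ids(arg);
    return NULL;
}

/* Fill main_ids from the calling thread and child_ids from a newly
   created thread. Returns 0 on success or a pthread error number. */
int compare_ids(struct thread_ids *main_ids, struct thread_ids *child_ids)
{
    pthread_t thread;
    int err;

    read_ids(main_ids);
    err = pthread_create(&thread, NULL, thread_function, child_ids);
    if (err != 0)
        return err;
    /* Wait for the thread so that child_ids is fully written. */
    return pthread_join(thread, NULL);
}
```

Called from main and built with gcc -pthread, this should report the same pid for both threads and a different tid for each, which is what the Ubuntu 16.04 output in the question reflects.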

The "threads are implemented as processes" phrase refers to the fact that threads used to have separate PIDs. Linux originally didn't have threads within a process, just separate processes (with separate PIDs) that might share some resources, like virtual memory or file descriptors. CLONE_THREAD and the separation of process ID and thread ID make the Linux behaviour look more like other systems and more like the POSIX requirements in this sense. Technically, though, the OS still doesn't have separate implementations for threads and processes.
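The idea of separate processes that share memory can be illustrated by calling clone(2) with CLONE_VM but without CLONE_THREAD. This is only a rough sketch of the concept, not a reproduction of the old thread library: the struct clone_result type, the function names, and the 1 MiB stack size are assumptions chosen for the example.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)  /* arbitrary size for this sketch */

/* Hypothetical record: written by the child, read by the parent. */
struct clone_result {
    pid_t parent_pid;
    pid_t child_pid;  /* getpid() as seen inside the child */
};

/* Runs in the child. The store is visible to the parent only
   because CLONE_VM makes both share one address space. */
static int child_function(void *arg)
{
    struct clone_result *result = arg;
    result->child_pid = getpid();
    return 0;
}

/* Create a memory-sharing child without CLONE_THREAD and wait for it.
   Returns 0 on success, -1 on failure. */
int run_shared_memory_child(struct clone_result *result)
{
    char *stack = malloc(STACK_SIZE);
    pid_t child;
    int rc;

    if (stack == NULL)
        return -1;

    result->parent_pid = getpid();
    result->child_pid = 0;

    /* The stack grows downward on common architectures, so pass its top.
       SIGCHLD lets the parent reap the child with waitpid(). */
    child = clone(child_function, stack + STACK_SIZE,
                  CLONE_VM | SIGCHLD, result);
    if (child == -1) {
        free(stack);
        return -1;
    }

    rc = (waitpid(child, NULL, 0) == -1) ? -1 : 0;
    free(stack);
    return rc;
}
```

After a successful call, child_pid should differ from parent_pid even though the child wrote its result straight into the parent's memory. Adding CLONE_THREAD (which also requires CLONE_SIGHAND) is what would put the child into the caller's thread group and make getpid() return the same value in both.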

Signal handling was another problematic area with the old implementation. This is described in more detail in the paper @FooF refers to in their answer.

As noted in the comments, Linux 2.4 was also released in 2001, the same year as the book, so it's not surprising that the change didn't make it into that printing.