No, you can set the maximum number of kernel threads to a very high number.
Note that the word "threads" is used for many different things; it may be that Intel's use of the term causes confusion.
Update re kernel threads
Here are some Linux kernel threads running in CoLinux under Vista on an AMD Athlon 64 X2 dual-core CPU.
$ ps -eLf
UID PID PPID LWP C NLWP STIME TTY TIME CMD
root 1 0 1 0 1 17:24 ? 00:00:00 init [2]
root 2 0 2 0 1 17:24 ? 00:00:00 [kthreadd]
root 3 2 3 0 1 17:24 ? 00:00:00 [ksoftirqd/0]
root 4 2 4 0 1 17:24 ? 00:00:00 [events/0]
root 5 2 5 0 1 17:24 ? 00:00:00 [khelper]
root 21 2 21 0 1 17:24 ? 00:00:00 [kblockd/0]
root 22 2 22 0 1 17:24 ? 00:00:00 [kseriod]
root 41 2 41 0 1 17:24 ? 00:00:00 [pdflush]
root 42 2 42 0 1 17:24 ? 00:00:00 [pdflush]
root 43 2 43 0 1 17:24 ? 00:00:00 [kswapd0]
root 44 2 44 0 1 17:24 ? 00:00:00 [aio/0]
root 727 2 727 0 1 17:24 ? 00:00:00 [kjournald]
LWP is the thread ID. (See man ps: "-L Show threads, possibly with LWP and NLWP columns" … "LWP lwp (light weight process, or thread) ID of the lwp being reported. (alias spid, tid)")
kthreadd is the kernel thread daemon; I believe it is responsible for spawning all the other kernel threads. Note that I am not showing daemons like klogd, which (as far as I know) do not execute in ring 0.
The number of kernel threads != the number of CPU cores (see the title of the question).
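To make the kthreadd relationship concrete, here is a minimal sketch of a Linux kernel module that starts one kernel thread with kthread_run (this is illustrative, not taken from the system above; names like demo_thread_fn are mine). The new task is actually forked by kthreadd on the module's behalf, so it shows up in ps -eLf with PID 2 as its parent, just like the bracketed entries in the listing.

/* demo_kthread.c - hedged sketch: a minimal kernel module that starts one
 * kernel thread; function and variable names are illustrative only. */
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/err.h>

static struct task_struct *demo_task;

/* The thread body runs entirely in kernel mode, like the [bracketed]
 * entries in the ps listing above. */
static int demo_thread_fn(void *data)
{
    while (!kthread_should_stop()) {        /* exit cleanly when asked to */
        pr_info("demo_kthread: still running\n");
        msleep(1000);
    }
    return 0;
}

static int __init demo_init(void)
{
    /* kthread_run() = kthread_create() + wake_up_process(). */
    demo_task = kthread_run(demo_thread_fn, NULL, "demo_kthread");
    return IS_ERR(demo_task) ? PTR_ERR(demo_task) : 0;
}

static void __exit demo_exit(void)
{
    kthread_stop(demo_task);     /* blocks until demo_thread_fn returns */
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");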
Kernel threads consist of a set of registers, a stack, and a few corresponding kernel data structures.
…
The purported advantage of kernel threads over processes is faster creation and context switching compared with processes.
…
Kernel threads are considered “lightweight,” and one would expect the number of threads to only be limited by address space and processor time
…
In particular, operating system kernels tend to see kernel threads as a special kind of process rather than a unique entity. For example, in the Solaris kernel threads are called “light weight processes” (LWP’s). Linux actually creates kernel threads using a special variation of fork called “clone,” and until recently gave each thread a separate process ID. Because of this heritage, in practice kernel threads tend to be closer in memory and time cost to processes than user-level threads …
(Multiple Flows of Control in Migratable Parallel Programs, 2006)
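To make the "clone" remark above concrete, here is a hedged sketch (mine, not from the quoted paper) that creates a thread directly with the glibc clone(2) wrapper, sharing the address space with its parent roughly the way a thread library does:

/* clone_demo.c - illustrative sketch of creating a thread with clone(2).
 * Build with: gcc clone_demo.c -o clone_demo */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int shared_counter = 0;      /* visible to both tasks via CLONE_VM */

static int thread_fn(void *arg)
{
    (void)arg;
    shared_counter = 42;            /* writes the parent's memory, not a copy */
    return 0;
}

int main(void)
{
    const size_t stack_size = 1024 * 1024;
    char *stack = malloc(stack_size);
    if (!stack)
        return 1;

    /* The stack grows downward on x86, so pass the top of the allocation.
     * CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND is roughly what a
     * thread library asks for; without CLONE_THREAD the child still gets
     * its own PID, matching the "separate process ID" heritage above. */
    int pid = clone(thread_fn, stack + stack_size,
                    CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD,
                    NULL);
    if (pid == -1) {
        perror("clone");
        return 1;
    }

    waitpid(pid, NULL, 0);
    printf("parent sees shared_counter = %d\n", shared_counter);  /* 42 */
    free(stack);
    return 0;
}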
I can't definitively answer the "kernel threads" question for Linux. For Windows, I can tell you that the "kernel threads" are simply threads created from some other kernel mode routine, running procedures that never enter user mode. When the scheduler picks a thread for execution it resumes its previous state (user or kernel, whatever that was); the CPU doesn't need to "tell the difference". The thread executes in kernel mode because that's what it was doing the last time it was executing.
In Windows these are typically created with the so-called "System" process as their parent, but they can actually be created in any process. (So, in Unix, can they have a parent ID of zero, i.e. belong to no process?) Which process hosts them doesn't actually matter unless the thread tries to use process-level resources.
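For the Windows side, here is a hedged sketch of what that creation looks like (it assumes a WDK driver build environment and is illustrative only, not taken from any particular source): a driver starts such a thread with PsCreateSystemThread, and passing NULL for the process handle is what places it in the System process.

/* demo_systhread.c - hedged sketch of a minimal Windows driver that starts
 * a system thread; names are illustrative and this needs the WDK to build. */
#include <ntddk.h>

static HANDLE g_thread_handle;
static volatile BOOLEAN g_stop;

/* Runs entirely in kernel mode and never enters user mode. */
static VOID demo_thread_routine(PVOID context)
{
    UNREFERENCED_PARAMETER(context);
    while (!g_stop) {
        LARGE_INTEGER delay;
        delay.QuadPart = -10 * 1000 * 1000;   /* 1 second, relative time */
        KeDelayExecutionThread(KernelMode, FALSE, &delay);
    }
    PsTerminateSystemThread(STATUS_SUCCESS);
}

static VOID demo_unload(PDRIVER_OBJECT driver)
{
    UNREFERENCED_PARAMETER(driver);
    g_stop = TRUE;            /* a real driver would also wait for the
                                 thread to exit before unloading */
    ZwClose(g_thread_handle);
}

NTSTATUS DriverEntry(PDRIVER_OBJECT driver, PUNICODE_STRING registry_path)
{
    UNREFERENCED_PARAMETER(registry_path);
    driver->DriverUnload = demo_unload;

    /* NULL process handle => the thread is created in the System process;
     * a real process handle would host it in that process instead. */
    return PsCreateSystemThread(&g_thread_handle, THREAD_ALL_ACCESS,
                                NULL, NULL, NULL,
                                demo_thread_routine, NULL);
}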
As for the addresses assigned by the compiler... There are a couple of possible ways to think about this. One part of it is that the compiler really doesn't pick addresses for much of anything; almost everything a compiler produces (in a modern environment) is in terms of offsets. A given local variable is at some offset from wherever the stack pointer will be when the routine is instantiated. (Note that stacks themselves are at dynamically assigned addresses, just like heap allocations are.) A routine entry point is at some offset from the start of the code section it's in. Etc.
The second part of the answer is that addresses, such as they are, are assigned by the linker, not the compiler. Which really just defers the question - how can it do this? By which I guess you mean, how does it know what addresses will be available at runtime? The answer is "practically all of them."
Remember that every process starts out as an almost completely blank slate, with a new instantiation of user-mode address space. For example, every process has its own instance of address 0x10000. So aside from having to avoid a few things that are at well-known (to the linker, anyway) locations within each process on the platform, the linker is free to put things where it wants them within the process address space. It doesn't have to know or care where anything else already is.
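A hedged sketch of that point, assuming an ordinary Linux/glibc build (the variable names are mine): after a fork, parent and child print the very same virtual address for a global, yet a write through that address only changes the writer's own copy.

/* same_address.c - sketch: the same virtual address names different storage
 * in different processes. Build with: gcc same_address.c -o same_address */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int looks_shared = 1;     /* one symbol, so one virtual address in the image */

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {                       /* child process */
        looks_shared = 99;                /* touches only the child's copy */
        printf("child : &looks_shared = %p, value = %d\n",
               (void *)&looks_shared, looks_shared);
        return 0;
    }
    waitpid(pid, NULL, 0);
    /* Prints the same %p the child printed, but the value is still 1. */
    printf("parent: &looks_shared = %p, value = %d\n",
           (void *)&looks_shared, looks_shared);
    return 0;
}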
The third part is that nearly everything (except those OS-defined things that are at well-known addresses) can be moved to different addresses at run time, due to Address Space Layout Randomization, which exists on both Windows and Linux (Linux released it first, in fact). So it doesn't actually matter where the linker put things.
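And a small sketch of the ASLR point, assuming a position-independent build with randomization enabled (the default on most modern Linux distributions): run the same binary twice and the code, stack, and heap addresses all move.

/* aslr_demo.c - sketch: print a few addresses, then run the program twice.
 * With ASLR and a PIE build the values differ from run to run.
 * Build with: gcc aslr_demo.c -o aslr_demo */
#include <stdio.h>
#include <stdlib.h>

int a_global = 0;

int main(void)
{
    int a_local = 0;
    void *a_heap_block = malloc(16);

    printf("code  (main)     : %p\n", (void *)main);
    printf("data  (a_global) : %p\n", (void *)&a_global);
    printf("stack (a_local)  : %p\n", (void *)&a_local);
    printf("heap  (malloc)   : %p\n", a_heap_block);

    free(a_heap_block);
    return 0;
}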
Best Answer
Switching between kernel-level threads requires a full context switch, which involves changing a large set of processor registers that define the current memory map and permissions. It also evicts some or all of the processor cache.
User-level threads just require a small amount of bookkeeping within one kernel thread or process.
However, the difference isn't big if your threads are predominantly doing I/O operations, as those have to go through the kernel in any case. It matters most if you're trying to implement some kind of simulation with a very large number of independent processes. In that case you need to pay careful attention to which thread synchronisation mechanisms you use, as some of them also go up to the kernel and trigger a context switch.
http://www.cs.rochester.edu/u/cli/research/switch.pdf "In general, the indirect cost of context switch ranges from several microseconds to more than one thousand microseconds for our workload."
Edit: user-level threads maintain a stack per thread, and may or may not save the general-purpose registers, depending on the architecture and the clobber rules of the calling convention. A switch can be as simple as dumping a few registers to the stack, jumping to a new address, and popping a few registers; all of that may still be in the cache if the target thread ran recently.
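As a hedged sketch of that kind of purely user-level switch (using POSIX ucontext rather than hand-written assembly; the names are mine), two contexts ping-pong inside a single kernel thread. Note that swapcontext also saves and restores the signal mask, which does cost a system call; real user-level thread libraries usually switch only the stack pointer and a handful of callee-saved registers.

/* uswitch.c - sketch of a user-level "thread" switch with POSIX ucontext.
 * Build with: gcc uswitch.c -o uswitch */
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];          /* the worker's private stack */

static void worker(void)
{
    for (int i = 0; i < 3; i++) {
        printf("worker: step %d\n", i);
        swapcontext(&worker_ctx, &main_ctx);  /* save self, resume main */
    }
}

int main(void)
{
    getcontext(&worker_ctx);                  /* initialize, then customize */
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof(worker_stack);
    worker_ctx.uc_link = &main_ctx;           /* resume main if worker returns */
    makecontext(&worker_ctx, worker, 0);

    for (int i = 0; i < 3; i++) {
        printf("main  : resuming worker\n");
        swapcontext(&main_ctx, &worker_ctx);  /* save main, run worker */
    }
    return 0;
}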
Kernel-level context switches also change the memory map (by reloading the page-table base register, which typically flushes or invalidates TLB entries) and involve crossing between privilege levels (the processor "rings"). See "Performance Considerations"