Short answers:
- No, it's not a process
- User threads are not rooted at init.
Init is just the first process; it does not manage any processes or threads. It does create some, using the kernel syscalls fork() and exec().
I think you have a muddy idea of what a process is. It doesn't just mean a bit of executing code. Yeah, the kernel executes before init (and the boot loader before that, even). But a 'process' has a specific definition:
- It runs in user space
- It has a process ID
- Most of its interactions with hardware must go through the kernel
- All of its resources (memory, file descriptors, etc.) come from the kernel
- It must be scheduled by the kernel
So, once the kernel initializes, it runs init, which then spawns whatever other processes its configuration says to.
As far as #2 goes, all the kernel stuff is, well, in the kernel. Think of the kernel as one large blob of code. Again, it is not a process. Parts of it deal with memory management, parts with scheduling portions of itself (drivers, etc.), and parts with scheduling processes.
As someone who has written programs that execute without an OS, I offer a definitive answer.
Would an executable need an OS kernel to run?
That depends on how that program was written and built.
You could write a program (assuming you have the knowledge) that does not require an OS at all.
Such a program is described as standalone.
Boot loaders and diagnostic programs are typical uses for standalone programs.
However, the typical program written and built in some host OS environment will, by default, execute in that same host OS environment.
Very explicit decisions and actions are required to write and build a standalone program.
... the output from the compiler is the machine code (executable) which I thought were instructions to the CPU directly.
Correct.
Recently I was reading up on kernels and I found out that programs cannot access the hardware directly but have to go through the kernel.
That's a restriction imposed by the CPU mode in which the OS runs programs, facilitated by certain build tools such as compilers and libraries.
It is not an intrinsic limitation on every program ever written.
So when we compile some simple source code, say with just a printf() function, and the compilation produces the executable machine code, will each instruction in this machine code be directly executed from memory (once the code is loaded into memory by the OS) or will each command in the machine code still need to go through the OS (kernel) to be executed?
Every instruction is executed by the CPU.
An instruction that is unsupported or illegal (e.g. process has insufficient privilege) will cause an immediate exception, and the CPU will instead execute a routine to handle this unusual condition.
A printf() function should not be used as an example of "simple source code".
The translation from a high-level programming language to machine code may not be as trivial as you imply.
And then you choose one of the most complex functions from a runtime library that performs data conversions and I/O.
Note that your question stipulates an environment with an OS (and a runtime library).
Once the system is booted, and the OS is given control of the computer, restrictions are imposed on what a program can do (e.g. I/O must be performed by the OS).
If you expect to execute a standalone program (i.e. without an OS), then you must not boot the computer to run the OS.
... what happens after the machine code is loaded into memory?
That depends on the environment.
For a standalone program, it can be executed, i.e. control is handed over by jumping to the program's start address.
For a program loaded by the OS, the program has to be dynamically linked with shared libraries it is dependent on. The OS has to create an execution space for the process that will execute the program.
Will it go through the kernel or directly talk to the processor?
Machine code is executed by the CPU.
Instructions do not "go through the kernel", but neither do they "talk to the processor".
Each machine-code instruction (consisting of an op code and operands) is decoded by the CPU, which then performs the operation.
Perhaps the next topic you should investigate is CPU modes.
I can't definitively answer the "kernel threads" question for Linux. For Windows, I can tell you that the "kernel threads" are simply threads created from some other kernel mode routine, running procedures that never enter user mode. When the scheduler picks a thread for execution it resumes its previous state (user or kernel, whatever that was); the CPU doesn't need to "tell the difference". The thread executes in kernel mode because that's what it was doing the last time it was executing.
In Windows these typically are created with the so-called "System" process as their parent, but they can actually be created in any process.
... So, in Unix they can have a parent ID of zero? i.e. belonging to no process?
That actually doesn't matter unless the thread tries to use process-level resources.
As for the addresses assigned by the compiler... There are a couple of possible ways to think about this. One part of it is that the compiler really doesn't pick addresses for much of anything; almost everything a compiler produces (in a modern environment) is in terms of offsets. A given local variable is at some offset from wherever the stack pointer will be when the routine is instantiated. (Note that stacks themselves are at dynamically assigned addresses, just like heap allocations are.) A routine entry point is at some offset from the start of the code section it's in. Etc.
The second part of the answer is that addresses, such as they are, are assigned by the linker, not the compiler. Which really just defers the question: how can it do this? By which I guess you mean, how does it know what addresses will be available at runtime? The answer is "practically all of them."
Remember that every process starts out as an almost completely blank slate, with a fresh instantiation of the user-mode address space; e.g. every process has its own instance of address 0x10000. So, aside from having to avoid a few things that are at well-known (to the linker, anyway) locations within each process on the platform, the linker is free to put things wherever it wants within the process address space. It doesn't have to know or care where anything else already is.
The third part is that nearly everything (except those OS-defined things that are at well-known addresses) can be moved to different addresses at run time, due to Address Space Layout Randomization, which exists on both Windows and Linux (Linux released it first, in fact). So it doesn't actually matter where the linker put things.