Linux – What are the differences between `do_execve()` and `_start` copying the command-line arguments and environment?

exec, linux, linux-kernel

Understanding the Linux Kernel says that execve() calls do_execve(), which in turn

copies the file pathname, command-line arguments, and environment strings into one or more newly allocated page frames. (Eventually, they are assigned to the User Mode address space.)

Am I correct that after execve() terminates with success, the process invokes _start routine of crt0.o?

According to APUE:

When a C program is executed by the kernel (by one of the exec functions), a special start-up routine is called before the main function is called. The executable program file specifies this routine as the starting address for the program; this is set up by the link editor when it is invoked by the C compiler. This start-up routine takes values from the kernel (the command-line arguments and the environment) and sets things up so that the main function is called as shown earlier.

Does the _start routine also copy command line arguments and the environment again?

What are the differences between do_execve() and _start, if both copy the command-line arguments and environment? Isn't it wasteful to copy twice?

Thanks.

Best Answer

Am I correct that after execve() terminates with success, the process invokes _start routine of crt0.o?

Not necessarily. When the execve system call returns, the process will continue executing from whatever text/code address is the entry point of the binary (in ELF, that's the e_entry field from the header). Example:

echo 'void run(void){ printf("in run\n"); exit(0); }' |
   gcc -Wl,-e,run -nostartfiles -include stdio.h -include stdlib.h -Wall -x c - -o /tmp/run
/tmp/run
in run

_start is simply the usual name of the entry point routine on many (most?) Unix systems.

Does the _start routine also copy command line arguments and the environment again?

It could do that, but usually it does no such thing. The only thing that it should do is rearrange them in a way that they could be passed to a C function like main.

The problem is that you can't simply declare the entry point as a C function like

_start(argc, ...)

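and get the arguments with va_arg, because e.g. on x86_64 the C calling convention expects the (first couple of) arguments to be passed in registers, and that's not how they're passed to _start: on Linux the kernel leaves argc, the argv pointers and the environment pointers on the initial stack.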

There are other things that _start is usually doing before calling main; a very important thing is running the static constructors, which is required by programs written in C++, but could be used by any program if it defines the correct section attributes in the ELF binary (with gcc, you can do that in a C program by defining a function with __attribute__((constructor))).

The standard startup code in a glibc-based system will also go through a function defined in the (dynamically-linked) libc.so -- __libc_start_main(), which is very nice as you could override it from a preloaded dynamic library and add your own initialization stuff without having to modify a binary. Look here for an example.
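As a rough sketch of that override technique (not necessarily what the linked example does): a preloaded library can define its own __libc_start_main, look up the real one with dlsym(RTLD_NEXT, ...), and pass it a wrapper in place of the program's main. The parameter list below is the one commonly used for glibc; main_hook, real_main and hook_ran are names invented here, and the whole thing assumes a dynamically linked glibc program.

```c
#define _GNU_SOURCE            /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>

typedef int (*main_fn)(int, char **, char **);
typedef int (*start_main_fn)(main_fn, int, char **,
                             void (*)(void), void (*)(void),
                             void (*)(void), void *);

static main_fn real_main;      /* the program's own main() */
int hook_ran = 0;              /* set once the wrapper has run */

/* Runs in place of main(), after libc has finished its own setup. */
static int main_hook(int argc, char **argv, char **envp)
{
    hook_ran = 1;
    fprintf(stderr, "[preload] about to call main of %s\n", argv[0]);
    return real_main(argc, argv, envp);
}

/* Same name as the glibc function, so the startup code calls this first. */
int __libc_start_main(main_fn app_main, int argc, char **argv,
                      void (*init)(void), void (*fini)(void),
                      void (*rtld_fini)(void), void *stack_end)
{
    start_main_fn real_start =
        (start_main_fn)dlsym(RTLD_NEXT, "__libc_start_main");
    if (real_start == NULL)
        return 127;            /* real function not found; give up */

    real_main = app_main;
    return real_start(main_hook, argc, argv, init, fini, rtld_fini, stack_end);
}
```

Assuming the file is saved as hook.c (a hypothetical name), it could be built with gcc -shared -fPIC hook.c -o hook.so -ldl and tried with LD_PRELOAD=./hook.so /bin/true. Doing the extra work in a main wrapper rather than directly inside the override keeps it safely after libc's own initialization.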
