Linux – Why are Linux system call numbers in x86 and x86_64 different?


I know that the system call interface is implemented at a low level and is therefore architecture/platform-dependent, not "generic" code.

Yet I cannot see why the system call numbers of 32-bit x86 Linux kernels were not kept the same in the closely related 64-bit x86_64 architecture. What is the motivation behind this decision?

My first guess was that the reason was to keep 32-bit applications runnable on an x86_64 system: by adding a fixed offset to the system call number, the kernel would know whether user space is 32-bit or 64-bit. That does not seem to be the case, though. At least, read() being system call number 0 on x86_64 does not fit this idea.

Another guess was that the system call numbers were changed for security/hardening reasons, but I was not able to confirm this.

Although I am unfamiliar with the challenges of implementing the architecture-dependent parts of the code, I still wonder what changing the system call numbers achieves, other than breaking compatibility (which calling them through a library such as libc mitigates). There seems to be no need for it: even a 16-bit register can hold far more than the roughly 346 numbers currently needed to represent all calls.

Best Answer

As for the reasoning behind the specific numbering, which does not match any other architecture [except "x32", which is really just part of the x86_64 architecture]: in the very early days of x86_64 support in the Linux kernel, before there were any serious backwards-compatibility constraints, all of the system calls were renumbered to optimize cache-line usage.

I don't know enough about kernel development to know the specific basis for these choices, but apparently there is some logic behind renumbering everything with these particular numbers rather than simply copying the list from an existing architecture and removing the unused entries. The order seems to be based on how often each call is made: for example, read/write/open/close come first. Exit and fork may seem "fundamental", but each is called only once per process.

There may also be an effort to keep system calls that are commonly used together within the same cache line. The numbers themselves are just integers, but the kernel has a table of function pointers indexed by them, so with 8-byte pointers each group of 8 consecutive system calls occupies one 64-byte cache line of that table.
