Linux – Why does the Linux kernel build system use incremental linking or ar T thin archives

linux-kernel

While studying the kernel build system, I noticed that before v4.19 the kernel used incremental linking (ld -r) and then moved to thin archives (ar T), as shown at: What is the difference between the following kernel Makefile terms: vmLinux, vmlinuz, vmlinux.bin, zimage & bzimage?

Then I made a synthetic incremental linking benchmark to see whether the link speedup was significant, at: https://stackoverflow.com/questions/29391965/what-is-partial-linking-in-gnu-linker/53959624#53959624 but in my benchmark it wasn't.

Therefore, my question is: why does the kernel use incremental linking or thin archives?

Is it to speed up the build or for some other reason?

Which commit introduced incremental linking? With that I would be able to figure out the rationale from git log. I found the one that moved to thin archives with git log --grep 'thin archive' (a5967db9af51a84f5e181600954714a9e4c69f1f), but could not easily grep the incremental linking one.
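For reference, a sketch of how one might search for both commits from a kernel tree; these commands are assumptions on my part, and the pattern for the old ld -r rule in particular is only a guess:

git log --oneline --grep='thin archive'
# should include the thin archive commit mentioned above (a5967db9af51...)

# Search the build rules for changes that touched "-r" linking; the path and
# regex are guesses and may need adjusting:
git log --oneline -G'LD.* -r' -- Makefile scripts/Makefile.build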

If it exists to speed up the build, is there a way to quickly compare linking with and without incremental linking to measure the speedup?
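As a starting point, here is a minimal synthetic sketch of such a comparison outside the kernel build system; the object count, compiler, and the use of --whole-archive are my assumptions, not anything taken from kbuild:

# Synthetic comparison, not the kernel build itself: generate many small
# objects, then time (a) ld -r into one big relocatable object plus a final
# link, versus (b) a thin archive (ar T) plus a final link.
set -e
mkdir -p /tmp/linkbench && cd /tmp/linkbench
for i in $(seq 1 2000); do
  printf 'int f%d(void) { return %d; }\n' "$i" "$i" > "f$i.c"
done
printf 'int main(void) { return 0; }\n' > main.c
gcc -c main.c f*.c

# (a) incremental link, then final link
time ld -r -o built-in.o f*.o
time gcc -o prog_ldr main.o built-in.o

# (b) thin archive, then final link; --whole-archive forces every member in,
# roughly like the kernel pulling in all of its built-in objects
time ar rcsT built-in.a f*.o
time gcc -o prog_thin main.o -Wl,--whole-archive built-in.a -Wl,--no-whole-archive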

Best Answer

Reason for thin archives

I emailed Nicholas Piggin, one of the authors of the patches, and he explained that thin archives are not only about reducing disk usage: they can also prevent a link failure.

The problem is that the incrementally linked object files could get so large that the linker cannot even insert trampolines: the relocations must reach generated trampoline code that goes between the objects, and at some point they no longer can.

I haven't yet received a reply about the rationale for the original incremental linking approach.

This is his awesome reply:

It's a pretty long answer depending on how much you know. There are a few reasons. Stephen's primary motivation for the patch was to allow very large kernels to link successfully.

Some other benefits are:

  • It is a "nicer" way to store the intermediate build artifacts: you keep the output code in a single place and track it with references (thin archives) until it's all linked together. So there is less IO and disk space required, particularly with big builds and debug info.

    For the average modern workstation building to a small number of output directories (and Linux is not really a huge project), this will all be in cache and the time to incrementally link the files is very fast. So the build speed benefit is usually pretty small for Linux.

  • It allows the linker to generate slightly better code, by rearranging files and locating linker stubs more optimally.

  • It tends to work much better with LTO builds, although there's not much support for LTO builds upstream yet.

But we'll get back to the primary motivation.

When you build a relocatable object file that hasn't been finally linked, you have a blob of code with a bunch of references to symbols for functions and variables that are defined elsewhere.

--- a.S ---
bl      myfunc
---

assembles into

a.o:     file format elf64-powerpcle

Disassembly of section .text:

0000000000000000 <.text>:
   0:   01 00 00 48     bl      0x0

So the code has a branch to NIA+0 (i.e., itself), which is not what we asked for. Dumping relocations shows the missing bit:

Disassembly of section .text:

0000000000000000 <.text>:
   0:   01 00 00 48     bl      0x0
                    0: R_PPC64_REL24        myfunc

The relocation is not in the .text section and it's not code; it is ELF metadata which says that the instruction at this location has a 24-bit relative offset to a symbol called myfunc.
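(As an aside, dumps like the two above can be reproduced with a PowerPC64LE cross binutils; the tool prefix below is an assumption, and any recent binutils should do:)

powerpc64le-linux-gnu-as -o a.o a.S
powerpc64le-linux-gnu-objdump -d  a.o   # disassembly only: the branch target reads 0x0
powerpc64le-linux-gnu-objdump -dr a.o   # -r also prints the R_PPC64_REL24 entry for myfunc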

When you do a "final link" of objects together, the files are basically concatenated together, and these relocations are resolved by adjusting code and data to point to the correct locations.

Linking a.S with b.S that contains myfunc symbol gives this:

c:     file format elf64-powerpcle


Disassembly of section .text:

00000000100000d8 <_start>:
    100000d8:   05 00 00 48     bl      100000dc <myfunc>

00000000100000dc <myfunc>:
    100000dc:   01 00 63 38     addi    r3,r3,1
    100000e0:   20 00 80 4e     blr

Relocation metadata is stripped, branch points to correct offset.
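(The mail does not show b.S or the link command; a minimal guess at them, assuming a.S also provides _start since the linked output contains it, could be:)

cat > b.S <<'EOF'
        .globl  myfunc
myfunc:
        addi    3,3,1
        blr
EOF
powerpc64le-linux-gnu-as -o b.o b.S
powerpc64le-linux-gnu-ld -o c a.o b.o   # final link resolves the R_PPC64_REL24 relocation
powerpc64le-linux-gnu-objdump -d c      # bl now targets <myfunc>; no relocation metadata left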

So the linker actually adjusts instructions as it links. It goes one step further than that: it generates instructions. If you have a big build and this branch cannot reach myfunc with a 24-bit offset, the linker will put a trampoline (aka stub, aka PLT, aka procedure linkage table) into the code which can be reached in 24 bits, and the trampoline then uses a longer branch sequence that can reach the target.

The linker can't just put these trampolines anywhere in the code, because if you add something in the middle of code, that breaks relative references that go across the middle. The linker does not know about all references in a .o file, only the unresolved ones. So the linker must only put trampolines between .o files when it links them together, before it resolves their references.

The old incremental build approach just combines .o files into bigger .o files as you get closer to the root of the build directory. So you run into a problem when your .o files become so large that a branch cannot reach outside its own .o file in order to reach a trampoline. There is no way to resolve this reference.

With thin archives, the final link is done on thousands of very small .o files. This gives the linker maximum flexibility to place these trampolines, which means you never encounter this limitation.
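To make the contrast concrete, the intermediate step for one build directory changes roughly like this; the commands are illustrative only, not the exact rules from scripts/Makefile.build:

# Old scheme: a directory's objects were incrementally linked into one big
# relocatable built-in.o, which was then ld -r'ed again further up the tree.
ld -r -o built-in.o foo.o bar.o baz.o

# New scheme: the same objects are only referenced from a thin archive (T),
# so they remain thousands of small .o files until the single final link.
ar rcsT built-in.a foo.o bar.o baz.o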
