Linux Kernel – Why the "Never Break User Space" Policy Exists

history, linux-kernel

I started thinking about this issue in the context of etiquette on the Linux kernel mailing list (LKML). As the world's best-known and arguably most successful and important free software project, the Linux kernel gets plenty of press. And the project's founder and leader, Linus Torvalds, clearly needs no introduction here.

Linus occasionally attracts controversy with his flames on the LKML. By his own admission, these flames frequently have to do with breaking user space. Which brings me to my question.

Can I have some historical perspective on why breaking user space is such a bad thing? As I understand it, breaking user space would require fixes on the application level, but is this such a bad thing, if it improves the kernel code?

As I understand it, Linus' stated policy is that not breaking user space trumps everything else, including code quality. Why is this so important, and what are the pros and cons of such a policy?

(There are clearly some cons to such a policy, consistently applied, since Linus occasionally has "disagreements" with his top lieutenants on the LKML on exactly this topic. As far as I can tell, he always gets his way in the matter.)

Best Answer

The reason is not historical but practical. There are many, many, many programs that run on top of the Linux kernel; if a kernel interface changes in a way that breaks those programs, everybody would need to upgrade them.

Now it's true that most programs do not in fact depend on kernel interfaces directly (the system calls), but only on the interfaces of the C standard library (the C wrappers around the system calls). Oh, but which standard library? Glibc? uClibc? Dietlibc? Bionic? Musl? etc.
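For a concrete sense of where that boundary sits, here is a minimal C sketch that reaches the same kernel service two ways: through the libc wrapper getpid(), and directly via syscall(2) with the raw system call number. Whichever of those libraries the program is built against, the kernel's side of the contract is identical.

```c
/* A minimal sketch: the same kernel interface reached two ways.
 * getpid() goes through the C library wrapper; syscall(SYS_getpid)
 * invokes the kernel's system-call ABI directly, bypassing the
 * wrapper entirely. Both must keep working across kernel upgrades. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    pid_t via_libc = getpid();                     /* libc wrapper */
    pid_t via_kernel = (pid_t)syscall(SYS_getpid); /* raw syscall  */

    /* Whatever libc this was built against, the kernel sees the
     * same syscall number and must honor the same contract. */
    printf("getpid via libc:   %d\n", (int)via_libc);
    printf("getpid via kernel: %d\n", (int)via_kernel);
    return 0;
}
```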

But there are also many programs that implement OS-specific services and depend on kernel interfaces that are not exposed by the standard library. (On Linux, many of these are offered through /proc and /sys.)
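To illustrate what such a dependency looks like, here is a minimal sketch that parses /proc/uptime directly, a kernel text interface that no standard library wraps. If the kernel ever changed that format, programs like this one would break, which is exactly what the no-breakage rule forbids.

```c
/* A minimal sketch of a program depending on a kernel interface
 * that no standard library wraps: the text format of /proc/uptime. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/uptime", "r");
    if (!f) {
        perror("fopen /proc/uptime");
        return 1;
    }

    double uptime, idle;
    /* The contract: two floating-point fields, seconds since boot
     * and cumulative idle time across all CPUs. */
    if (fscanf(f, "%lf %lf", &uptime, &idle) != 2) {
        fprintf(stderr, "unexpected /proc/uptime format\n");
        fclose(f);
        return 1;
    }
    fclose(f);

    printf("up %.0f seconds, idle %.0f seconds\n", uptime, idle);
    return 0;
}
```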

And then there are statically compiled binaries. If a kernel upgrade breaks one of these, the only solution would be to recompile them, and that assumes you have the source: Linux supports proprietary software too.

Even when the source is available, gathering it all can be a pain. Especially when you're upgrading your kernel to fix a bug with your hardware. People often upgrade their kernel independently from the rest of their system because they need the hardware support. In the words of Linus Torvalds:

Breaking user programs simply isn't acceptable. (…) We know that people use old binaries for years and years, and that making a new release doesn't mean that you can just throw that out. You can trust us.

He also explains that one reason to make this a strong rule is to avoid dependency hell where you'd not only have to upgrade another program to get some newer kernel to work, but also have to upgrade yet another program, and another, and another, because everything depends on a certain version of everything.

It's somewhat ok to have a well-defined one-way dependency. It's sad, but inevitable sometimes. (…) What is NOT ok is to have a two-way dependency. If user-space HAL code depends on a new kernel, that's ok, although I suspect users would hope that it wouldn't be "kernel of the week", but more a "kernel of the last few months" thing.

But if you have a TWO-WAY dependency, you're screwed. That means that you have to upgrade in lock-step, and that just IS NOT ACCEPTABLE. It's horrible for the user, but even more importantly, it's horrible for developers, because it means that you can't say "a bug happened" and do things like try to narrow it down with bisection or similar.

In userspace, those mutual dependencies are usually resolved by keeping different library versions around; but you only get to run one kernel, so it has to support everything people might want to do with it.

Officially,

backward compatibility for [system calls declared stable] will be guaranteed for at least 2 years.

In practice though,

Most interfaces (like syscalls) are expected to never change and always be available.

What does change more often are the interfaces that are only meant to be used by hardware-related programs, in /sys. (/proc, on the other hand, which since the introduction of /sys has been reserved for non-hardware-related services, pretty much never breaks in incompatible ways.)
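Here is a sketch of the kind of /sys consumer this refers to. The thermal-zone path used below is just an example attribute, one that exists on many but not all machines; /sys layouts are hardware-dependent, so the code treats the file's absence as a normal condition rather than an error.

```c
/* A minimal sketch of reading a hardware-related /sys attribute:
 * a thermal zone's temperature, reported in millidegrees Celsius.
 * The path is an example and is not guaranteed to exist everywhere. */
#include <stdio.h>

int main(void)
{
    const char *path = "/sys/class/thermal/thermal_zone0/temp";
    FILE *f = fopen(path, "r");
    if (!f) {
        /* Not all hardware exposes this attribute. */
        printf("%s not available on this system\n", path);
        return 0;
    }

    long millideg;
    if (fscanf(f, "%ld", &millideg) == 1)
        printf("thermal_zone0: %.1f degrees C\n", millideg / 1000.0);
    fclose(f);
    return 0;
}
```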

In summary,

breaking user space would require fixes on the application level

and that's bad because there's only one kernel, which people want to upgrade independently of the rest of their system, but there are many, many applications out there with complex interdependencies. It's easier to keep the kernel stable than to keep thousands of applications up-to-date on millions of different setups.
