Linux – Performance of calling POSIX-specified functions versus direct Linux kernel calls

glibclinuxperformanceposixsyscalls

In an answer over on Stack Overflow, I provided a code sample to perform some small task referenced in the question. The original question had to do with the fastest-performing technique (so performance criteria are in play, here).

Another commenter/answerer suggested that making a POSIX-defined system API call (in this case, readdir) was not as fast as making a direct system call into the kernel (syscall(SYS_getdents,...)) and the claimed performance difference is in the 25% range. (I didn't implement and re-benchmark; I believe that the performance could in fact be better.)

My question is about the performance characteristics of the proposed syscall-based solution and why they might be faster. I can think of a few reasons why performance might be better:

  1. POSIX readdir is inherently more complicated than syscall(SYS_getdents,...)/getdents()
  2. readdir (which presumably calls syscall(SYS_getdents,...) simply adds the overhead of indirection
  3. readdir only returns one record (per kernel-call) versus syscall(SYS_getdents,...)/getdents()` which returns (presumably) more than one record per kernel-call

I can't imagine that #1 above is true. readdir and getdents are so similar that the implementation of readdir in glibc simply can't have many more "true" system calls than a direct-invocation of syscall(SYS_getdents,...)/getdents() would invoke.

I can't imagine that #2 is true, either, since calling readdir likely wraps getdents and also syscall(SYS_getdents,...) likely calls getdents as well (the proposed answer specifically uses syscall(SYS_getdents,...) instead of calling getdents directly. It's possible that everything within glibc on Linux boils down to syscall(syscallid, args) in which case #2 probably is true.

The last possibility seems to me to be the best explanation: fewer calls into the kernel simply results in faster performance.

Is there any specific explanation for why a "direct kernel call" would be measurably faster than calling a POSIX-defined function?

Best Answer

Factors like PLT indirection or syscall()'s variadic-ness (registers have to be saved to memory) should play little role given that getdents is one of the most expensive calls in Linux.

Fully reading an empty directory on my machine costs about 5µs, a 100-item directory 37µs, a 1000-item directory 340µs and a 10,000-item directory 3.79ms.

What fdopendir+readdir does on top of getdents is it adds a buffer allocation/deallocation (0.1µs) and a stat check that the supplied fd is of the directory variety (0.4µs). readdir then makes one cheap call per directory entry (moves a position in the buffer and possibly refills).

So there is something like a one-time overhead of 0.5µs, which is 10% of the directory scanning time for empty directories, but only 1% for 100-item directories and negligibly little for larger ones. This overhead is 5 times less (only the allocation/deallocation cost) if you you don't need fdopen. (you only need fdopen if you can't diropen directly and therefore must go through a separately obtained (e.g., openat'ted) filedescriptor).

So if you use a custom one-time allocated buffer along with getdents, you can save 2-10% on the scanning cost of empty directories and negligibly little on the bigger ones.

As for the readdir calls, the cost of PLT indirection on modern hardware is typically less than a 1ns, function call overhead is about 1-2ns. Given that the directory scanning times are in the order of microseconds, you'd need to make at least 1000 readdir calls for these factors to translate into a single µs, but then the scanning cost is 340µs and the accrued 1 extra µs is like 0.3% of that -- negligible effect. Inlining these readdir (thus eliminating the call overhead and the PLT overhead) would only serve to expand the code, but it wouldn't much improve performance as getdents is very much the bottleneck there.

(readdir_r is more expensive because of additional locking, but you shouldn't need readdir_r because plain readdir calls are typically thread-safe unless you have multiple threads calling them on the same directory stream. POSIX might not be explicit about this yet, but I believe this guarantee should be getting standardized soon given that glibc went as far as to deprecate readdir_r.)