What makes the Linux scheduler seem unpredictable

multithreadingscheduling

The question refers to the output of a multi-threaded application, where each thread merely prints its ID (user assigned number) on the standard output. Here all threads have equal priority and compete for CPU quota in order to print on the standard output. However, running the same application a sufficiently large number of times will result in different order of IDs being printed on the screen. This order is due to the OS scheduler which is a piece of software and therefore deterministic. Nonetheless, its behavior seems non-deterministic, which brings me back to the initial question.

Best Answer

This is 100% normal with respect to threading on any and all operating systems. The documentation for your thread library, any examples and tutorials you may find, etc. are likely to make a point of this as it is often confusing to people when they are learning the ropes of threading.

Threads are by default (and by definition) not synchronized.

This means unless you make arrangements to the contrary (via locks, semaphores, whatever), do not expect threads to execute in any particular order. It does not matter what they are doing or how you think they are "likely to happen" based on what they are doing, etc. Put simply: do not think about them this way, that is not what they are or how they are meant to be used. Do not bother coming up with mentalizations such as "I am in single user mode, I am trying to still the system like water because...and..blah blah" ad infinitum, etc., it is a waste of time, there is nothing to learn that way. Water is inevitably not still, that's how liquids are. End of story. Move on and enjoy.

Executing threads that, as you say, do next to nothing on a system that you have set-up to do next to nothing will result in a light load for the scheduler, but note the CPU still runs at full speed and a lot of nothing happening quickly is not necessarily more predictable than any other scenario. The scheduler is not intended to synchronize or predictably order threads and processes. I really really really hope me saying this does not inspire you to concoct a more complicated and pseudo-logical experiment, because as already stated, that is a waste of time, move on.

You can and probably will synchronize them, but notice that is a delicate trick that first requires you understand why they do not just "do it naturally"(clue: because there is no reason for them to do so, they don't).

"What makes a deterministic OS seem non-deterministic?"

Perhaps the analogy of fluid dynamics is not so silly. Fluids seem chaotic -- smoke can even seem to be alive -- and I believe in mathematics the study of fluid dynamics and chaos theory do cross over. However, most people are probably not stunned by this ("Amazing! What could cause such behavior in natural phenomenon? Perhaps it is evidence of God!" ...no) because we can understand how the interaction of deterministic rules (the laws of fluid dynamics) viewed on a certain scale will produce results that are only crudely predictable ("Well, the water will spill out of the glass...") and hence are effectively chaotic or non-deterministic (the pattern of the splash on the table) . Even if that is "only a very small fire" or glass of water, we recognize there is the motion of literally trillions of trillions of particles involved.

If you have ever busy looped with a counter just to see how fast your computer can count, you will know that is on the order of billions of times per second. Keep in mind that most of the "minor" system events constantly going on even with an "idle" system will still be much more substantial than a thread whose activity is measured by the printing of an ID, if that is all the thread is doing. Going with our fluid analogy, even on a day that seems "completely windless" to us, smoke does still not form an orderly column (but we can see more clearly how its own movement affects it).

The scheduler is supposed to ensure that all processes run in a balanced way (gravity will make sure water will spill), however, not by giving each of them exactly 1 nanosecond at a time (which might make your experiment more predictable). This is all well and good since, once again, it is NOT the role of a scheduler to synchronize anything, it is to make everything happen in an appropriately balanced way. These two goals are not the same. When you do get into synchronization, you will appreciate how while synchronization can achieve a clear balance, it comes at the cost of efficiency.

Also, the Linux scheduler uses a red-black tree to organize its queue, it does not simply run everything around in a FILO line (I doubt any modern general purpose scheduler will do that). That implies greater complexity and hence, more apparently chaotic behaviour.

I don't want to encourage you to persist in your fixation, but if you got the threads to perform a more substantial task, one that will actually take them something resembling time to do, and start them offset by a few seconds on an otherwise idle system, then in fact you will see completely deterministic results. Just keep in mind this is not a truly effective means of synchronization -- when the wind kicks up and the water gets rough, you will see the chaos start ;)

"Do you know of any research which studies the influence of these events on the OS scheduler?"

No, because there is nothing truly mysterious about it, as in the case of fluids. The scheduler actually is completely deterministic, just not in the sense or on the scale you are considering. If you put a bunch of informative printks in the kernel code and understand the algorithm of the scheduler, you won't find anything non-deterministic -- everything will happen exactly the way it is supposed to.

Related Question