Why is it quicker to switch between threads than to switch between processes and share data between them? For example, the Apache web server running on Ubuntu can use either the prefork MPM (spawning multiple child processes) or the worker MPM (creating multiple threads within a single process).
Tags: linux, process, threads
Related Solutions
I am not sure how to do it from the command line, but I wrote this in PowerShell to do some filtering of OS-related processes. Maybe it will give you an idea. It skips items owned by SYSTEM, the service accounts, and null owners.
gwmi win32_process |
    Select-Object ProcessID, ParentProcessID, Name, @{l="Username";e={$_.GetOwner().User}} |
    Where-Object { $_.Username -ne "SYSTEM" } |
    Where-Object { $_.Username -ne "LOCAL SERVICE" } |
    Where-Object { $_.Username -ne "NETWORK SERVICE" } |
    Where-Object { $_.Username -ne $null } |
    Sort-Object ProcessID |
    Format-Table -AutoSize
Output
ProcessID ParentProcessID Name            Username
--------- --------------- ----            --------
     2136            3460 notepad.exe     KNUCKLE-DRAGGER
     2504            3460 firefox.exe     KNUCKLE-DRAGGER
     2792             700 dllhost.exe     KNUCKLE-DRAGGER
     2816            4232 conhost.exe     KNUCKLE-DRAGGER
     2916            3460 powershell.exe  KNUCKLE-DRAGGER
     3128            3460 notepad.exe     KNUCKLE-DRAGGER
     3180             576 taskhost.exe    KNUCKLE-DRAGGER
     3196            4308 vmware-tray.exe KNUCKLE-DRAGGER
     3460            4392 explorer.exe    KNUCKLE-DRAGGER
     3644            4636 vmware-vmx.exe  KNUCKLE-DRAGGER
     3696            3460 mplayerc.exe    KNUCKLE-DRAGGER
     4636            3196 vmware.exe      KNUCKLE-DRAGGER
     4828            3460 notepad.exe     KNUCKLE-DRAGGER
As another user commented, it's mostly OS-dependent.
if a CPU has 2 logical cores, it can run two programs 100% concurrent, yes?
Concurrently yes, in parallel no. See: https://softwareengineering.stackexchange.com/questions/190719/the-difference-between-concurrent-and-parallel-execution
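A minimal sketch of this distinction, using Python's threading module (note that in CPython the GIL means these two threads are concurrent but not truly parallel):

```python
import threading

results = []

def worker(name, n):
    # Each iteration may be interleaved with the other thread by the scheduler.
    for i in range(n):
        results.append((name, i))

t1 = threading.Thread(target=worker, args=("a", 3))
t2 = threading.Thread(target=worker, args=("b", 3))
t1.start(); t2.start()
t1.join(); t2.join()

# Both threads finished; the interleaving order is up to the scheduler,
# but the set of work done is deterministic.
print(sorted(results))
# → [('a', 0), ('a', 1), ('a', 2), ('b', 0), ('b', 1), ('b', 2)]
```

Both threads make progress during the same time window (concurrency); whether their instructions execute at the same instant on two cores (parallelism) is up to the runtime and the OS.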
For example, say I have 100 processes running on 2 cores ... will the OS try and divide 50 on each core for load balance? Will they be randomly scattered?
Each OS has its own scheduling algorithm.
Say I launch mspaint.exe on a quad-core Intel chip ... where will it be executed from (core 1, 2, 3, 4?), and will it continue executing there until close?
We don't know where it will be executed, and it will most probably not continue executing on the same core from start to finish. Again, it depends on the OS scheduler.
Is it truly possible to pick a specific core, or program for multi-cores directly without having a transparent daemon or the OS doing it randomly for you?
Apparently yes: https://stackoverflow.com/questions/663958/how-to-control-which-core-a-process-runs-on
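For instance, on Linux a process can pin itself to a specific core via the sched_setaffinity system call, which Python exposes as os.sched_setaffinity where available. A hedged sketch (the API is Linux-specific, so it is guarded):

```python
import os

def pin_to_core(core):
    """Try to pin the calling process to one CPU core (Linux-only API)."""
    if hasattr(os, "sched_setaffinity"):   # absent on e.g. macOS and Windows
        os.sched_setaffinity(0, {core})    # pid 0 = the calling process
        return os.sched_getaffinity(0)     # the affinity set actually in effect
    return None                            # affinity control unsupported here

print(pin_to_core(0))   # {0} on Linux; None where unsupported
```

After pinning, the scheduler will only run this process on the given core; child processes inherit the affinity mask. (From the shell, `taskset` does the same job.)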
How so, if all people say is "just use threads"? Is using multi-threads mapped to cores? If so, how is using a thread tailored to a core without OS intervention if threads on a single-core do not concurrently work?
I didn't fully understand the question here, but the basic idea with threads is that you create them and the OS runs them using its scheduling algorithm; there's no need for you to control which logical or physical core they run on (there may be cases where you'd want to do that, but I'm not sure why).
Best Answer
Quickie, incomplete explanation:
In a thread switch, there's a lot less context to save and restore. Most importantly, the address space is shared: the kernel doesn't have to page out dirty pages or do VM tricks to pull in all the memory for a new process (though some specific pages may still need to be pulled in). Other per-process data structures in the kernel (say, the open file descriptor table) don't need to be swapped out either.
As a side effect, you're also much more likely to be able to use what's already in the processor's caches at that point. A new process probably starts with a cold cache.
Yes, there are tools to share memory between processes (IPC shared memory, pipes), but none are as clean or easy as the common address space within a process. You can't grow a shared memory block the way you can with realloc() and friends. Anything besides shared memory means keeping multiple copies of data structures, one in each process, with tricks to copy changes back and forth as needed.
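To illustrate the point about common memory, here is a minimal Python sketch: threads mutate one shared variable directly, needing only a lock for coordination, whereas separate processes would need an explicit shared-memory object or message passing to see each other's updates.

```python
import threading

counter = 0
lock = threading.Lock()

def bump(n):
    global counter
    for _ in range(n):
        with lock:        # the lock is all the coordination the threads need
            counter += 1

threads = [threading.Thread(target=bump, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # → 40000: every thread updated the same memory directly
```

With processes instead of threads, each child would get its own copy of `counter` and the increments would be lost unless you routed them through something like `multiprocessing.Value` or a pipe.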
Specifically, Apache has multiple models (MPMs) for different OSes. The original model was prefork, trading some heavyweight process switching for process isolation. This worked fine on the UNIXes that were common at the time of its first writing, some of which didn't have threads at all. Windows was so bad at multi-process work that Apache had to use threading there: processes on Windows (at least at the time of Apache 1.2/2.0) were too heavyweight. Linux has very light processes, where switching cost is close to thread-switching cost, so it usually stayed with prefork. Solaris has a complex "LWP" thread model and does best with a hybrid thread/fork model.