As another user commented, it's mostly OS-dependent.
if a CPU has 2 logical cores, it can run two programs 100% concurrent,
yes?
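Concurrently, yes. Whether they also run in parallel at full speed depends on the hardware: two logical cores that share one physical core (as with hyperthreading) also share its execution units. See: https://softwareengineering.stackexchange.com/questions/190719/the-difference-between-concurrent-and-parallel-execution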
For example, say I have 100 processes running on 2 cores ... will the
OS try and divide 50 on each core for load balance? Will they be
randomly scattered?
Each OS has its own scheduling algorithm.
Say I launch mspaint.exe on a quad-core Intel chip ... where will it
be executed from (core 1, 2, 3, 4?), and will it continue executing
there until close?
We can't know in advance which core it will run on, and it will most probably not stay on the same core from start to finish. Again, it depends on the OS scheduler.
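To make the "it depends on the scheduler" point a bit more concrete, here is a toy simulation. It assumes a single shared run queue and plain round-robin time slices, which is a deliberate simplification and not how any particular OS works; `simulate_round_robin` and its parameters are made up for this sketch.

```python
from collections import deque

def simulate_round_robin(num_processes, num_cores, slices_per_process):
    """Toy scheduler: one shared run queue, each core takes the next ready process every tick."""
    run_queue = deque((pid, slices_per_process) for pid in range(num_processes))
    cores_used = {pid: [] for pid in range(num_processes)}  # core history per process
    while run_queue:
        running = []
        # Every core picks whatever is at the head of the queue for this tick.
        for core in range(num_cores):
            if not run_queue:
                break
            pid, remaining = run_queue.popleft()
            cores_used[pid].append(core)
            running.append((pid, remaining - 1))
        # Processes that still have work left go to the back of the queue.
        for pid, remaining in running:
            if remaining > 0:
                run_queue.append((pid, remaining))
    return cores_used

history = simulate_round_robin(num_processes=3, num_cores=2, slices_per_process=4)
for pid, cores in history.items():
    print(f"process {pid} ran on cores {cores}")
```

Even in this tiny model, a process hops between cores from one time slice to the next. Real schedulers add priorities, cache affinity, and load balancing on top, so the actual placement is even harder to predict.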
Is it truly possible to pick a specific core, or program for
multi-cores directly without having a transparent daemon or the OS
doing it randomly for you?
Apparently yes: https://stackoverflow.com/questions/663958/how-to-control-which-core-a-process-runs-on
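As a rough sketch of what that looks like in practice: on Linux, Python exposes the affinity calls as `os.sched_getaffinity` and `os.sched_setaffinity`. They are not available on every platform, so the helper below (`pin_current_process` is a made-up name) checks for them first and simply gives up otherwise.

```python
import os

def pin_current_process(core):
    """Try to restrict the current process to one logical core.

    Returns the resulting set of allowed cores, or None if the affinity
    calls are unavailable on this platform or the core is not allowed.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. Windows or macOS: Python does not expose this call there
    allowed = os.sched_getaffinity(0)  # pid 0 means "the calling process"
    if core not in allowed:
        return None
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)

if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    original = os.sched_getaffinity(0)
    print("allowed before:", sorted(original))
    print("allowed after pinning:", pin_current_process(min(original)))
    os.sched_setaffinity(0, original)  # restore the original mask
```

Note that this only restricts where the scheduler may place the process. The OS still decides when it actually runs.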
How so, if all people say is "just use
threads"? Is using multi-threads mapped to cores? If so, how is using
a thread tailored to a core without OS intervention if threads on a
single-core do not concurrently work?
I didn't fully understand the question here, but the basic idea with threads is that you create them and the OS runs them according to its scheduling algorithm. There's no need for you to control which logical or physical core each one runs on (there may be cases where you would want to, though I'm not sure why).
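A minimal sketch of that model, using Python's standard `threading` module (the `worker` function and the thread names are just for illustration): the program creates threads and waits for them, and nowhere does it mention a core.

```python
import threading

def worker(name, results):
    # The thread never asks for a core; the OS decides where and when it runs.
    results[name] = sum(range(100_000))

results = {}
threads = [
    threading.Thread(target=worker, args=(f"t{i}", results))
    for i in range(4)
]
for t in threads:
    t.start()   # hand the thread over to the OS scheduler
for t in threads:
    t.join()    # wait until the thread has finished

print(sorted(results))
```

One caveat: in standard CPython builds the global interpreter lock lets only one thread execute Python bytecode at a time, so this shows the programming model rather than a speedup. In languages without such a lock, the same structure lets the OS spread the threads over whatever cores are free.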
A CPU is a much more general purpose machine than a GPU. We might talk about using a GPU as a "general purpose" GPU, but they have different strengths.
CPU cores are capable of a wide variety of operations and deal with what can, for all intents and purposes, be considered a randomly branching instruction stream: multiple programs all vying for time on the processor under the control of the operating system. They cache and predict as much as they can while still trying to remain capable of dealing with sudden changes in the instruction stream.
GPUs on the other hand are processors designed to deal with data streams. Their processors are designed to work with a small series of instructions (a shader program) across a potentially vast amount of data. HD, 2K and 4K screens contain a huge number of pixels, and a shader must run programs across every pixel in successive runs to achieve particular effects. To that end their programs are (compared to a CPU's) smaller and their per-core caches similarly smaller, but their bandwidth to memory is far higher.
They might, with suitable programming, be able to achieve the same tasks, but the focus of instructions vs data processing is what separates a CPU from a GPU.
As such their cores are designed to work to those strengths. For a long while GPU shader cores have operated around 1-2GHz (modern Intel graphics cores list their speeds as 500MHz to 1.5GHz) while CPUs have been anywhere between 1.5 and 4GHz and more.
Instruction processing benefits more from the speed of individual units because it can be difficult or impossible to break an instruction stream down into multiple streams, hence CPU cores need to be faster to get through instructions quickly. The problem is that the faster you run a core the more heat it generates, so you hit a limit on how fast you can run it. (There are other technical limitations that affect clock speed but that's something for another story.)
Data processing on the other hand lends itself to parallelism, running the same task (program) on different data sets, hence the more cores you can throw at the task the better. Running cores at a slower speed generates less heat. Less heat means you can put in more cores, and therefore get better throughput of data. Hence data tasks benefit from a different (smaller, leaner) type of core than a CPU's.
The end result is that we have two distinct types of processor: one aimed at general purpose instruction streams, the other at bulk data handling.
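As a small illustration of "the same program on different data": the sketch below applies one tiny function to every pixel and splits the pixels into independent chunks. The `brighten` "shader" and the chunking scheme are invented for the example, and a thread pool on a CPU only mimics the structure of the work. It says nothing about how a real GPU schedules it.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(pixel):
    """Hypothetical 'shader': the same tiny program runs for every pixel."""
    return min(pixel + 40, 255)

def shade_chunk(chunk):
    return [brighten(p) for p in chunk]

def shade_image(pixels, workers=4):
    # Split the data into independent chunks; no chunk needs another's result.
    size = max(1, (len(pixels) + workers - 1) // workers)
    chunks = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        shaded = list(pool.map(shade_chunk, chunks))  # map keeps chunk order
    return [p for chunk in shaded for p in chunk]

print(shade_image([0, 100, 250, 255]))
```

Because each pixel is computed without reference to any other, adding workers is just a matter of cutting the data into more pieces. A branch-heavy instruction stream, where each step depends on the previous one, cannot be divided this way.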
Best Answer
Your i5 has two cores, and each core can run two threads because of Intel's hyperthreading, making 4 hardware threads. Beyond that, it switches at high speed between processes. Here's a nice explanation of multithreading if you want to know more, but in essence your CPU can run 4 threads simultaneously and switches at high speed between the rest.
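If you want to check the arithmetic against a real machine, something like the following should do. The core and thread counts are the ones quoted above for that i5; `os.cpu_count()` reports logical CPUs (hardware threads), not physical cores, and may return `None` if the count cannot be determined.

```python
import os

physical_cores = 2     # figure quoted above for the i5 in question
threads_per_core = 2   # hyperthreading: two hardware threads per core
expected_logical = physical_cores * threads_per_core
print("logical CPUs expected for that i5:", expected_logical)

logical_here = os.cpu_count()  # logical CPUs on the machine running this script
print("logical CPUs on this machine:", logical_here)
```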