The choice between a spinlock and a construct that makes the caller block and give up the CPU is governed to a large extent by the time it takes to perform a context switch (saving registers/state of the locking thread and restoring registers/state of another thread). That time, and the cache cost that comes with it, can be significant.
If a spinlock protects access to hardware registers or something similar, where any other thread holding the lock will release it within a matter of microseconds or less, then it is a much better use of CPU time to spin while waiting than to context switch away and back.
The kernel sees the physical memory and presents each process with its own view of it. If you ever wondered how a process can have a 4 GB memory space when your whole machine has only 512 MB of RAM, that's why. Each process has its own virtual memory space. The addresses in that address space are mapped either to physical pages or to swap space. A page that lives in swap space has to be swapped back into physical memory before your process can access or modify it.
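To make the mapping a bit more concrete, here is a toy sketch in C. It is not how any real kernel or MMU lays out its page tables; `toy_pte`, `toy_translate`, and the 8-page address space are made up for illustration, and only the 4 KB page size is taken from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* 4 KB pages */
#define NUM_PAGES 8u      /* toy address space: 8 virtual pages (assumption) */

/* Hypothetical, simplified page-table entry (not a real hardware layout). */
struct toy_pte {
    bool     present;  /* true: page is in RAM; false: page is in swap */
    uint32_t frame;    /* physical frame number, valid only if present */
};

/* Translate a virtual address. Returns true and stores the physical
 * address in *phys if the page is resident. Returns false if the address
 * is out of range or the page would first have to come back from swap. */
bool toy_translate(const struct toy_pte table[NUM_PAGES],
                   uint32_t vaddr, uint64_t *phys)
{
    assert(table && phys);

    uint32_t page   = vaddr / PAGE_SIZE;  /* which virtual page */
    uint32_t offset = vaddr % PAGE_SIZE;  /* position inside that page */

    if (page >= NUM_PAGES || !table[page].present)
        return false;

    *phys = (uint64_t)table[page].frame * PAGE_SIZE + offset;
    return true;
}
```

A `false` return for a non-resident page corresponds to the case where the kernel has to swap the page back in before the access can go ahead.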
The example from Torvalds in XQYZ's answer (DOS highmem) is not too far-fetched, although I disagree with his conclusion that PAE is generally a bad thing. It solved specific problems and has its merits, but all of that is debatable. For example, the implementer of a library may not perceive the implementation as easy, while the user of that library may perceive it as very useful and easy to use. Torvalds is an implementer, so he's bound to say what he said. For an end user PAE solves a problem, and that's what the end user cares about.
For one, PAE helps solve another legacy problem on 32bit machines. It allows the kernel to map the full 4 GB of memory and work around the BIOS memory hole that exists on many machines and causes a pure 32bit kernel without PAE to "see" only 3.1 or 3.2 GB of memory, despite the 4 GB that are physically installed.
Anyway, for the 64bit kernel there is a symmetrical relation between physical and virtual pages (leaving swap space and other details aside). The PAE kernel, however, maps between a 32bit pointer within the process' address space and a 36bit address in physical memory, so more book-keeping is needed than with a full linear address space. That is the main difference. Keyword: "Extended Page-Table", but this is somewhat more of a programming question. For PAE it's chunks of 4 GB, as you mentioned.
Aside from that, both PAE and 64bit allow for large pages (instead of the standard 4 KB pages in 32bit).
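The arithmetic behind the 32bit/36bit numbers is easy to check. A small sketch in C (the function names are mine, purely for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes addressable with `bits` address bits (bits < 64). */
uint64_t addressable_bytes(unsigned bits)
{
    assert(bits < 64);
    return (uint64_t)1 << bits;
}

/* How many 4 GB (32bit) chunks fit into a physical address space
 * that is `phys_bits` wide. */
uint64_t chunks_of_4gb(unsigned phys_bits)
{
    return addressable_bytes(phys_bits) / addressable_bytes(32);
}
```

With 32 bits a process can address 4 GB, with 36 bits the physical address space is 64 GB, so `chunks_of_4gb(36)` comes out as 16.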
Chapter 3 of Volume 1 of the Intel Processor Manual has some overview and Chapter 3 of Volume 3A ("Protected Mode Memory Management") has more details, if you want to read up on it.
"To me it seems like this is a big distinction that seems to be ignored by many people."
You're right. However, the majority of people are users, not implementers. That's why they won't care. And as long as you don't require huge amounts of memory for your application, many people don't care (especially since there are compatibility layers).
Best Answer
Both manage a limited resource. I'll first describe the difference between a binary semaphore (mutex) and a spin lock.
A spin lock performs a busy wait, i.e. it keeps running a loop until the lock becomes free.
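A minimal sketch of such a busy-wait loop, assuming C11 atomics. The names `toy_spinlock`, `toy_spin_lock`, etc. are made up for illustration, and this is not taken from any particular kernel:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spin lock built on a C11 atomic flag. */
typedef struct {
    atomic_flag locked;
} toy_spinlock;

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once. Returns true if the lock was acquired. */
static inline bool toy_spin_trylock(toy_spinlock *l)
{
    /* test_and_set returns the previous value: false means it was free. */
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Acquire the lock, burning CPU time until it becomes free. */
static inline void toy_spin_lock(toy_spinlock *l)
{
    assert(l);
    while (!toy_spin_trylock(l)) {
        /* busy wait: nothing to do but try again */
    }
}

/* Release the lock so that a spinning thread can take it. */
static inline void toy_spin_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Note that the waiting thread never gives up the CPU voluntarily; it just keeps retrying the test-and-set.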
Locking and unlocking are very lightweight, but if the thread holding the lock is preempted by another thread that tries to access the same resource, the second thread will simply keep trying to acquire the resource until it runs out of its CPU quantum.
A mutex, on the other hand, puts the waiting thread to sleep instead of letting it spin.
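A rough sketch of that behaviour in C with POSIX threads. In real code you would simply call `pthread_mutex_lock()`; the wrapper below (all `toy_sleep*` names are hypothetical) only exists to make the "sleep until released" step visible:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical sleeping lock: waiters block in pthread_cond_wait()
 * instead of spinning. */
typedef struct {
    pthread_mutex_t guard;  /* protects `held` */
    pthread_cond_t  freed;  /* signalled when the lock is released */
    bool            held;
} toy_sleeplock;

#define TOY_SLEEPLOCK_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Acquire the lock, sleeping while somebody else holds it. */
void toy_sleep_lock(toy_sleeplock *l)
{
    pthread_mutex_lock(&l->guard);
    while (l->held)
        pthread_cond_wait(&l->freed, &l->guard);  /* suspended here */
    l->held = true;
    pthread_mutex_unlock(&l->guard);
}

/* Non-blocking variant. Returns true if the lock was acquired. */
bool toy_sleep_trylock(toy_sleeplock *l)
{
    pthread_mutex_lock(&l->guard);
    bool got_it = !l->held;
    if (got_it)
        l->held = true;
    pthread_mutex_unlock(&l->guard);
    return got_it;
}

/* Release the lock and wake one sleeping waiter, if there is one. */
void toy_sleep_unlock(toy_sleeplock *l)
{
    pthread_mutex_lock(&l->guard);
    assert(l->held);  /* unlocking a lock that is not held is a bug */
    l->held = false;
    pthread_cond_signal(&l->freed);
    pthread_mutex_unlock(&l->guard);
}
```

The suspended thread consumes no CPU time while it waits, which is the trade-off against the cheaper lock/unlock of a spin lock.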
Hence, if a thread tries to acquire a resource that is already locked, it is suspended until the resource becomes available to it. Locking/unlocking is much heavier, but the waiting is 'free' and 'fair'.
A semaphore is a lock that may be held a fixed number of times at once (the number is set at initialization): for example, 3 threads are allowed to hold the resource simultaneously, but no more. It is used, for example, in the producer/consumer problem, or in queues in general.
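As a sketch of the "3 holders at most" case, assuming a POSIX system with unnamed semaphores (e.g. Linux). The `resource_*` helpers and `MAX_HOLDERS` are names I made up for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>

#define MAX_HOLDERS 3  /* assumed limit, taken from the 3-thread example */

/* Try to take one of the slots without blocking. */
bool resource_try_acquire(sem_t *slots)
{
    return sem_trywait(slots) == 0;
}

/* Take one slot, sleeping while all slots are in use. */
void resource_acquire(sem_t *slots)
{
    while (sem_wait(slots) == -1 && errno == EINTR) {
        /* interrupted by a signal: retry */
    }
}

/* Give the slot back so that a waiting thread can proceed. */
void resource_release(sem_t *slots)
{
    int rc = sem_post(slots);
    assert(rc == 0);
    (void)rc;  /* silence the unused warning when NDEBUG is defined */
}
```

The semaphore would be set up once with `sem_init(&slots, 0, MAX_HOLDERS)`. After three successful acquires a fourth caller either blocks in `resource_acquire()` or gets `false` from `resource_try_acquire()` until somebody releases a slot.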