Why is RAM write speed slower than read speed? And how do caches come into the picture?

Tags: bandwidth, cache, memory, nehalem

First of all, is this actually true? I feel that reads should always be faster than writes, and this author runs some experiments to "prove" it. He doesn't explain why, only mentioning "caching issues" (and his experiments don't seem to account for prefetching).

But I don't understand why. If it matters, let's assume we're talking about the Nehalem architecture (e.g., the i7), which has per-core L1 and L2 caches and a shared, inclusive L3 cache.

This is probably because I don't correctly understand how reads and writes work, so I'll write out my understanding. Please tell me if anything is wrong.

If I read some memory, the following steps should happen (assume all caches miss):

    1. Check if already in L1 cache, miss
    2. Check if in L2 cache, miss
    3. Check if in L3 cache, miss
    4. Fetch from memory into (L1?) cache

I'm not sure about the last step. Does data percolate down through the caches, meaning that on a miss the line is first read from memory into L3/L2/L1 and then read from there? Or can it "bypass" all the caches, with the caching happening in parallel for later accesses? (reading = probe all caches + fetch from RAM into cache + read from cache?)

Then write:

    1. All caches have to be checked (read) in this case too
    2. If there's a hit, write there, and since Nehalem has write-through caches, write to memory immediately and in parallel
    3. If all caches miss, write to memory directly?

Again, I'm not sure about the last step. Can a write "bypass" all the caches, or does writing always involve reading the line into the cache first, modifying the cached copy, and letting the write-through hardware actually write it to the memory location in RAM? (writing = probe all caches + fetch from RAM into cache + write to cache, with the write to RAM happening in parallel ==> writing is almost a superset of reading?)

Best Answer

Memory must store its bits in two states separated by a large energy barrier, or else the smallest influence would flip the bit. But when writing to that memory, we must actively overcome that barrier.

Overcoming the energy barrier in RAM requires waiting while energy is moved around. Simply looking to see which state the bit is in takes less time.

For more detail, see MSalters' excellent answer to a somewhat similar question.

I'm not certain enough of the details of how caching interacts with RAM to answer that part of the question with any authority, so I'll leave it to someone else.
