Windows – Why do file transfers between drives use RAM

file-transfer, hard-drive, memory, ssd, windows

I've noticed that whenever I copy or move large files from my SSD, which I use as the system drive, to my HDD or to an external hard or flash drive, the speed graph shown by Windows always looks the same: the transfer speed starts at around 450 MB/s, and after a few seconds drops down to somewhere between 90 and 130 MB/s and remains stable until the end of the copy/move operation.

Transfer speed graph

This sparked my curiosity, so I decided to figure out the cause. Some of my thoughts were these:

Maybe that is the actual speed at which the transfer happens

Doubtful. While the 450 MB/s speed matches the rated speed of my SSD, considering I also have some other disk reads/writes going on in the background, there is no way a 7200 rpm hard drive can keep up with it; the 130 MB/s I get later on is the most I can expect from it. So, where does the extra data go?

The extra data is being stored in the hard drive's cache memory

This makes a bit more sense, but if I take into account the duration of the higher transfer speed, my hard drive's cache would have to be over 3 GB in size, which it definitely isn't. What else could it be?

The extra data is being stored in the RAM

This makes sense. My RAM is the only other part of my system which can match the speed of my SSD, and I have plenty of it. Let's check this theory!

I open up the Task Manager and take a look at the Performance tab. Memory usage is stable at 3.7 GB. Then I start another 15 GB file transfer. Memory usage starts rising, and stops at 5.3 GB just as the transfer speed drops to 130 MB/s. It stays there until the end of the file transfer (when the transfer dialog closes), and then slowly drops back to the 3.7 GB level it was at before the transfer.
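As a rough sanity check on those numbers, here is a small sketch. It assumes the SSD reads at a steady 450 MB/s, the HDD writes at a steady 130 MB/s, and 1 GB = 1000 MB; the function name is made up for illustration.

```python
def seconds_to_buffer(surplus_mb, read_mb_s, write_mb_s):
    """Estimate how long it takes for `surplus_mb` of unwritten data
    to pile up in RAM when reads outpace writes (all rates in MB/s)."""
    backlog_rate = read_mb_s - write_mb_s  # MB/s that cannot reach the disk yet
    if backlog_rate <= 0:
        raise ValueError("destination keeps up; nothing accumulates")
    return surplus_mb / backlog_rate


# Observed rise in memory usage: 5.3 GB - 3.7 GB = 1.6 GB
surplus_mb = (5.3 - 3.7) * 1000
print(round(seconds_to_buffer(surplus_mb, 450, 130), 1))
```

Under those assumptions the backlog grows at 320 MB/s, so 1.6 GB accumulates in roughly 5 seconds, which is in line with the "few seconds" of high speed in the graph.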

So, my last theory is true. Further confirmation is the fact that the extra used memory is marked as Modified.

What's the point?

My question is, what is the purpose of doing this? While I don't mind having some of my RAM used by file transfers, as even during the heaviest of my multitasking sessions I've never seen its usage go over 70%, what is the benefit of storing 1.6 GB of data which you won't be doing any kind of processing on in your RAM?

I don't see any benefit from the data integrity standpoint, as you're merely copying the files, and in the case of a power failure neither the RAM nor the HDD will be particularly successful in retaining the data in transfer.

I could see the benefit being that the source disk (the SSD) is quickly freed up, so that if another process needs to perform lots of read/write operations on it, it can do so without the file transfer impeding it. But if that is the case, why not go ahead and load all 15 GB into memory at max speed?

Also, this process misleads the user, as the file transfer keeps going even after the transfer dialog closes, because some of the data is still being copied from memory to the hard drive. This could lead a user to unplug a removable drive while data is still being written to it, possibly corrupting the drive, because not everyone bothers with safely removing hardware.

Keep in mind I haven't thoroughly tested this with removable drives, as Windows might be handling them differently, making my last point invalid.
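To illustrate the concern, here is a minimal sketch of how an application can ask the operating system to push its buffered writes to the device before it reports completion. It uses Python's standard `os.fsync`; the function name `write_durably` is hypothetical, and this says nothing about what the Windows copy dialog actually does.

```python
import os


def write_durably(path, data):
    """Write `data` to `path` and ask the OS to commit it to the device
    before returning, instead of leaving it in the write cache."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # drain Python's own user-space buffer
        os.fsync(f.fileno())   # ask the OS to flush its cached pages for this file
    return len(data)
```

A plain `write` without the `fsync` step may return while the data is still sitting in RAM, which is the situation described above.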

Best Answer

Windows memory management is a complex thing. As you have seen, it behaves differently with different devices.

Different operating systems manage memory differently.

Your question is very interesting. I am sharing an MSDN page which explains part of memory management in Windows, and more specifically "Mapped Files".

It's documentation for software developers, but Windows is software too.

One advantage to using MMF I/O is that the system performs all data transfers for it in 4K pages of data. Internally all pages of memory are managed by the virtual-memory manager (VMM). It decides when a page should be paged to disk, which pages are to be freed for use by other applications, and how many pages each application can have out of the entire allotment of physical memory. Since the VMM performs all disk I/O in the same manner—reading or writing memory one page at a time—it has been optimized to make it as fast as possible. Limiting the disk read and write instructions to sequences of 4K pages means that several smaller reads or writes are effectively cached into one larger operation, reducing the number of times the hard disk read/write head moves. Reading and writing pages of memory at a time is sometimes referred to as paging and is common to virtual-memory management operating systems.
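As a small illustration of memory-mapped file I/O from the application side, here is a sketch using Python's standard `mmap` module. The function name `write_via_mmap` is made up for this example, and it only shows the general technique, not how Windows copies files internally.

```python
import mmap


def write_via_mmap(path, payload):
    """Write a non-empty `payload` to `path` through a memory mapping
    and return what ends up in the file."""
    size = len(payload)
    # A mapping cannot be created over an empty file, so size it first.
    with open(path, "wb") as f:
        f.truncate(size)
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size) as view:
            view[:size] = payload  # plain memory write; the OS pages it out
            view.flush()           # ask for the dirty pages to be written back
    with open(path, "rb") as f:
        return f.read()
```

The program only writes to memory here; deciding when each page actually goes to disk is left to the operating system, as the quoted passage describes. `mmap.PAGESIZE` reports the page size on the current machine.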

Unfortunately, we can't easily figure out how Microsoft implements reads and writes, because Windows isn't open source.
But we know that there are many very different situations:

From      To
==================
SSD       HDD
HDD       Busy SSD ??
NTFS      FAT
NTFS      ext4
Network   HDD
IDE0slave IDE0master // the IDE cable supports disk-to-disk transfer.
IDE       SATA // in this case the device controllers are separate.

You get the point... An HDD may be busy, the file systems may be different (or may be the same)...

For example, the dd command on Linux copies raw data "byte by byte". It's extremely fast (because the heads of both HDDs move in sync), but if the file systems are different (with different block sizes, for example), the copied data will not be readable, because the destination file system has a different structure.

We know that RAM is much faster than an HDD. So if we have to do some data parsing (to fit the output file system), it is better to have this data in RAM.
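A minimal sketch of the general idea, copying a file through a RAM buffer one chunk at a time, could look like this. The function name and the 1 MiB chunk size are arbitrary choices for illustration; this is not how Windows implements its copy.

```python
def copy_through_ram(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy a file by reading each chunk into memory and then writing it out.
    Returns the number of bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)  # the data now lives in RAM
            if not chunk:                 # empty read means end of file
                break
            # Any per-chunk processing would happen here, before the write.
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

While a chunk is in memory it can be inspected or transformed at RAM speed before it is handed to the destination.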

Also, imagine you are copying the file directly from source to destination.
What happens if the source is overloaded with other data flows? What about the destination?
What if you have almost no free RAM at that moment?
...

Only Microsoft engineers know.
