Explaining performance irregularities when unpacking .tar.gz archives

archiving, mac, tar, windows 10

This question is about the performance of unpacking, not "how to" unpack.
I noticed this a few days ago, when I started extracting several .tar.gz files (referred to as archives from here on) on my rather powerful Windows 10 machine. Unpacking a 25 GB archive took more than an entire day. The content was mostly MP3 audio files, each a two-digit number of KB in size, so there were a great many files to unpack. My Windows machine has a recent i7 processor and 16 GB of RAM, and I was unpacking from a 1 TB NVMe drive to an external 500 GB SSD connected via USB-C, which is usually quite fast.

Out of curiosity, I tried unpacking the same archive on my MacBook, which has lower specs (an older i5 processor, 16 GB of RAM and a 500 GB internal NVMe drive). For some reason, unpacking took only around 45 minutes. I was shocked by this huge difference and honestly could not come up with any explanation. Searching the web, I found people stating that this is caused by a slower hard drive. That is unlikely in my case: unpacking on Windows from my internal NVMe drive to itself was also extremely slow.

How can one explain this large difference? I suspect it has to do with the file systems, but I cannot make sense of it.

Best Answer

It seems that Windows Defender scans each new file whenever the program issues a "close" call. The scan happens synchronously, i.e. the call does not return and the thread does not continue until Defender completes the scan. This was discussed last year in a Linux.conf.au talk.

File system and OS kernel design may also be a factor; since each new file needs its metadata to be written and directory metadata to be updated, operation latency matters more than just the raw disk throughput. (For example, when writing a large file the OS can just issue a bunch of writes at once and collect the results later; but metadata updates might involve reading some data and waiting for the read to complete before it can be written back.)
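
If you want to check this on your own machines, here is a rough sketch (not a rigorous benchmark) that writes the same amount of data once as many small files and once as a single large file, and separately adds up the time spent inside the "close" calls. The function names, file names and sizes are made up for illustration. Results will vary with OS caching, the file system and whatever on-access scanner is running.

```python
import os
import tempfile
import time

# O_BINARY exists only on Windows; elsewhere the flag is not needed.
_FLAGS = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)


def _write_all(fd, data):
    """Write all of `data` to `fd`, retrying after partial writes."""
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]


def write_small_files(directory, count, size):
    """Create `count` files of `size` bytes each.

    Returns (total_seconds, seconds_spent_in_close).
    """
    payload = b"\0" * size
    close_seconds = 0.0
    start = time.perf_counter()
    for i in range(count):
        fd = os.open(os.path.join(directory, f"small_{i:06d}.bin"), _FLAGS)
        _write_all(fd, payload)
        t0 = time.perf_counter()
        os.close(fd)  # a synchronous on-close scan would add its delay here
        close_seconds += time.perf_counter() - t0
    return time.perf_counter() - start, close_seconds


def write_large_file(directory, total_size, chunk_size=1 << 20):
    """Write one file of `total_size` bytes in chunks; return total seconds."""
    chunk = b"\0" * chunk_size
    start = time.perf_counter()
    fd = os.open(os.path.join(directory, "large.bin"), _FLAGS)
    remaining = total_size
    while remaining > 0:
        n = min(remaining, chunk_size)
        _write_all(fd, chunk[:n])
        remaining -= n
    os.close(fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical workload: 500 files of 32 KiB, roughly MP3-snippet sized.
    count, size = 500, 32 * 1024
    with tempfile.TemporaryDirectory() as tmp:
        small_total, small_close = write_small_files(tmp, count, size)
        large_total = write_large_file(tmp, count * size)
    print(f"{count} small files: {small_total:.3f} s "
          f"(of which close: {small_close:.3f} s)")
    print(f"one large file:  {large_total:.3f} s")
```

Running it on both the Windows and the Mac machine, and on Windows with the target folder temporarily excluded from real-time scanning, should show whether the per-file overhead (and the close calls in particular) accounts for the gap.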
