How to Compress Large Tar Files – Reduce Tar File Size with Gzip

compression filesystems gzip sparse-files tar

I packed and compressed a folder to a .tar.gz archive.
After unpacking it was nearly twice as big.

du -sh /path/to/old/folder       = 263M
du -sh /path/to/extracted/folder = 420M

After a lot of searching, it seems that tar itself is causing this, either by adding metadata or by otherwise changing how the files are stored.

I ran diff and md5sum on the two copies of one file inside the folder. There is absolutely no difference and the checksums are identical. Yet the extracted file is twice as big as the original.

root@server:~# du -sh /path/to/old/folder/subfolder/file.mcapm /path/to/extracted/folder/subfolder/file.mcapm
1.1M    /path/to/old/folder/subfolder/file.mcapm
2.4M    /path/to/extracted/folder/subfolder/file.mcapm
root@server:~# diff /path/to/old/folder/subfolder/file.mcapm /path/to/extracted/folder/subfolder/file.mcapm
root@server:~# 
root@server:~# md5sum /path/to/old/folder/subfolder/file.mcapm
f11787a7dd9dcaa510bb63eeaad3f2ad
root@server:~# md5sum /path/to/extracted/folder/subfolder/file.mcapm
f11787a7dd9dcaa510bb63eeaad3f2ad

I am not searching for different methods, but for a way to reduce the size of those files again to their original size.

How can I achieve that?

Best Answer

[This answer assumes GNU tar and GNU cp.]

There is absolutely no diff and the checksum is the exact same value. Yet, one file is twice as big as the original one.

1.1M    /path/to/old/folder/subfolder/file.mcapm
2.4M    /path/to/extracted/folder/subfolder/file.mcapm

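That .mcapm file is probably sparse: it contains runs of zero bytes ("holes") for which the filesystem allocates no blocks, so du reports less than the file's length. By default tar stores those holes as literal zeros, and they are written out in full on extraction. Use the -S (--sparse) tar option when creating the archive.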

Example:

$ dd if=/dev/null seek=100 of=dummy
...
$ mkdir extracted

$ tar -zcf dummy.tgz dummy
$ tar -C extracted -zxf dummy.tgz
$ du -sh dummy extracted/dummy
0       dummy
52K     extracted/dummy

$ tar -S -zcf dummy.tgz dummy
$ tar -C extracted -zxf dummy.tgz
$ du -sh dummy extracted/dummy
0       dummy
0       extracted/dummy
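
To check whether a particular file is sparse before re-archiving, you can compare its length with the space it actually occupies. The following is only a sketch, assuming GNU stat; the function name is_sparse is made up for illustration. Note that a compressing filesystem can also make a file occupy less than its length, so treat the result as a hint.

```shell
# Hypothetical helper (not part of tar or coreutils): succeeds if the file
# occupies fewer bytes on disk than its apparent length.
# Assumes GNU stat: %s = length in bytes, %b = allocated blocks,
# %B = size in bytes of each block reported by %b.
is_sparse() {
    size=$(stat -c %s -- "$1") || return 2
    allocated=$(( $(stat -c %b -- "$1") * $(stat -c %B -- "$1") ))
    [ "$allocated" -lt "$size" ]
}
```

For example, is_sparse dummy && echo sparse. For a quick manual check, du -sh --apparent-size next to plain du -sh shows the same two numbers.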

You can also "re-sparse" a file afterwards with cp --sparse=always:

$ dd if=/dev/zero of=junk count=100
...
$ du -sh junk
52K     junk
$ cp --sparse=always junk junk.sparse && mv junk.sparse junk
$ du -sh junk
0       junk
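
Since the archive has already been extracted, the same cp trick could be applied to every file in the extracted folder. This is a rough sketch under the same GNU cp assumption; the function name resparse_tree and the .resparse.tmp suffix are invented here. It breaks hard links and should not be run while the files are in use.

```shell
# Hypothetical helper: rewrite each regular file under directory $1 so that
# runs of zero bytes become holes again. Assumes GNU cp.
resparse_tree() {
    find "$1" -type f -exec sh -c '
        for f do
            tmp="$f.resparse.tmp"    # made-up temporary name
            # -p keeps mode, ownership and timestamps on the copy
            if cp -p --sparse=always -- "$f" "$tmp"; then
                mv -- "$tmp" "$f"
            else
                rm -f -- "$tmp"      # leave the original untouched on failure
            fi
        done
    ' sh {} +
}
```

Usage would be resparse_tree /path/to/extracted/folder, followed by du -sh on the folder to see whether the size dropped back.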