See rdup-simple (from rdup). You said you wanted compression, but in case you change your mind, I strongly recommend rsnapshot.
By the way, if two hardlinks point to a file, it's the same file. You can't compress only one of the hardlinks since it's the same underlying data in the file-system.
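A small sketch to see this for yourself, assuming GNU coreutils (`stat -c`) and a scratch directory; the file names are made up for illustration. Both names report the same inode, and a write through one name is visible through the other:

```shell
# Create a file and a second hard link to it in a scratch directory.
tmp=$(mktemp -d)
printf 'same data\n' > "$tmp/original"
ln "$tmp/original" "$tmp/hardlink"

# GNU stat: %i is the inode number, %h the number of hard links.
inode_a=$(stat -c '%i' "$tmp/original")
inode_b=$(stat -c '%i' "$tmp/hardlink")
links=$(stat -c '%h' "$tmp/original")
echo "inodes: $inode_a $inode_b, link count: $links"

# Writing through one name changes what the other name reads,
# because both names refer to the same underlying data.
printf 'changed\n' > "$tmp/hardlink"
content=$(cat "$tmp/original")
echo "original now contains: $content"

rm -rf "$tmp"
```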
You can use ncdu itself!
This shows the uncompressed sizes of the files.
In the case you say you care about, namely many incompressible files, it should reflect what you need pretty well:
To make the file sizes accessible to ncdu, they need to be in a file system. So we need to mount the archive as a file system.
We use a FUSE user-space file system implementation, archivemount:
Install the FUSE file system:
$ sudo apt-get install archivemount
mkdir a directory, mount the archive to it, cd into it, and run ncdu:
$ mkdir bash-4.3-mount
$ archivemount bash-4.3.tar.gz bash-4.3-mount
$ cd bash-4.3-mount
$ ncdu
Now you can use ncdu as usual:
ncdu 1.10 ~ Use the arrow keys to navigate, press ? for help
--- /tmp/archivedutest/bash-4.3-mount/bash-4.3/lib ----------------
/..
1.2MiB [##########] /readline
343.0KiB [## ] /sh
316.5KiB [## ] /intl
104.5KiB [ ] /glob
97.0KiB [ ] /malloc
32.0KiB [ ] /termcap
22.0KiB [ ] /tilde
Total disk usage: 2.1MiB Apparent size: 2.0MiB Items: 251
Now, what you are really interested in is the compressed size of the files, not uncompressed: You want to see which files take up the most space in the actual archive.
Strictly speaking, that's not possible because the archive is compressed as a whole. An individual file has no "compressed size".
So the compressed size of individual files can only be approximated.
One approximation would be the size of individually compressed files.
Another would be a fraction of the compressed size assuming all files compress by the same ratio. There are certainly other ways.
The first seems fine. Implementing it requires unpacking and recompressing the individual files anyway, so I see no reason not to do just that: unpack the archive to the file system and use ncdu on the files.
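The first approximation could be sketched roughly like this, assuming the archive has already been unpacked and that gzip is a fair stand-in for the archive's compressor. The function name `approx_compressed_sizes` and the demo files are hypothetical; the numbers it prints are estimates, not real per-file sizes inside the archive.

```shell
# Print "<gzip-compressed bytes><TAB><path>" for every regular file under
# a directory, largest first.
approx_compressed_sizes() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            # Compress to stdout and count bytes; nothing is written to disk.
            printf "%s\t%s\n" "$(gzip -c -- "$f" | wc -c | tr -d " ")" "$f"
        done
    ' sh {} + | sort -rn
}

# Demo on a small made-up tree: one file that compresses very well and
# one that barely compresses at all.
demo=$(mktemp -d)
head -c 100000 /dev/zero > "$demo/zeros.bin"
head -c 100000 /dev/urandom > "$demo/random.bin"

result=$(approx_compressed_sizes "$demo")
printf '%s\n' "$result"

rm -rf "$demo"
```

Both demo files have the same uncompressed size, but the random one should dominate the listing, which is exactly the difference ncdu on the mounted archive cannot show.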
Best Answer
The highest compression ratio also has some important drawbacks and is usually not recommended.
For a backup solution, it is often important to have a fast restore.
The compression ratio you are able to achieve depends on your data and the compression tool you are using.
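One way to check this on your own data is to compress a representative sample with each tool and compare byte counts. The sketch below assumes gzip and xz are installed and uses generated, highly repetitive sample text, so the numbers it prints say nothing about a real disk image:

```shell
sample=$(mktemp)
# Made-up sample data: 20000 lines of repetitive text.
seq 1 20000 | sed 's/^/backup record number /' > "$sample"

# Compress to stdout (-c) and count the resulting bytes.
orig_bytes=$(wc -c < "$sample" | tr -d ' ')
gzip_bytes=$(gzip -9 -c "$sample" | wc -c | tr -d ' ')
xz_bytes=$(xz -9 -e -c "$sample" | wc -c | tr -d ' ')

echo "original: $orig_bytes bytes"
echo "gzip -9:  $gzip_bytes bytes"
echo "xz -9 -e: $xz_bytes bytes"

rm -f "$sample"
```

Replace the generated sample with a slice of your actual backup data to get a meaningful comparison.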
xz provides one of the highest compression ratios. Run against your disk device, it will compress the device to stdout (-c) with the highest compression ratio (-9) and the extreme switch (-e). This will take a very long time.
Another way to have good compression ratios and also a fast restore is using a compression-enabled file system like BTRFS, where you can store, for example, rsync backups.
Compression can be enabled when mounting the BTRFS volume, via the compress mount option.
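A minimal sketch of such a mount, assuming the `compress` mount option with zlib. The helper name `mount_compressed_btrfs`, the device `/dev/sdX1`, and the mount point `/mnt/backup` are placeholders, not values from the answer. Setting `DRY_RUN=1` only prints the command, which is how the demo line below uses it:

```shell
# Mount a BTRFS volume with transparent compression enabled.
mount_compressed_btrfs() {
    device=$1        # a partition already formatted as BTRFS (placeholder)
    mountpoint=$2    # an existing, empty directory (placeholder)
    algo=${3:-zlib}  # btrfs also accepts lzo, and zstd on recent kernels
    # With DRY_RUN set, the command is echoed instead of executed.
    ${DRY_RUN:+echo} sudo mount -o "compress=$algo" "$device" "$mountpoint"
}

# Dry run: show the command without touching any device.
cmd=$(DRY_RUN=1 mount_compressed_btrfs /dev/sdX1 /mnt/backup)
echo "$cmd"
```

Files written after mounting are compressed by the file system; an rsync backup into the mount point needs no extra compression step.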
This is pretty convenient, because you don't need to deal with compression (it is automatically done by the filesystem) and have fast access to your backed-up data.