Better way than cp to copy millions of files while preserving hard links

Tags: cp, file-copy, hard-link

In this story on the GNU coreutils mailing list, someone used cp to copy 430 million files while preserving hard links, and only just managed to get it to finish after 10 days.

The big problem was that, in order to preserve hard links, cp has to keep a hashtable of already copied files, which took up 17GB of memory towards the end and had the system thrashing on swap.

Is there some utility that would have handled the task better?

Best Answer

If the tar or rsync solutions fail, and if the directory to copy is the root of a filesystem, you can use the old dump/restore backup utilities (yes, they still work).
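
For reference, the tar and rsync approaches usually look something like the lines below; the /source and /target paths are only illustrative. GNU tar preserves hard links inside the archive, and rsync needs -H to do so (note that rsync's hard-link tracking also costs memory, so it may hit the same scaling problem as cp):

(cd /source && tar cf - .) | (cd /target && tar xpf -)
rsync -aH /source/ /target/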

dump duplicates the filesystem at a low level, without going through the kernel's filesystem interface, so it is quite fast.

The drawback is that dump is sensitive to modifications made to the source filesystem while it is copying. So it is better to unmount the filesystem, remount it read-only, or stop any application that could access it before starting the copy. If you respect that condition, the copy is reliable.
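
For example, to take the source filesystem out of service before the copy (the /mnt/source mount point is only illustrative):

umount /mnt/source                       # or: mount -o remount,ro /mnt/source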

Depending on the filesystem type, the name of the dump command can vary; for instance, XFS uses xfsdump.
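
As a rough sketch of the XFS variant, the equivalent pipeline would typically use xfsdump and xfsrestore, which take mount points rather than device names (the paths below are illustrative):

xfsdump -J - /mnt/source | xfsrestore -J - /mnt/target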

The following command is similar to the tar example:

dump 0uf - /dev/sdaX  | (cd /target && restore rf -)

The number is the dump level; 0 indicates a full copy, while higher levels produce incremental copies.
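
Once the level-0 dump has been restored, a later run can use a higher level to copy only what changed since then. A minimal sketch, assuming the same /dev/sdaX and /target as above and that the u flag recorded the level-0 run in /etc/dumpdates:

dump 1uf - /dev/sdaX | (cd /target && restore rf -)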
