Is `--reflink=auto` safe to set as the default for cp?

btrfs cp deduplication

I currently use btrfs and would like to take advantage of CoW so that when files are copied within the same btrfs filesystem, they are automatically deduplicated by reusing the existing extents. I can think of two ways to do this:

Solution one (Local)

I could simply set an alias in my .bashrc so that whenever I call cp it automatically appends the --reflink=auto flag.

alias cp='cp --reflink=auto'

Solution two (Global)

The other solution I can think of would be to create /usr/local/bin/cp that has a higher precedence in the PATH variable. The script would be something along the lines of:

#!/bin/sh

CP=/bin/cp

exec $CP --reflink=auto $*

I do not think it would be a good idea to replace /bin/cp as updates of coreutils will end up overwriting my changes. This would however hopefully mean that applications that call cp from the PATH (rather than directly through /bin/cp) will always automatically use reflinks.

Question

Is there any argument against this, or any situation where imposing it would cause a problem? I assume that with auto, cp determines whether the underlying filesystem supports reflinks and whether the source and destination are on the same filesystem, and only then uses reflinks, so there won't be a problem when I connect an external ext4 filesystem or copy between btrfs filesystems. Is that correct?
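One way to check the fallback assumption is a small experiment like the sketch below. It assumes GNU cp (the --reflink option is a GNU coreutils extension) and uses temporary files created by mktemp. Whether the copy actually shares extents depends on the filesystem holding the temporary directory, which this script does not check; it only confirms that the copy succeeds either way.

```shell
#!/bin/sh
# Sketch: copy a file with --reflink=auto and confirm the copy succeeds
# with identical contents, whether or not the filesystem supports reflinks.
# Assumes GNU cp; the temporary file names are illustrative only.
src=$(mktemp)
dst=$(mktemp)
printf 'example data\n' > "$src"

if cp --reflink=auto "$src" "$dst" && cmp -s "$src" "$dst"; then
    result=identical
else
    result=failed
fi
echo "copy result: $result"

rm -f "$src" "$dst"
```

With --reflink=always in place of auto, the same command would fail on a filesystem without reflink support instead of falling back to a regular copy.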

I have read Why is cp --reflink=auto not the default behaviour? and it seems the main argument is that cp may be used to create a backup of a file. For my part, I would rather consume less space locally and have the data duplicated to a completely separate machine, which is where I aim to back up data. In that case, would implementing solution 2 be safe?

In terms of minimising the local disk space usage, I have seen the suggestion for setting --sparse=always so I suppose a similar question applies for this.

Best Answer

Note that there's a problem in your code. Leaving $* unquoted never makes sense. $* is the concatenation of the positional parameters with the first character of $IFS. Left unquoted, that result is then subject to word splitting and filename generation (with some variation in behaviour between shells when IFS is empty). Here, you want:

#!/bin/sh -
exec /bin/cp --reflink=auto "$@"

"$@" expands to all the positional parameters as separate arguments.
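The difference is easy to see with file names that contain spaces. The following is a minimal sketch for a POSIX sh with the default IFS; the helper functions count_args, forward_unquoted and forward_quoted are hypothetical names made up for this illustration.

```shell
#!/bin/sh
# count_args prints how many arguments it received.
count_args() {
    echo "$#"
}

# Forwards the positional parameters the way the original wrapper did.
forward_unquoted() {
    count_args $*
}

# Forwards them the way the corrected wrapper does.
forward_quoted() {
    count_args "$@"
}

# Two arguments, each containing a space.
unquoted=$(forward_unquoted "my file.txt" "backup dir")
quoted=$(forward_quoted "my file.txt" "backup dir")

echo "unquoted \$*: $unquoted arguments"   # 4: split on the spaces
echo "quoted \"\$@\": $quoted arguments"   # 2: passed through intact
```

With the unquoted form, cp would be asked to copy files named my, file.txt and backup into dir, which is not what the caller asked for.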

If you want to replace /bin/cp and have the change preserved across updates, most systems have a canonical way to do that. On Debian and derivatives, you'd do:

$ sudo dpkg-divert --local --rename /bin/cp
Adding 'local diversion of /bin/cp to /bin/cp.distrib'

Then write /bin/cp as:

#! /bin/sh -
exec "$0.distrib" --reflink=auto "$@"

Every update of coreutils will update cp.distrib instead of cp.

Note that there's a performance implication in that it needs to load and run sh before running cp. That's not as bad on Debian where /bin/sh is based on dash.

That also means error messages and help messages will mention cp.distrib instead of cp:

$ cp
/bin/cp.distrib: missing file operand
Try '/bin/cp.distrib --help' for more information.

That last part, you can work around by writing the script as:

#! /bin/bash -
exec -a "$0" "$0.distrib" --reflink=auto "$@"

(The same works with ksh93 or zsh, though like bash they are all bloated shells compared to dash.)

This is not strictly equivalent, as $0 will contain the path to the script rather than the argv[0] that cp initially received, but at least it will be something like /bin/cp instead of /bin/cp.distrib.
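The effect of exec -a on argv[0] can be observed without touching /bin/cp. The sketch below assumes bash is installed and on the PATH. It uses an inner bash as a stand-in for cp.distrib, since bash -c reports the argv[0] it was started with as $0.

```shell
#!/bin/bash
# Sketch: show how exec -a changes the argv[0] seen by the executed program.
# exec -a is supported by bash, ksh93 and zsh, not by plain POSIX sh.

# Without -a, the inner bash sees the name it was invoked by.
without_a=$(bash -c 'exec bash -c "echo \$0"')

# With -a, the inner bash sees the name supplied to exec.
with_a=$(bash -c 'exec -a /bin/cp bash -c "echo \$0"')

echo "argv[0] without -a: $without_a"
echo "argv[0] with -a:    $with_a"
```

The first line reports bash and the second reports /bin/cp, which is why error messages from the wrapped cp mention /bin/cp rather than /bin/cp.distrib.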