File Copy Btrfs – How to Verify a File Copy is Reflink/CoW?

I'm playing with btrfs, which lets cp --reflink make copy-on-write (CoW) copies. Other programs, such as lxc-clone, may use this feature as well. My question is: how can I tell whether a file is a CoW copy of another? With hard links, I can tell from the inode number.

Best Answer

Good question. Looks like there aren't currently any easy high-level ways to tell.

One problem is that two files may share only part of their data via copy-on-write. File data is stored in physical extents, and some or all of those extents may be shared between CoW files.

~~There is nothing analogous to an inode which, when compared between files, would tell you that the files share the same physical extents.~~ (Edit: see my other answer).

The low level answer is that you can ask the kernel which physical extents are used for the file using the FS_IOC_FIEMAP ioctl, which is documented in Documentation/filesystems/fiemap.txt. In principle, if all of the physical extents are the same, then the file must be sharing the same underlying storage.

Few tools expose this information at a higher level. I found some Go code here. Apparently the filefrag utility is supposed to show the extents with -v. In addition, btrfs-debug-tree shows this information.
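As a rough sketch of how the filefrag -v approach could be scripted: the snippet below assumes filefrag (from e2fsprogs) is installed and prints its usual colon-separated extent table, with the physical range in the third column and the length in the fourth. The helper names physical_extents and same_extents are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: read `filefrag -v` output on stdin and print one
# "physical_start physical_end length" line per extent.
physical_extents() {
    awk -F: '
        # Extent rows start with an extent index and have at least four
        # colon-separated columns; header and summary lines do not.
        /^ *[0-9]+:/ && NF >= 4 {
            gsub(/ /, "", $3)               # physical range, e.g. "34816..35071"
            gsub(/ /, "", $4)               # extent length
            split($3, range, "[.][.]")
            print range[1], range[2], $4
        }'
}

# Hypothetical helper: succeed only if both files report the same,
# non-empty list of physical extents.
same_extents() {
    ext_a=$(filefrag -v "$1" | physical_extents)
    ext_b=$(filefrag -v "$2" | physical_extents)
    [ -n "$ext_a" ] && [ "$ext_a" = "$ext_b" ]
}
```

It could be used as: same_extents original copy && echo "extents match". Treat a match as a hint rather than proof: a partially shared file will not match, and the caution about relying on this data still applies.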

I would exercise caution, however. These tools may have seen little use in the wild for this purpose, so you could hit bugs that give you wrong answers. Beware of relying on this data when deciding on operations that could cause data corruption.

Related Solutions

Ssh – How to copy files from one machine to another using ssh

Syntax:

scp <source> <destination>

To copy a file from B to A while logged into B:

scp /path/to/file username@a:/path/to/destination

To copy a file from B to A while logged into A:

scp username@b:/path/to/file /path/to/destination

Is `--reflink=auto` safe to set as default for cp

Note that there's a problem in your code. Leaving $* unquoted never makes sense. $* is the concatenation of the positional parameters with the first character of $IFS. Left unquoted, that result is then subject to word splitting and filename generation (with some variation in behaviour when IFS is empty). Here, you want:

#!/bin/sh -
exec /bin/cp --reflink=auto "$@"

"$@" expands to all the positional parameters as separate arguments.
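To make the difference concrete, here is a small illustration. The count_args and demo functions and the sample file names are hypothetical, used only to count how many arguments each form produces with the default IFS.

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

demo() {
    # Two positional parameters, each containing a space.
    set -- "my file.txt" "dest dir"
    count_args $*      # unquoted: split on IFS (and subject to globbing)
    count_args "$*"    # quoted: joined into a single argument
    count_args "$@"    # quoted: one argument per positional parameter
}

demo
```

With the default IFS this prints 4, 1 and 2: only "$@" passes the two original arguments through unchanged, which is what a cp wrapper needs.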

If you want to replace /bin/cp and have the change preserved across package updates, most systems have a canonical way to do that. On Debian and derivatives, you'd do:

$ sudo dpkg-divert --local --rename /bin/cp
Adding 'local diversion of /bin/cp to /bin/cp.distrib'

Then write /bin/cp as:

#! /bin/sh -
exec "$0.distrib" --reflink=auto "$@"

Every update of coreutils will update cp.distrib instead of cp.

Note that there's a performance implication in that it needs to load and run sh before running cp. That's not as bad on Debian where /bin/sh is based on dash.

That also means error messages and help messages will mention cp.distrib instead of cp:

$ cp
/bin/cp.distrib: missing file operand
Try '/bin/cp.distrib --help' for more information.

That last part, you can work around by writing the script as:

#! /bin/bash -
exec -a "$0" "$0.distrib" --reflink=auto "$@"

(The same works with ksh93 or zsh, though like bash they are all bloated shells compared to dash.)

It will not be strictly equivalent, as $0 will contain the path to that script rather than the argv[0] that cp initially received, but at least it will be something like /bin/cp instead of /bin/cp.distrib.
