Backup with Rsync – Copying One Pattern Only

backup, rsync, wildcards

I am trying to create a directory that will house all and only my PDFs compiled from LaTeX. I like keeping each project in a separate folder, all housed in a big folder called LaTeX. So I tried running:

rsync -avn *.pdf ~/LaTeX/ ~/Output/

which should find all the pdfs in ~/LaTeX/ and transfer them to the output folder. This doesn't work. It tells me it's found no matches for "*.pdf". If I leave out this filter, the command lists all the files in all the project folders under LaTeX. So it's a problem with the *.pdf filter. I tried replacing ~/ with the full path to my home directory, but that didn't have an effect.

I'm using zsh. I tried doing the same thing in bash, and even with the filter it listed every single file in every subdirectory… What's going on here?

Why isn't rsync understanding my pdf only filter?


OK. So update: Now I'm trying

rsync -avn --include="*/" --include="*.pdf" LaTeX/ Output/

And this gives me the whole file list. I guess because everything matches the first pattern…

Best Answer

TL;DR:

rsync -am --include='*.pdf' --include='*/' --exclude='*' ~/LaTeX/ ~/Output/

Rsync copies the source(s) to the destination. If you pass *.pdf as sources, the shell expands this to the list of files with the .pdf extension in the current directory. No recursive traversal happens because you didn't pass any directory as a source.
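To see the expansion for yourself, here is a minimal sketch using a throwaway directory. The names (proj1, paper.pdf) are made up for illustration, and it assumes a POSIX-style shell such as sh or bash with default globbing options.

```shell
# Scratch tree (hypothetical names): one PDF inside a project folder,
# none at the top level.
tmp=$(mktemp -d)
mkdir -p "$tmp/LaTeX/proj1"
touch "$tmp/LaTeX/proj1/paper.pdf" "$tmp/LaTeX/proj1/paper.tex"

# At the top level nothing matches *.pdf. sh and bash then hand the pattern
# to the command unchanged; zsh stops with "no matches found" instead.
top_level=$(cd "$tmp" && printf '%s\n' *.pdf)

# Inside the project folder the shell replaces the pattern with the matching
# file names before the command ever runs.
in_project=$(cd "$tmp/LaTeX/proj1" && printf '%s\n' *.pdf)

echo "top level:  $top_level"
echo "in project: $in_project"

rm -rf "$tmp"
```

In bash this prints the literal *.pdf for the top level and paper.pdf for the project folder. Either way, rsync never sees a pattern it could apply recursively.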

So you need to run rsync -a ~/LaTeX/ ~/Output/, but with a filter to tell rsync to copy .pdf files only. Rsync's filter rules can seem daunting when you read the manual, but you can construct many examples with just a few simple rules.

  • Inclusions and exclusions:

    • Excluding files by name or by location is easy: --exclude='*~', --exclude=/some/relative/location (relative to the source argument, e.g. this excludes ~/LaTeX/some/relative/location).
    • If you only want to match a few files or locations, include them, include every directory leading to them (for example with --include='*/'), then exclude the rest with --exclude='*'. This is because:
      • If you exclude a directory, this excludes everything below it. The excluded files won't be considered at all.
      • If you include a directory, this doesn't automatically include its contents. In recent versions, --include='directory/***' will do that.
    • For each file, the first matching rule applies (and anything never matched is included).
  • Patterns:

    • If a pattern doesn't contain a /, it applies to the file name sans directory.
    • If a pattern ends with /, it applies to directories only.
    • If a pattern starts with /, it applies to the whole path from the directory that was passed as an argument to rsync.
    • * matches any substring of a single directory component (i.e. never matches /); ** matches any path substring.
  • If a source argument ends with a /, its contents are copied (rsync -r a/ b creates b/foo for every a/foo). Otherwise the directory itself is copied (rsync -r a b creates b/a).
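The "first matching rule wins" behaviour is easiest to see by running the same rules in different orders. This is a rough sketch on a scratch tree. The directory and file names are hypothetical, and it assumes rsync is installed.

```shell
# Scratch tree with hypothetical names; requires rsync on the PATH.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/proj1" "$tmp/no_exclude" "$tmp/exclude_first" "$tmp/exclude_last"
touch "$tmp/src/proj1/paper.pdf" "$tmp/src/proj1/paper.tex"

# Includes only: paper.tex matches no rule, and unmatched files are included,
# so everything is copied (this is the situation in the question's update).
rsync -a --include='*/' --include='*.pdf' "$tmp/src/" "$tmp/no_exclude/"

# Catch-all exclude first: '*' matches every name, so the later includes
# are never reached and nothing is copied.
rsync -a --exclude='*' --include='*.pdf' --include='*/' "$tmp/src/" "$tmp/exclude_first/"

# Includes first, catch-all exclude last: only the PDF gets through.
rsync -a --include='*.pdf' --include='*/' --exclude='*' "$tmp/src/" "$tmp/exclude_last/"

# Collect the files that ended up in each destination.
no_exclude=$(cd "$tmp/no_exclude" && find . -type f | sort)
exclude_first=$(cd "$tmp/exclude_first" && find . -type f | sort)
exclude_last=$(cd "$tmp/exclude_last" && find . -type f | sort)

printf 'no exclude:\n%s\n\n' "$no_exclude"
printf 'exclude first:\n%s\n\n' "$exclude_first"
printf 'exclude last:\n%s\n' "$exclude_last"

rm -rf "$tmp"
```

Only the last ordering leaves just ./proj1/paper.pdf in the destination.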


Thus here we need to include *.pdf, include directories containing them, and exclude everything else.

rsync -a --include='*.pdf' --include='*/' --exclude='*' ~/LaTeX/ ~/Output/

Note that this copies all directories, even the ones that contain no matching file or subdirectory containing one. This can be avoided with the --prune-empty-dirs option (it's not a universal solution since you then can't copy a directory even by matching it explicitly, but that's a rare requirement).

rsync -am --include='*.pdf' --include='*/' --exclude='*' ~/LaTeX/ ~/Output/
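If you want to check the difference between the two commands before touching your real folders, something like the following should do. It is only a sketch on a temporary tree. The names proj1, notes, plain and pruned are invented for the example, and it assumes rsync is installed.

```shell
# Scratch tree with hypothetical names; requires rsync on the PATH.
# proj1 contains a PDF, notes contains no PDF at all.
tmp=$(mktemp -d)
mkdir -p "$tmp/LaTeX/proj1" "$tmp/LaTeX/notes" "$tmp/plain" "$tmp/pruned"
touch "$tmp/LaTeX/proj1/paper.pdf" "$tmp/LaTeX/proj1/paper.tex" "$tmp/LaTeX/notes/todo.tex"

# Without -m: every directory is recreated, even if nothing in it matched.
rsync -a  --include='*.pdf' --include='*/' --exclude='*' "$tmp/LaTeX/" "$tmp/plain/"

# With -m (--prune-empty-dirs): directories left without files are skipped.
rsync -am --include='*.pdf' --include='*/' --exclude='*' "$tmp/LaTeX/" "$tmp/pruned/"

# Record whether the PDF-less "notes" directory made it to each destination.
if [ -d "$tmp/plain/notes" ]; then plain_notes=yes; else plain_notes=no; fi
if [ -d "$tmp/pruned/notes" ]; then pruned_notes=yes; else pruned_notes=no; fi
if [ -f "$tmp/pruned/proj1/paper.pdf" ]; then pruned_pdf=yes; else pruned_pdf=no; fi

echo "notes/ copied without -m: $plain_notes"
echo "notes/ copied with -m:    $pruned_notes"
echo "paper.pdf copied with -m: $pruned_pdf"

rm -rf "$tmp"
```

The empty notes directory should appear only in the destination of the first command, while paper.pdf is copied by both.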