Linux – rsync using regex to include only some files

Tags: backup, linux, regex, rsync, shell

I am trying to run rsync to copy some files recursively down a path based on their file name pattern, case insensitive. This is what I have done to run rsync:

$ rsync -avvz --include ='*/' --include='.*[Nn][Aa][Mm][E].*' --exclude='*' ./a/ ./b/

Nothing gets copied, the debug output shows:

[sender] hiding file 1Name.txt because of pattern *
[sender] hiding file 1.txt because of pattern *
[sender] hiding file 2.txt because of pattern *
[sender] hiding file Name1.txt because of pattern *
[sender] hiding directory test1 because of pattern *
[sender] hiding file NaMe.txt because of pattern *

I have tried using --include='*[Nn][Aa][Mm][E]*' and other combinations, but it still doesn't work.

Any ideas on how to use regex to include some files?

Best Answer

rsync doesn't speak regex. You can enlist find and grep, though it gets a little arcane. To find the target files:

find a/ |
grep -i 'name'
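To see what this step produces, here is a small sketch that builds a throwaway tree resembling the one in the question and runs the same find and grep over it. The nested file test1/sub/Name2.txt and the temporary directory are assumptions added for illustration; they are not part of the original example.

```shell
# Build a scratch tree similar to the question's.
# test1/sub/Name2.txt is a hypothetical nested file added for illustration.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/test1/sub"
touch "$tmp/a/1Name.txt" "$tmp/a/1.txt" "$tmp/a/2.txt" \
      "$tmp/a/Name1.txt" "$tmp/a/NaMe.txt" "$tmp/a/test1/sub/Name2.txt"

# List everything under a/ and keep only the case-insensitive matches.
# The sort is only there to make the listing order predictable.
matches=$(cd "$tmp" && find a/ | grep -i 'name' | LC_ALL=C sort)
printf '%s\n' "$matches"

# Remove the scratch tree again.
rm -rf "$tmp"
```

Note that grep sees the whole path, so a source directory whose own name contained "name" would make every entry match.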

But they're all prefixed with "a/". That makes sense, but what we want to end up with is a list of include patterns acceptable to rsync, and rsync matches patterns against paths relative to the source directory, so the "a/" prefix has to go. I'll remove it with cut:

find a/ |
grep -i 'name' |
cut -d / -f 2-

There's still a problem: we'll miss matching files in subdirectories, because rsync doesn't descend into a directory that the exclude rule hides. I'm going to use awk to add the parent directories of every matching file to the list of include patterns:

find a/ |
grep -i 'name' |
cut -d / -f 2- |
awk -F/ '{print; while(/\//) {sub("/[^/]*$", ""); print}}'
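The awk step is the least obvious part, so here it is in isolation. This sketch feeds a single hypothetical path (test1/sub/Name2.txt, not from the original question) through the same awk program to show how it prints the path and then each of its parent directories in turn:

```shell
# Feed one hypothetical nested path through the same awk program.
# Each pass of the loop strips the last "/component" and prints what is left,
# until no slash remains.
expanded=$(printf '%s\n' 'test1/sub/Name2.txt' |
  awk -F/ '{print; while(/\//) {sub("/[^/]*$", ""); print}}')
printf '%s\n' "$expanded"
```

When several matches share a parent directory, that directory is printed more than once. The repeats should be harmless in a list of include patterns, and piping through sort -u would remove them if a tidier list is wanted.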

All that's left is to send the list to rsync - we can use the argument --include-from=- to provide a list of patterns to rsync on standard input. So, altogether:

find a/ |
grep -i 'name' |
cut -d / -f 2- |
awk -F/ '{print; while(/\//) {sub("/[^/]*$", ""); print}}' |
rsync -avvz --include-from=- --exclude='*' ./a/ ./b/

Note that the source directory 'a' is referred to via two different paths - "a/" and "./a/". This is subtle but important. To make things more consistent I'm going to make one final change, and always refer to the source directory as "./a/". However, this means the cut command has to change as there will be an extra "./" on the front of the results from find:

find ./a/ |
grep -i 'name' |
cut -d / -f 3- |
awk -F/ '{print; while(/\//) {sub("/[^/]*$", ""); print}}' |
rsync -avvz --include-from=- --exclude='*' ./a/ ./b/
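
Counting fields for cut is easy to get wrong when the source path changes. As one possible variation, and not part of the original answer, the pipeline could be wrapped in a small function that runs find from inside the source directory, so every path starts with "./" and cut always drops exactly one field. The function name include_list and its two arguments are hypothetical:

```shell
# Hypothetical helper: print rsync include patterns for entries under
# directory $1 whose relative path matches the case-insensitive regex $2.
include_list() {
  # Running find from inside the source directory makes every path start
  # with "./", whatever the directory is called.
  (cd "$1" && find .) |
  # Drop the leading "." field; -s also discards the bare "." line,
  # which contains no slash.
  cut -s -d / -f 2- |
  grep -i -- "$2" |
  # Add every parent directory of each match.
  awk -F/ '{print; while(/\//) {sub("/[^/]*$", ""); print}}' |
  # Remove duplicate parent directories.
  LC_ALL=C sort -u
}

# Intended use, mirroring the command above:
#   include_list ./a name | rsync -avvz --include-from=- --exclude='*' ./a/ ./b/
```

Because grep now runs after the prefix has been removed, the name of the source directory itself can no longer cause accidental matches.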