I've got a bunch of XML files under a directory tree which I would like to move to corresponding folders with the same name within that same directory tree.
Here is a sample structure (in shell):
touch foo.xml bar.xml "[ foo ].xml" "( bar ).xml"
mkdir -p foo bar "foo/[ foo ]" "bar/( bar )"
So my approach here is:
find . -name "*.xml" -exec sh -c '
DST=$(
find . -type d -name "$(basename "{}" .xml)" -print -quit
)
[ -d "$DST" ] && mv -v "{}" "$DST/"' ';'
which gives the following output:
‘./( bar ).xml’ -> ‘./bar/( bar )/( bar ).xml’
mv: ‘./bar/( bar )/( bar ).xml’ and ‘./bar/( bar )/( bar ).xml’ are the same file
‘./bar.xml’ -> ‘./bar/bar.xml’
‘./foo.xml’ -> ‘./foo/foo.xml’
But the file with square brackets ([ foo ].xml) hasn't been moved, as if it had been ignored.
I've checked, and basename (e.g. basename "[ foo ].xml" ".xml") converts the file name correctly; however, find has problems with brackets. For example:
find . -name '[ foo ].xml'
won't find the file. However, when I escape the brackets ('\[ foo \].xml'), it works fine. That doesn't solve the problem, though, because the pattern is built inside the script and I don't know in advance which files have those special (shell?) characters. Tested with both BSD and GNU find.
Is there any universal way of escaping file names for use with find's -name parameter, so that I can correct my command to support files with metacharacters?
Best Answer
It's so much easier with zsh globs here:
Or if you want to include hidden xml files and look inside hidden directories like find would:
But beware that files called .xml, ..xml or ...xml would become a problem, so you may want to exclude them:
With GNU tools, another approach to avoid having to scan the whole directory tree for each file would be to scan it once and look for all directories and xml files, record where they are and do the moving in the end:
Your approach has a number of problems if you want to allow any arbitrary file name:
- Embedding {} in the shell code is always wrong. What if there's a file called $(rm -rf "$HOME").xml for instance? The correct way is to pass those {} as arguments to the in-line shell script (-exec sh -c 'use as "$1"...' sh {} \;).
- With GNU find (implied here as you're using -quit), *.xml would only match files consisting of a sequence of valid characters followed by .xml, so that excludes file names that contain invalid characters in the current locale (for instance file names in the wrong charset). The fix for that is to fix the locale to C, where every byte is a valid character (that means error messages will be displayed in English though).
- If any of those .xml files are of type directory or symlink, that would cause problems (affect the scanning of directories, or break symlinks when moved). You may want to add a -type f to only move regular files.
- Command substitution ($(...)) strips all trailing newline characters. That would cause problems with a file whose name has a newline just before the .xml suffix, for instance. Working around that is possible but a pain: base=$(basename "$1" .xml; echo .); base=${base%??}. You can at least replace basename with the ${var#pattern} operators. And avoid command substitution if possible.
- Your problem with wildcard characters (?, [, * and backslash; they are not special to the shell, but to the pattern matching (fnmatch()) done by find, which happens to be very similar to shell pattern matching). You'd need to escape them with a backslash.
- The problem with files called .xml, ..xml and ...xml mentioned above.
So, if we address all of the above, we end up with something like:
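As an illustration, the points listed above can be combined into one POSIX sh sketch. It is one possible arrangement, not necessarily the exact command this answer had in mind. The function name move_xml_files is made up for the example, and -quit and mv -v are GNU/BSD extensions rather than POSIX. The inner pair of braces is written {\} because GNU find accepts only one {} in an -exec ... + command line and would otherwise reject the script text.

```shell
# Sketch: move every regular *.xml file below the current directory into the
# first directory found whose name equals the file name minus ".xml".
# move_xml_files is a hypothetical name; -quit and mv -v are not POSIX.
move_xml_files() {
  # LC_ALL=C: every byte is a valid character, so "*" matches any file name.
  LC_ALL=C find . -type f -name '*.xml' \
    ! -name .xml ! -name ..xml ! -name ...xml \
    -exec sh -c '
      for file do
        base=${file##*/}     # file name without its directory part
        base=${base%.xml}    # ... and without the .xml suffix
        # Backslash-escape the fnmatch() wildcards: \ [ ] * ?
        # The extra "echo ." keeps trailing newlines from being stripped.
        escaped=$(printf "%s\n" "$base" | sed "s/[][*?\\\\]/\\\\&/g"; echo .)
        escaped=${escaped%??}
        # {\} reaches the inner find as a plain pair of braces.
        find . -type d -name "$escaped" -exec mv -v -- "$file" {\} \; -quit
      done' sh {} +
}
```

Run it from the top of the tree, for example (cd /path/to/tree && move_xml_files). A file that already sits in a directory of the same name may still make mv complain that source and destination are the same file, as in the question's output.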
Phew...
Now, that's not all. With -exec ... {} +, we run as few sh invocations as possible. If we're lucky, we'll run only one, but if not, after the first sh invocation, we'll have moved a number of xml files around, and then find will continue looking for more, and may very well find the files we have moved in the first round again (and most probably try to move them to where they already are).
Other than that, it's basically the same approach as the zsh ones. A few other notable differences:
- With the zsh one, the file list is sorted (by directory name and file name), so the destination directory is more or less consistent and predictable. With find, it's based on the raw order of files in directories.
- With zsh, you'll get an error message if no matching directory to move the file to is found, not with the find approach above.
- With find, you'll get error messages if some directories cannot be traversed, not with the zsh one.
A last note of warning. If the reason you get some files with dodgy file names is because the directory tree is writable by an adversary, then beware that none of the solutions above are safe if the adversary may rename files under the feet of that command.
For instance, if you're using LXDE, the attacker could make a malicious foo/lxde-rc.xml, create a lxde-rc folder, detect when you're running your command and replace that lxde-rc with a symlink to your ~/.config/openbox/ during the race window (which can be made as large as necessary in many ways) between find finding that lxde-rc and mv doing the rename("foo/lxde-rc.xml", "lxde-rc/lxde-rc.xml") (foo could also be changed to that symlink, making you move your lxde-rc.xml elsewhere).
Working around that is probably impossible using standard or even GNU utilities; you'd need to write it in a proper programming language, doing some safe directory traversal and using renameat() system calls.
All the solutions above will also fail if the directory tree is deep enough that the limit on the length of the paths given to the rename() system call done by mv is reached (causing rename() to fail with ENAMETOOLONG). A solution using renameat() would also work around the problem.