How to use regex inside exec with find

find, regular expression

Is it possible to use regular expressions based on the result (file name) inside the exec argument with find? I want to be able to "exec" based on parts of the argument, like:

find . -name pattern -regex "foo (regex1) bar (Regex2)" -exec something $1 $2 ;

Best Answer

You can't use capture groups from the regexp in the command to execute. If you use find -regex to restrict matches, you'll have to do some extra matching in the command. You can do that by invoking a shell and using its own pattern matching constructs. For example, if foo and bar are constant strings and regex1 can't match bar:

find … -exec sh -c '
  x=${0#foo}
  y=${x#*bar}
  x=${x%%bar*}
  something "$x" "$y"
' {} \;
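To make the three parameter expansions concrete, here is a minimal sketch run on a made-up name, foo123bar456. That value is hypothetical and only stands in for a path of the form foo(regex1)bar(regex2).

```shell
# Hypothetical sample value of the form foo<part1>bar<part2>.
path='foo123bar456'

x=${path#foo}     # remove the leading "foo"
y=${x#*bar}       # remove everything up to and including the first "bar"
x=${x%%bar*}      # remove everything from the first "bar" onwards

printf '%s %s\n' "$x" "$y"
```

This only uses POSIX parameter expansion, so it should behave the same in any sh.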

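Invoking a shell for each file adds some overhead. You can reduce it by passing the files to the shell in batches with -exec … {} +. The extra sh argument fills $0, so the file names end up in the positional parameters that the for loop iterates over.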

find … -exec sh -c '
  for item do
    item=${item#foo}
    y=${item#*bar}
    x=${item%%bar*}
    something "$x" "$y"
  done
' sh {} +

Since you've already done some filtering, you may be able to get away with shell patterns that match more than regex1 and regex2, but, for paths of that particular form, match the same part. If foo and bar can't be expressed with ordinary shell patterns, you can invoke ksh or bash, which support extra patterns that are as powerful as regular expressions: @(alter|native), *(zero-or-more), +(one-or-more), ?(optional), and !(negated). In bash, these patterns need to be enabled with shopt -s extglob. In ksh, they are available natively.
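As a rough illustration of the extended patterns, the sketch below assumes the first part is a run of digits, which a plain shell pattern cannot express but +([0-9]) can. The sample name foo2024bar-report is made up for the example.

```shell
# Assumed sample name: "foo", then digits, then "bar", then anything.
result=$(bash -c '
  shopt -s extglob                 # enable the extended patterns
  path=$1
  y=${path##foo+([0-9])bar}        # strip "foo<digits>bar" from the front
  x=${path%"bar$y"}                # strip "bar<rest>" from the end
  x=${x#foo}                       # strip the leading "foo"
  printf "%s %s\n" "$x" "$y"
' bash 'foo2024bar-report')
printf '%s\n' "$result"
```

The shopt line sits on its own line before the patterns are used, since bash needs the option set by the time it reads them.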

In bash, there is a regular expression matching construct which you can use in conditionals: [[ $STRING =~ REGEXP ]]. The regexp is an ERE (like find -regextype posix-egrep). (Zsh has a similar one; ksh has =~ but doesn't expose capture groups.) Capture groups are available via the BASH_REMATCH array.

find … -exec bash -c '
  for item do
    [[ $item =~ foo(regex1)bar(regex2) ]]
    something "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  done
' bash {} +
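Here is the same loop as a small sketch with concrete regexes filled in. The patterns [0-9]+ and [a-z]+ and the sample names are assumptions for illustration, and printf stands in for something. The names are passed directly rather than via find so the snippet runs anywhere bash is available.

```shell
out=$(bash -c '
  for item do
    # Skip names that do not match instead of reusing a stale BASH_REMATCH.
    if [[ $item =~ ^foo([0-9]+)bar([a-z]+)$ ]]; then
      printf "%s %s\n" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    fi
  done
' bash foo123barabc nomatch foo7barxyz)
printf '%s\n' "$out"
```

Testing the result of [[ … ]] matters once the loop can see names that find has not already filtered: after a failed match BASH_REMATCH does not hold the groups for the current name.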

An alternative approach is to print out the result and filter it, then call xargs to invoke the program. Arrange to have the first and second argument as successive items and run xargs -n 2. Use null bytes as separators to avoid xargs's strange quoting format, or use -d '\n' to use strict line-by-line parsing. Recent GNU tools such as sed can work with null bytes instead of newlines to separate records.

find … -print0 |
sed -z 's/^foo\(regex1\)bar\(regex2\)$/\1\x00\2/' |
xargs -n2 -0 something
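A minimal sketch of that pipeline, assuming GNU sed and xargs. The regexes [0-9]* and [a-z]* and the sample names are made up, printf with NUL terminators stands in for find … -print0, and a second printf stands in for something.

```shell
# Feed two hypothetical names as NUL-terminated records, as find -print0 would.
out=$(printf '%s\0' foo123barabc foo7barxyz |
  sed -z 's/^foo\([0-9]*\)bar\([a-z]*\)$/\1\x00\2/' |
  xargs -0 -n2 printf '%s|%s\n')
printf '%s\n' "$out"
```

Each input record becomes two NUL-terminated items, so xargs -n2 hands exactly one pair to each invocation.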

Another approach is to ditch find and use the recursive globbing feature of ksh93, bash or zsh: **/ matches subdirectories recursively. This isn't possible for complex find expressions involving boolean connectors, but it's enough for most cases. For example, in bash (note that this recurses into symbolic links to directories, like find -L):

shopt -s extglob globstar
for x in **/*bar*; do
  if [[ $x =~ foo(regex1)bar(regex2) ]]; then
    something "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  fi
done
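The following sketch tries the bash loop on a throwaway directory tree. The file names and the regexes are assumptions for illustration, and globstar needs bash 4.0 or later. Here the match is applied to the base name only, which is a choice made for this example rather than something the loop above requires.

```shell
# Throwaway tree with hypothetical file names.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/sub"
: > "$tmpdir/foo1bara"
: > "$tmpdir/sub/foo22barbb"
: > "$tmpdir/sub/unrelated"

out=$(bash -c '
  shopt -s globstar              # make **/ match directories recursively
  cd "$1" || exit
  for x in **/*bar*; do
    name=${x##*/}                # keep only the part after the last slash
    if [[ $name =~ ^foo([0-9]+)bar([a-z]+)$ ]]; then
      printf "%s: %s %s\n" "$x" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    fi
  done
' bash "$tmpdir")
rm -rf "$tmpdir"
printf '%s\n' "$out"
```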

In zsh:

for x in **/*bar*; do
  if [[ $x =~ foo(regex1)bar(regex2) ]]; then
    something $match[1] $match[2]
  fi
done