Sed Command – How to Skip File if It Contains Regex

Tags: newlines, sed, shell, text processing

I currently use the following simplified command to remove trailing whitespace and add a newline at end of file where needed:

find . -type f -exec sed -i -e 's/[ \t]\+\(\r\?\)$/\1/;$a\' {} \+

As you'll quickly see, this has two problems: it will change binary files, and it will add a newline to the end of files with ␍␊ line separators. These modifications are easy to undo or skip when committing in git gui or a similar tool, but I'd like to minimize* the amount of reverting. To that end:

Is there a way to skip the whole file if any line matches a regex in sed?

* I'm aware that there might be binary files without ␀ characters, and there could be files with deliberately mixed newlines or ␀s. But I'm looking for the solution which requires the minimal human intervention. I could conceivably list all the file extensions that I'd like to operate on, but it would be a very long list which would have to be constantly reviewed, and because of name clashes it would still be possible that binary files slip through.

Complicated workaround:

while IFS= read -r -d '' -u 9
do
    if [[ "$(file -bs --mime-type -- "$REPLY")" = text/* ]]
    then
        sed -i -e 's/[ \t]\+\(\r\?\)$/\1/;$a\' -- "$REPLY"
    else
        echo "Skipping $REPLY" >&2
    fi
done 9< <(find . -type f -print0)

Best Answer

If you trust git's point of view on what is a binary file or not, you can use git grep to get a list of non-binary files. Assuming t.cpp is a text file, and ls is a binary, both checked in:

$ ls
t.cpp ls
$ git grep -I --name-only -e ''
t.cpp

The -I option means:

-I
Don't match the pattern in binary files.
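To see the effect of -I in isolation, here is a small sketch that builds a throwaway repository and lists what git considers text. The function names are made up for this example, and I'm assuming git's usual heuristic of treating a file with a NUL byte near its start as binary.

```shell
# Hypothetical helper: print the tracked files under directory "$1"
# that git does not consider binary. The empty pattern matches every
# line, so each text file with at least one line is listed once.
list_text_files() {
    git -C "$1" grep -I --name-only -e ''
}

# Throwaway demonstration in a temporary repository (runs in a subshell).
demo_list_text_files() (
    repo=$(mktemp -d) || exit
    git -C "$repo" init -q
    printf 'int main() {}\n' > "$repo/t.cpp"     # plain text
    printf 'ELF\0\1\2\n'     > "$repo/ls"        # contains a NUL byte
    : > "$repo/empty.txt"                        # no lines at all
    git -C "$repo" add .
    list_text_files "$repo"
    rm -rf -- "$repo"
)
```

Calling demo_list_text_files should print only t.cpp. Note that the empty file is not listed either, because it has no line for the empty pattern to match; that is harmless here since there is nothing in it to clean up.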

To combine that with your sed expression:

$ git grep -I --name-only -z -e '' | \
       xargs -0 sed -i.bk -e 's/[ \t]\+\(\r\?\)$/\1/;$a\'

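(-z and xargs -0 pass the file names NUL-separated, which copes with spaces, newlines and other strange characters in names. -i.bk makes sed keep a .bk backup copy of every file it edits.)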
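The pipeline above only deals with binary files. For the literal question (skip a whole file if any line matches a regex), one possible approach is to put grep -L, which lists files containing no match, between git grep and sed. The following is a sketch rather than a tested recipe for every setup: it assumes GNU grep, GNU sed and GNU xargs, the function name is hypothetical, and it edits in place without backups. Since files with CR line endings are skipped entirely, the sed expression no longer needs to preserve \r.

```shell
# Hypothetical helper: clean up tracked text files under the current
# directory, skipping any file in which at least one line ends in CR.
# Assumes GNU grep (-L, -Z), GNU sed (-i, \+) and GNU xargs (-0, -r).
# The body runs in a subshell so the cr variable does not leak.
fix_lf_text_files() (
    cr=$(printf '\r')   # a literal carriage return
    # 1. git grep -I lists tracked non-binary files, NUL-separated.
    # 2. grep -L keeps only the files with no line ending in CR.
    # 3. sed strips trailing blanks and adds a final newline if missing.
    git grep -I --name-only -z -e '' |
        xargs -0 -r grep -L -Z -e "${cr}\$" |
        xargs -0 -r sed -i -e 's/[ \t]\+$//;$a\'
)
```

Run it from inside the working tree. The -r flag stops xargs from running grep or sed with no file arguments when an earlier stage produces nothing.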

Check out the git grep man page for other useful options: --no-index or --cached could help, depending on exactly which set of files you want to operate on.
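As an illustration of --no-index, this sketch lists the non-binary files in a directory that is not under git's control. The function name is invented for the example, and I'm relying on the documented behaviour that --no-index searches the files on disk rather than the tracked set.

```shell
# Hypothetical helper: list non-binary files under directory "$1",
# even if that directory is not part of a git repository.
# Runs in a subshell so the caller's working directory is unchanged.
list_text_files_no_index() (
    cd -- "$1" || exit
    git grep -I --name-only --no-index -e ''
)
```

The output can be fed to xargs -0 sed in the same way as before by adding -z to the git grep call.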
