I currently use the following simplified command to remove trailing whitespace and add a newline at end of file where needed:
find . -type f -exec sed -i -e 's/[ \t]\+\(\r\?\)$/\1/;$a\' {} \+
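This small demonstration on a throwaway file shows what each half of the one-liner's sed expression does. It assumes GNU sed: `\t` inside the bracket expression, `\r`, `\+`, `\?` and `-i` with no suffix argument are GNU extensions. The file name `demo.txt` is only for illustration.

```shell
#!/usr/bin/env bash
# Demo of the one-liner's sed expression on a scratch file (GNU sed assumed).
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# First line has trailing spaces and a tab; second line has no final newline.
printf 'foo  \t\nbar' > "$tmpdir/demo.txt"

# s/[ \t]\+\(\r\?\)$/\1/  strips trailing blanks but keeps an optional CR
# $a\                      appends a newline after the last line only if missing
sed -i -e 's/[ \t]\+\(\r\?\)$/\1/;$a\' "$tmpdir/demo.txt"

# od -c prints every byte, so the removed blanks and the added newline are visible.
od -c "$tmpdir/demo.txt"
```

After the sed call the file contains exactly `foo`, newline, `bar`, newline. Running the same command again changes nothing, because `$a\` only supplies a newline when the last line lacks one.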
As you'll quickly see, this has two problems: it will change binary files, and it will add a newline to the end of files with ␍␊ line separators. These modifications are easy to undo or skip when committing in git gui or the like, but I'd like to minimize* the amount of reverting. To that end:
Is there a way to skip the whole file if any line matches a regex in sed?
* I'm aware that there might be binary files without ␀ characters, and there could be files with deliberately mixed newlines or ␀s. But I'm looking for a solution that requires minimal human intervention. I could conceivably list all the file extensions I'd like to operate on, but it would be a very long list that would have to be reviewed constantly, and because of name clashes binary files could still slip through.
Complicated workaround:
while IFS= read -r -d '' -u 9
do
    if [[ "$(file -bs --mime-type -- "$REPLY")" = text/* ]]
    then
        sed -i -e 's/[ \t]\+\(\r\?\)$/\1/;$a\' -- "$REPLY"
    else
        echo "Skipping $REPLY" >&2
    fi
done 9< <(find . -type f -print0)
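With `-i`, sed rewrites every file it is handed, so instead of trying to abandon a file from inside the sed script, one option is to test each file first and run sed only when nothing disqualifying matches. This is a swapped-in technique (a grep pre-check, not pure sed), sketched under the assumption of bash, GNU grep and GNU sed; `should_skip` and `tidy_tree` are hypothetical names. It skips files that GNU grep's `-I` heuristic (NUL bytes) flags as binary, and any file in which some line contains a carriage return. Like the `file`-based workaround, the binary test is a heuristic and can miss cases.

```shell
#!/usr/bin/env bash
# Sketch: pre-check each file with grep and run the original sed expression
# only on files that pass. Assumes bash, GNU grep and GNU sed.
set -euo pipefail

# Hypothetical helper: exit status 0 means "leave this file alone".
should_skip() {
  local f=$1
  # With -I, GNU grep treats a file it detects as binary (NUL bytes) as
  # non-matching. The empty pattern matches any line, so a non-empty text
  # file succeeds here; binary and empty files fail and are skipped.
  if ! grep -Iq -e '' -- "$f"; then
    return 0
  fi
  # Skip the whole file if any line contains a carriage return (␍␊ endings).
  grep -q -e $'\r' -- "$f"
}

# Hypothetical wrapper: tidy every regular file under directory $1.
tidy_tree() {
  find "$1" -type f -print0 |
    while IFS= read -r -d '' file; do
      if should_skip "$file"; then
        printf 'Skipping %s\n' "$file" >&2
      else
        sed -i -e 's/[ \t]\+\(\r\?\)$/\1/;$a\' -- "$file"
      fi
    done
}
```

Call it as `tidy_tree .` from the top of the tree you want to clean. Skipping empty files is harmless, since the sed expression would not change them anyway.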
Best Answer
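If you trust git's point of view on what is a binary file or not, you can use `git grep` to get a list of non-binary files. Assuming `t.cpp` is a text file and `ls` is a binary, both checked in, a `git grep -I` search whose pattern matches every line lists only `t.cpp`. The `-I` option means "Don't match the pattern in binary files."

To combine that with your `sed` expression, have `git grep` print only the names of the matching files and feed them to `sed` through `xargs` (`-z`/`xargs -0` help with strange filenames). Check out the `git grep` man page for other useful options: `--no-index` or `--cached` could help depending on exactly what set of files you want to operate on.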
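The `git grep` plus `xargs` combination described above might look like this. It is a sketch, assuming the current directory is inside a git work tree and that GNU sed and GNU xargs are available (`-r` is a GNU xargs option that skips running sed when no names arrive); `tidy_tracked_text_files` is a hypothetical name. The empty pattern `-e ''` matches every line, so every non-empty tracked text file is listed, and `--name-only` prints just the paths.

```shell
#!/usr/bin/env bash
# Sketch of the git grep approach: let git decide which tracked files are
# text and pass only those to sed. Assumes a git work tree, GNU sed, GNU xargs.
set -euo pipefail

# Hypothetical wrapper name; run it from the directory you want to clean.
tidy_tracked_text_files() {
  # -I           don't match the pattern in binary files
  # --name-only  print the names of files that contain matches, not the lines
  # -z           NUL-terminate the names so xargs -0 copes with odd filenames
  # -e ''        the empty pattern matches every line of every text file
  git grep -I --name-only -z -e '' |
    xargs -0 -r sed -i -e 's/[ \t]\+\(\r\?\)$/\1/;$a\'
}
```

Git does not treat ␍␊ files as binary, so they still go through sed with this approach; only the binary-file problem is handled by `-I`.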