Is it possible to use the find command to find all the "non-binary" files in a directory? Here's the problem I'm trying to solve.
I've received an archive of files from a Windows user. This archive contains source code and image files. Our build system doesn't play nice with files that have Windows line endings. I have a command-line program (flip -u) that will flip line endings between *nix and Windows. So, I'd like to do something like this:
find . -type f | xargs flip -u
However, if this command is run against an image file, or other binary media file, it will corrupt the file. I realize I could build a list of file extensions and filter with that, but I'd rather have something that's not reliant on me keeping that list up to date.
So, is there a way to find all the non-binary files in a directory tree? Or is there an alternate solution I should consider?
Best Answer
I'd use file and pipe the output into grep or awk to find text files, then extract just the filename portion of file's output and pipe that into xargs. Something like:
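A minimal sketch of that pipeline follows. It assumes GNU xargs (for -d and -r), filenames that contain neither ':' nor newlines, and flip -u from the question as the in-place converter. The helper name ascii_text_names is hypothetical. It matches on file's description field rather than on the whole line, so a file named, say, "ASCII text.png" isn't picked up.

```shell
#!/bin/sh
# Sketch of the file | awk | xargs approach.
# Assumptions: GNU xargs (-d, -r), no ':' or newline in filenames,
# and flip -u (from the question) converting CRLF to LF in place.

# Read file(1) output ("NAME:  DESCRIPTION") on stdin and print NAME
# whenever the description (field 2) mentions plain ASCII text.
ascii_text_names() {
    awk -F: '$2 ~ /ASCII text/ { print $1 }'
}

# Dry run first: list what would be converted in the current directory.
#   file -- * | ascii_text_names
# Then convert; -d '\n' splits on newlines only, -r skips flip if nothing matched.
#   file -- * | ascii_text_names | xargs -d '\n' -r flip -u
```

Files with Windows line endings are typically described by file as "..., ASCII text, with CRLF line terminators", so they still match the filter.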
Note that the filter matches 'ASCII text' rather than just any 'text'; you probably don't want to mess with Rich Text documents, Unicode text files, etc.
You can also use find (or whatever) to generate a list of files to examine with file.

The -d'\n' argument to xargs makes xargs treat each input line as a separate argument, thus catering for filenames with spaces and other problematic characters. That is, it's an alternative to xargs -0 when the input source doesn't or can't generate NUL-separated output (as find's -print0 option does). According to the changelog, xargs got the -d/--delimiter option in September 2005, so it should be available in any non-ancient Linux distro (I wasn't sure, which is why I checked; I just vaguely remembered it was a "recent" addition).

Note that a linefeed is a valid character in filenames, so this will break if any filenames contain linefeeds. For typical Unix users such names are pathologically insane, but they aren't unheard of if the files originated on Mac or Windows machines.
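For a whole tree, here is a sketch along the same lines. It again assumes GNU find and GNU xargs and flip -u from the question. find hands names to file NUL-separated, which is safe for any name. file's own output, however, is one line per file, and that is where -d '\n' comes in. The function names convert_tree and show_args are hypothetical. show_args just prints each argument xargs builds, in brackets, to make the splitting visible.

```shell
#!/bin/sh
# Recursive variant: find supplies names, file classifies them, flip converts.
# Assumptions: GNU find/xargs, no ':' or newline in filenames, flip -u available.

convert_tree() {
    # -print0 / -0: NUL-separated names, so spaces and quotes are harmless here.
    # file's output is newline-separated, so the last xargs splits on '\n' only.
    find "${1:-.}" -type f -print0 \
        | xargs -0 -r file -- \
        | awk -F: '$2 ~ /ASCII text/ { print $1 }' \
        | xargs -d '\n' -r flip -u
}

# Print each argument that xargs builds from stdin on its own bracketed line.
# Extra options (e.g. -d '\n') are passed straight through to xargs.
show_args() {
    xargs "$@" printf '[%s]\n'
}
```

For example, printf 'my file.c\nother.c\n' | show_args prints [my], [file.c] and [other.c] on separate lines, because plain xargs splits on blanks. With show_args -d '\n' it prints [my file.c] and [other.c]. A filename containing a linefeed would still be split in two, which is the caveat above.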
Also note that file is not perfect. It's very good at detecting the type of data in a file but can occasionally get confused. I have used numerous variations of this method many times in the past with success.
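Since file can occasionally be fooled, one cheap extra guard before converting files in place is to drop anything containing a NUL byte. This rests on the assumption that plain ASCII source never contains a NUL byte, while most binary formats do. The heuristic is an addition, not part of the method above, and the helper names has_no_nul and filter_no_nul are hypothetical.

```shell
#!/bin/sh
# Extra safety net: refuse any file that contains a NUL byte.
# Heuristic assumption: ASCII (including CRLF) source never contains NUL.

# Succeeds (exit 0) if the file has no NUL bytes: deleting NULs leaves it unchanged.
has_no_nul() {
    tr -d '\000' < "$1" | cmp -s - "$1"
}

# Read newline-separated names on stdin; print only those without NUL bytes.
filter_no_nul() {
    while IFS= read -r name; do
        if has_no_nul "$name"; then
            printf '%s\n' "$name"
        fi
    done
}
```

It slots in just before the final xargs, e.g. file -- * | awk -F: '$2 ~ /ASCII text/ { print $1 }' | filter_no_nul | xargs -d '\n' -r flip -u.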