Find human-readable files

files find

I am trying to find an efficient way to solve level 5 of the OverTheWire Bandit challenge.

I have a bunch of files, and only one of them meets the following criteria:

  • Human-readable
  • 1033 bytes in size
  • Non-executable

Right now, I am using the find command, and I can find the files matching the last two criteria:

find . -size 1033c ! -executable

However, I don't know how to exclude non-human-readable files. Solutions I found for that challenge use the -readable test, but I don't think this works. -readable only looks at a file's permissions, not at its content, while the challenge description asks for an ASCII file or something like that.

Best Answer

Yes, you can use find to look for non-executable files of the right size and then use file to check for ASCII. Something like:

find . -type f -size 1033c ! -executable -exec file {} + | grep ASCII

The question, however, isn't as simple as it sounds. 'Human readable' is a horribly vague term. Presumably, you mean text. OK, but what kind of text? Latin character ASCII only? Full Unicode? For example, consider these four files:

$ cat file1
abcde
$ cat file2
αβγδε
$ cat file3
abcde
αβγδε
$ cat file4
#!/bin/sh
echo foo

These are all text and human readable. Now, let's see what file makes of them:

$ file *
file1: ASCII text
file2: UTF-8 Unicode text
file3: UTF-8 Unicode text
file4: POSIX shell script, ASCII text executable

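So, the find command above will miss file2 and file3, because their descriptions don't contain the string ASCII (for the sake of this example, let's imagine each of those files is 1033 bytes in size). Note that file4's description does contain ASCII, so it is only skipped if its execute permission bit is set: ! -executable tests permissions, while file's "executable" label comes from the content. You could expand the find to look for the string text: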

find . -type f -size 1033c ! -executable -exec file {} + | grep -w text

With the -w, grep will only print lines where text is found as a stand-alone word. That should be pretty close to what you want, but I can't guarantee that there is no other file type whose description might also include the string text.
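One way to depend less on the wording of file's descriptions is to ask file for a MIME type and keep only the text/* results. The following is a minimal sketch, not a definitive solution: it assumes GNU find (for -executable) and a file implementation that supports --mime-type, and the function name find_text_files is made up for this example. Whether a given file counts as text/* is still up to file's own heuristics.

```shell
#!/bin/sh
# find_text_files DIR SIZE_BYTES  (hypothetical helper name)
# Prints the regular, non-executable files under DIR that are exactly
# SIZE_BYTES bytes long and whose MIME type, as guessed by file, is text/*.
find_text_files() {
    dir=$1
    size=$2
    # file --mime-type prints lines like "./name: text/plain".
    # sed keeps only the lines whose type starts with text/ and strips
    # the ": text/..." suffix so that just the file name is left.
    find "$dir" -type f -size "${size}c" ! -executable \
        -exec file --mime-type {} + |
        sed -n 's/:[[:space:]]*text\/[^[:space:]]*$//p'
}
```

For the challenge, this would be called as find_text_files . 1033. Under this approach, UTF-8 files and shell scripts such as file2, file3 and file4 above would be expected to match as well, since file normally reports them with a text/* type.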
