With newer versions of GNU grep (those that have the -z option), you can use this one-liner:
find . -type f -exec grep -lz 'this[[:space:]]*is[[:space:]]*some[[:space:]]*text' {} +
This assumes whitespace can occur only between the words.
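For illustration, the same "whitespace only between the words" idea can be sketched in Python. This assumes the file's whole content is searched at once, which is effectively what grep -z does. The helper name words_pattern is hypothetical. It joins the words with \s*, Python's counterpart of [[:space:]]*:

```python
import re


def words_pattern(phrase: str) -> str:
    r"""Join the words of `phrase` with \s* so any run of whitespace
    (including newlines) may appear between words, but not inside them."""
    return r"\s*".join(re.escape(word) for word in phrase.split())


pattern = words_pattern("this is some text")
print(pattern)  # this\s*is\s*some\s*text

# Searching the whole string lets \s* cross the newline after "is".
print(bool(re.search(pattern, "this is\nsome\ttext")))  # True
print(bool(re.search(pattern, "th is is some text")))   # False: space inside "this"
```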
If you just want to search all files recursively starting from the current directory, you don't need find. grep -r (recursive) is enough. find is useful when you want to be selective about which files are searched, e.g. to exclude certain directories. So, simply:
grep -rlz 'this[[:space:]]*is[[:space:]]*some[[:space:]]*text' .
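Below is a rough Python analogue of grep -rlz with find-style directory exclusion. It assumes UTF-8 text files. grep_tree and its exclude_dirs parameter are hypothetical names for this sketch. Removing entries from dirnames in place is how os.walk is told not to descend into a directory:

```python
import os
import re


def grep_tree(root, pattern, exclude_dirs=()):
    """Return sorted paths of files under `root` whose whole content matches
    `pattern`, skipping any directory whose name is in `exclude_dirs`."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering those dirs.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Read the entire file as one string, like a single -z record.
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip it silently
            if regex.search(text):
                hits.append(path)
    return sorted(hits)
```

For example, grep_tree(".", r"this\s*is\s*some\s*text", exclude_dirs={".git"}) would list matching files below the current directory while skipping any .git directories.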
The main trick here is -z. It makes grep treat each input record as terminated by an ASCII NUL instead of a newline. An ordinary text file contains no NUL bytes, so the whole file becomes a single record, and newlines can then be matched with the usual patterns.
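The following Python sketch illustrates why -z makes multi-line matching work. It is not how grep is implemented. It splits the same text into records on newline and on NUL, then tests each record against the pattern:

```python
import re

pattern = re.compile(r"this\s*is\s*some\s*text")
data = "this is\nsome text\n"  # ordinary text file: newlines, no NUL bytes

# Default grep: one record per newline-terminated line.
newline_records = data.split("\n")
# grep -z: records end at NUL; with no NUL present the whole file is one record.
nul_records = data.split("\0")

print(any(pattern.search(r) for r in newline_records))  # False
print(any(pattern.search(r) for r in nul_records))      # True
print(len(nul_records))                                 # 1
```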
The [[:space:]] character class matches any whitespace character, including space, tab, CR, LF, etc., so we can use it to match all the whitespace that can occur between the words.
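As a quick check of what [[:space:]] covers, this sketch compares the POSIX class as defined for the C locale (space, tab, newline, vertical tab, form feed, carriage return) with Python's \s under the re.ASCII flag, over all ASCII characters. In other locales [[:space:]] may include additional characters:

```python
import re

# POSIX [[:space:]] in the C locale, written out explicitly.
posix_space = re.compile(r"[ \t\n\v\f\r]")
# Python's \s restricted to ASCII matches [ \t\n\r\f\v].
ascii_s = re.compile(r"\s", re.ASCII)

same = all(
    bool(posix_space.fullmatch(chr(c))) == bool(ascii_s.fullmatch(chr(c)))
    for c in range(128)
)
print(same)  # True
```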
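grep -l prints only the names of the files that contain a match. If you also want to print the matches, use -H instead of -l. Note that with -z the printed match is the whole NUL-delimited record, which for an ordinary text file is the entire file; add -o to print only the matching text.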
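The difference between -l and -H can be mimicked in Python over a couple of hypothetical in-memory files. The first list names only the files that match. The second keeps each file name together with just the matched text, roughly what grep -Hoz would print:

```python
import re

# Hypothetical in-memory "files" standing in for real ones.
files = {
    "a.txt": "intro\nthis is\nsome text\nend\n",
    "b.txt": "nothing to see here\n",
}
pattern = re.compile(r"this\s*is\s*some\s*text")

# Like grep -l: only the names of files that contain a match.
names = [name for name, text in files.items() if pattern.search(text)]

# Like grep -Hoz: file name plus only the matched text.
pairs = [(name, m.group()) for name, text in files.items()
         for m in pattern.finditer(text)]

print(names)  # ['a.txt']
print(pairs)  # [('a.txt', 'this is\nsome text')]
```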
On the other hand, if whitespace can occur anywhere, even inside the words, the pattern loses its neat look:
grep -rlz 't[[:space:]]*h[[:space:]]*i[[:space:]]*s[[:space:]]*i[[:space:]]*s[[:space:]]*s[[:space:]]*o[[:space:]]*m[[:space:]]*e[[:space:]]*t[[:space:]]*e[[:space:]]*x[[:space:]]*t' .
With the -P (PCRE) option, you can replace [[:space:]] with \s, which looks much nicer:
grep -rlzP 't\s*h\s*i\s*s\s*i\s*s\s*s\s*o\s*m\s*e\s*t\s*e\s*x\s*t' .
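A small Python check shows why the per-word pattern is not enough in that case while the per-character one still matches. It uses a made-up sample string with whitespace inside the words. Python's \s plays the same role here as \s in grep -P:

```python
import re

word_gaps = re.compile(r"this\s*is\s*some\s*text")
any_gaps = re.compile(r"t\s*h\s*i\s*s\s*i\s*s\s*s\s*o\s*m\s*e\s*t\s*e\s*x\s*t")

sample = "th is\nis so me te\txt"  # whitespace inside words, too

print(bool(word_gaps.search(sample)))  # False
print(bool(any_gaps.search(sample)))   # True
```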
Better still, following @steeldriver's suggestion, we can have sed generate the pattern for us:
grep -rlzP "$(sed 's/./\\s*&/2g' <<< "thisissometext")" .
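The same pattern generation can be sketched in Python. spaced_pattern is a hypothetical helper that does what the sed command does for plain letters. It also applies re.escape, which the sed version does not, so that characters like . or + in the search string are matched literally:

```python
import re


def spaced_pattern(text):
    r"""Put \s* between every pair of adjacent characters of `text`,
    escaping each character so regex metacharacters are taken literally."""
    return r"\s*".join(re.escape(ch) for ch in text)


print(spaced_pattern("thisissometext"))
# t\s*h\s*i\s*s\s*i\s*s\s*s\s*o\s*m\s*e\s*t\s*e\s*x\s*t
```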
Best Answer
The script below recursively searches (text) files in a given directory for occurrences of a given string, regardless of whether it is in upper case, lower case, or any combination of the two.
It gives you a list of the matches found: the path to each file, combined with the filename and the actual occurrences of the string in that file.
To limit the search time, I would look for matches in specific directories rather than in, say, 2 TB of files ;).
To use it:
1] Copy the text below and paste it into an empty text file (gedit).
2] Edit the two lines in the head section to define the string to look for and the directory to search.
3] Save it as searchfor.py.
4] To run it, open a terminal, type python3 followed by a space, then drag the script onto the terminal window and press Return. The list of matches will appear in the terminal window. In case of an error, the script will mention it.
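The original script is not reproduced here. The following is a minimal Python sketch of what such a script could look like, and it is not the author's code. The constant names SEARCH_STRING and DIRECTORY, the placeholder directory, and the path:line: text output format are all assumptions. The two constants stand in for the "two lines in the head section":

```python
#!/usr/bin/env python3
"""searchfor.py: hypothetical sketch of a case-insensitive recursive search."""
import os

# --- head section: edit these two lines (placeholder values) ---
SEARCH_STRING = "some text"                      # string to look for, any case
DIRECTORY = os.path.expanduser("~/Documents")    # directory to search
# ----------------------------------------------------------------


def search(directory, needle):
    """Yield (path, line number, line) for every line that contains `needle`,
    compared case-insensitively via str.casefold()."""
    target = needle.casefold()
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if target in line.casefold():
                            yield path, lineno, line.rstrip("\n")
            except (OSError, UnicodeDecodeError):
                continue  # unreadable or non-UTF-8 (binary) file: skip the rest


if __name__ == "__main__":
    if not os.path.isdir(DIRECTORY):
        print(f"Error: {DIRECTORY!r} is not a directory")
    else:
        for path, lineno, line in search(DIRECTORY, SEARCH_STRING):
            print(f"{path}:{lineno}: {line}")
```

Using str.casefold() rather than lower() is a design choice in this sketch. It handles a few non-ASCII case pairs more thoroughly, and for plain ASCII text the two behave the same.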