Using grep
Why can't you just use the -r switch to grep to recurse the filesystem instead of making use of find? There are 2 additional switches I'd use too, instead of the -n switch.
$ grep -rHn PATTERN <DIR> | cut -d":" -f1-2
Example #1
$ grep -rHn PATH ~/.bashrc | cut -d":" -f1-2
/home/saml/.bashrc:25
Details
-r - recursively search through files + directories
-H - prints the name of the file if it matches (less restrictive than -l), i.e. it works with grep's other switches
-n - display the line number of the match
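As a quick way to try these switches without touching your real files, here is a minimal sketch. The temporary directory, the file names and the helper list_matches are all made up for the demo, and it assumes a grep that supports -r and -H (GNU and BSD grep do).

```shell
# Build a throwaway tree (file names are invented for this demo).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'one\nexport PATH=/bin\n' > "$demo_dir/a.txt"
printf 'PATH again\n' > "$demo_dir/sub/b.txt"

# -r recurses, -H prefixes the file name, -n prefixes the line number;
# cut keeps only the "file:line" part of each "file:line:text" record.
list_matches() {
    grep -rHn -- "$1" "$2" | cut -d: -f1-2 | sort
}

list_matches PATH "$demo_dir"
```

Note that the cut step assumes file names contain no colon; a name with a colon in it would be truncated at that point.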
Example #2
$ grep -rHn PATH ~/.bash* | cut -d":" -f1-2
/home/saml/.bash_profile:10
/home/saml/.bash_profile:12
/home/saml/.bash_profile_askapache:99
/home/saml/.bash_profile_askapache:101
/home/saml/.bash_profile_askapache:118
/home/saml/.bash_profile_askapache:166
/home/saml/.bash_profile_askapache:218
/home/saml/.bash_profile_askapache:250
/home/saml/.bash_profile_askapache:314
/home/saml/.bash_profile_askapache:2317
/home/saml/.bash_profile_askapache:2323
/home/saml/.bashrc:25
Using find
$ find . -type f -exec sh -c 'grep -Hn PATTERN "$@" | cut -d":" -f1-2' sh {} +
Example
$ find ~/.bash* -type f -exec sh -c 'grep -Hn PATH "$@" | cut -d":" -f1-2' sh {} +
/home/saml/.bash_profile:10
/home/saml/.bash_profile:12
/home/saml/.bash_profile_askapache:99
/home/saml/.bash_profile_askapache:101
/home/saml/.bash_profile_askapache:118
/home/saml/.bash_profile_askapache:166
/home/saml/.bash_profile_askapache:218
/home/saml/.bash_profile_askapache:250
/home/saml/.bash_profile_askapache:314
/home/saml/.bash_profile_askapache:2317
/home/saml/.bash_profile_askapache:2323
/home/saml/.bashrc:25
If you truly want to use find, you can do something like the above to exec grep on the files that find locates.
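One detail worth knowing when combining find -exec with sh -c: the first argument after the script text becomes $0, not $1, so a placeholder word is needed there or the first file name never reaches "$@". The sketch below shows this; the helper find_matches and the demo files are hypothetical, and passing the pattern as an argument is my own choice rather than something from the commands above.

```shell
# Throwaway files for the demo (names are hypothetical).
demo_dir=$(mktemp -d)
printf 'PATH=/bin\n' > "$demo_dir/first"
printf 'x\nPATH=/usr/bin\n' > "$demo_dir/second"

# The word "sh" after the script fills $0, so every file name found
# ends up in "$@". -type f keeps directories away from grep.
find_matches() {
    find "$2" -type f -exec sh -c '
        pattern=$1; shift
        grep -Hn -- "$pattern" "$@" | cut -d: -f1-2
    ' sh "$1" {} + | sort
}

find_matches PATH "$demo_dir"
```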
To look for \\\" anywhere on a line:
grep -F '\\\"'
That is, use -F for a fixed string search as opposed to a regular expression match (where backslash is special). And use strong quotes ('...') inside which backslash is not special.
Without -F, you'd need to double the backslashes:
grep '\\\\\\"'
Or use:
grep '\\\{3\}"'
grep -E '\\{3}"'
grep -E '[\]{3}"'
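To check that these forms really select the same lines, here is a small sketch. The sample() helper and its two input lines are invented for the test; everything else is the commands above fed from a pipe.

```shell
# Sample input: line 1 contains \\\" , line 2 contains only \" .
sample() {
    printf '%s\n' 'a \\\" b' 'c \" d'
}

sample | grep -F '\\\"'        # fixed string
sample | grep '\\\\\\"'        # BRE, each backslash doubled
sample | grep '\\\{3\}"'       # BRE interval
sample | grep -E '\\{3}"'      # ERE interval
sample | grep -E '[\]{3}"'     # bracket expression, no doubling needed
```

Each of the five commands should print only the first sample line.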
Within double quotes, you'd need another level of backslashes and also escape the " with backslash:
#              1
#     1234567890123
grep "\\\\\\\\\\\\\""
Backslash is another shell quoting operator. So you can also quote those backslash and " characters with backslash:
\g\r\e\p \\\\\\\\\\\\\"
I've even quoted the characters of grep above though that's not necessary (as none of g, r, e, p are special to the shell, except in the Bourne shell if they appear in $IFS). The only character I've not quoted is the space character, as we do need its special meaning in the shell: separate arguments.
To look for \\\" provided it's not preceded by another backslash:
grep -e '^\\\\\\"' -e '[^\]\\\\\\"'
That is, look for \\\" at the beginning of the line, or following a character other than backslash.
That time, we have to use a regular expression; a fixed-string search won't do.
grep returns the lines that match any of those expressions. You can also write it with one expression per line:
grep '^\\\\\\"
[^\]\\\\\\"'
Or with only one expression:
grep '^\(.*[^\]\)\{0,1\}\\\{3\}"' # BRE
grep -E '^(.*[^\])?\\{3}"' # ERE equivalent
grep -E '(^|[^\])\\{3}"'
With GNU grep built with PCRE support, you can use a negative look-behind assertion:
grep -P '(?<!\\)\\{3}"'
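A small sketch to see the "not preceded by another backslash" condition at work. The sample() helper and its three lines are made up; I've left the -P variant out of the block since it depends on how grep was built.

```shell
# Line 1: exactly three backslashes before the quote (should match).
# Line 2: four backslashes before the quote (should not match).
# Line 3: three backslashes at the very start of the line (should match).
sample() {
    printf '%s\n' 'x\\\"' 'x\\\\"' '\\\"'
}

# Two expressions, one for each case.
sample | grep -e '^\\\\\\"' -e '[^\]\\\\\\"'

# Single ERE with an alternation.
sample | grep -E '(^|[^\])\\{3}"'
```

Both commands should print the first and third sample lines and skip the second.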
Get a match count
To get a count of the lines that match the pattern (that is, that have one or more occurrences of \\\"), you'd add the -c option to grep. If however you want the number of occurrences, you can use the GNU-specific -o option (though now also supported by a few other implementations) to print all the matches one per line, and then pipe to wc -l to get a line count:
grep -Po '(?<!\\)\\{3}"' | wc -l
Or standardly/POSIXly, use awk instead:
awk '{n+=gsub(/(^|[^\\])\\{3}"/,"")};END{print 0+n}'
(awk's gsub() substitutes and returns the number of substitutions).
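To make the difference between counting lines and counting occurrences concrete, here is a sketch. The helpers sample, count_lines and count_occurrences are hypothetical names. In the awk regex I've spelled the three backslashes out instead of using {3}, on the assumption that some older awk builds do not accept interval expressions.

```shell
# Line 1 holds two occurrences of \\\" , line 2 holds one, line 3 none.
sample() {
    printf '%s\n' 'a\\\"b\\\"' '\\\"' 'plain'
}

# Number of lines with at least one match.
count_lines() {
    grep -c -E '(^|[^\])\\{3}"'
}

# Total number of matches: gsub() returns how many substitutions it made.
count_occurrences() {
    awk '{ n += gsub(/(^|[^\\])\\\\\\"/, "") } END { print 0 + n }'
}

sample | count_lines
sample | count_occurrences
```

Because the awk pattern consumes the character in front of the backslashes, two occurrences that sit directly next to each other would be counted once; the look-behind version with grep -Po does not have that limitation.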
Best Answer
This is the one-liner solution requested (for recent shells that have "process substitution"):
If no "process substitution" <(…) is available, just use grep as a filter:
Below is the detailed description of each part of the solution.
Byte values from hex numbers:
Your first problem is easy to resolve:
Change the upper X to a lower one x and use printf (for most shells):
Or use:
For those shells that choose to not implement the '\x' representation.
Of course, translating hex to octal will work on (almost) any shell:
Where "$sh" is any (reasonable) shell. But it is quite difficult to keep it correctly quoted.
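As an illustration of the hex-to-octal idea, here is a minimal sketch. It is not the original command; hex_to_bytes is a hypothetical helper, and it assumes the shell's printf accepts a 0x-prefixed argument for %o, which POSIX printf is specified to do.

```shell
# Convert hex byte values (one per argument) to raw bytes using octal
# escapes, which POSIX printf understands ("\x" escapes are an extension).
hex_to_bytes() {
    for h in "$@"; do
        # The inner printf renders 0x$h as three octal digits; the outer
        # printf then interprets the resulting \ooo escape as one byte.
        printf "\\$(printf '%03o' "0x$h")"
    done
}

hex_to_bytes 31 31 32 32   # writes the four bytes "1122"
```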
Binary files.
The most robust solution is to transform the file and the byte sequence (both) to some encoding that has no issues with odd character values like (new line) 0x0A or (null byte) 0x00. Both are quite difficult to manage correctly with tools designed and adapted to process "text files".
A transformation like base64 may seem a valid one, but it presents the issue that every input byte may have up to three output representations, depending on whether it is the first, second or third byte of the mod 24 (bits) position.
Hex transform.
That's why the most robust transformation should be one that starts on each byte boundary, like the simple HEX representation.
We can get a file with the hex representation of the file with any of these tools:
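For instance, with od (the options here are my own pick, not necessarily the ones originally used, and to_hex is a hypothetical helper name), a sketch could look like this:

```shell
# One line of space-separated, two-digit hex values for the whole input.
# -An drops the offset column, -v disables "*" squeezing of repeated
# lines, -tx1 prints one byte per value.
to_hex() {
    od -An -v -tx1 | tr -s '\n' ' '
}

printf '1122\n' | to_hex
```

The tr step joins od's output lines and squeezes runs of spaces, so every byte ends up as exactly two hex digits separated by single spaces.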
The byte sequence to search is already in hex in this case.
But it could also be transformed. An example of a round trip hex-bin-hex follows:
The search string may be set from the binary representation. Any of the three options presented above (od, hexdump, or xxd) are equivalent. Just make sure to include the spaces to ensure the match is on byte boundaries (no nibble shift allowed):
If the binary file looks like this:
Then, a simple grep search will give the list of matched sequences:
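A hedged sketch of such a search, assuming the od-based hex dump and a grep that supports -o. The helpers to_hex and count_bytes and the sample bytes are invented for the demo and are not the file shown in the original answer.

```shell
# Hex dump on a single line, bytes separated by single spaces.
to_hex() {
    od -An -v -tx1 | tr -s '\n' ' '
}

# Count non-overlapping occurrences of a byte sequence given as
# space-separated hex values, e.g. "31 31 32 32".
count_bytes() {
    # grep -o prints each match on its own line; wc -l counts them.
    to_hex | grep -o -- "$1" | wc -l
}

# Input bytes: a b NUL 1 1 2 2 LF 1 1 2 2 1 1 2 2
printf 'ab\0001122\n11221122' | count_bytes '31 31 32 32'
```

Because every byte is exactly two hex digits followed by a space, a pattern written with the same spacing can only match on byte boundaries. Overlapping occurrences are not counted, since grep -o resumes searching after the end of each match.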
One Line?
It all may be performed in one line:
For example, searching for 11221122 in the same file will need these two steps:
To "see" the matches:
… 0a 3131323231313232313132323131323231313232313132323131323231313232 313132320a
Buffering
There is a concern that grep will buffer the whole file, and, if the file is big, create a heavy load for the computer. For that, we may use an unbuffered sed solution:
The first sed is unbuffered (-u) and is used only to inject two newlines on the stream per matching string. The second sed will only print the (short) matching lines. The wc -l will count the matching lines.
This will buffer only some short lines: the matching string(s) in the second sed. This should be quite low in resources used.
Or, somewhat more complex to understand, but the same idea in one sed: