Grep Context – How to Limit Grep Context to N Characters on Line

grep json search

I have to grep through some JSON files in which the line lengths exceed a few thousand characters. How can I limit grep to display context up to N characters to the left and right of the match? Any tool other than grep would be fine as well, so long as it is available in common Linux packages.

This would be example output, for the imaginary grep switch Ф:

$ grep -r foo *
hello.txt: Once upon a time a big foo came out of the woods.

$ grep -Ф 10 -r foo *
hello.txt: ime a big foo came out

Best Answer

With GNU grep:

N=10; grep -roP ".{0,$N}foo.{0,$N}" .

Explanation:

-o => Print only what you matched
-P => Use Perl-style regular expressions
The regex says match 0 to $N characters followed by foo followed by 0 to $N characters.
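
As a quick check, the command can be run against the sample line from the question. This is only a minimal sketch: the scratch directory and the result variable are illustrative, and it assumes a GNU grep built with PCRE support (needed for -P).

```shell
#!/bin/sh
# Recreate hello.txt from the question in a scratch directory (illustrative setup).
tmpdir=$(mktemp -d)
printf '%s\n' 'Once upon a time a big foo came out of the woods.' > "$tmpdir/hello.txt"

N=10
# -r: recurse, -o: print only the matched text, -P: Perl-style regex (GNU grep).
result=$(cd "$tmpdir" && grep -roP ".{0,$N}foo.{0,$N}" .)
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Each match is at most 2*N plus the length of foo characters long. Note that -o prints every match on a line separately, so a long JSON line with several hits produces several output lines.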

If you don't have GNU grep:

find . -type f -exec \
    perl -nle '
        BEGIN{$N=10}
        print if s/^.*?(.{0,$N}foo.{0,$N}).*?$/$ARGV:$1/
    ' {} \;

Explanation:

Since we can no longer rely on grep being GNU grep, we make use of find to search for files recursively (the -r action of GNU grep). For each file found, we execute the Perl snippet.

Perl switches:

-n Read the file line by line
-l Remove the newline at the end of each line and put it back when printing
-e Treat the following string as code

The Perl snippet is doing essentially the same thing as grep. It starts by setting a variable $N to the number of context characters you want. The BEGIN{} means this is executed only once at the start of execution, not once for every line in every file.

The statement executed for each line is to print the line if the regex substitution works.

The regex:

Match any old thing lazily¹ at the start of line (^.*?) followed by .{0,$N} as in the grep case, followed by foo, followed by another .{0,$N}, and finally match any old thing lazily till the end of line (.*?$).
We substitute this with $ARGV:$1. $ARGV is a magical variable that holds the name of the current file being read. $1 is what the parens matched: the context in this case.
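The lazy match at the start is required because a greedy .* would eat all characters before foo and still let the overall match succeed (since .{0,$N} is allowed to match zero characters), leaving no left context. At the end of the line the choice matters less, because .{0,$N} gets to match before the trailing .*? does.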

¹ That is, prefer not to match anything unless this would cause the overall match to fail. In short, match as few characters as possible.

Related Solutions

Grep Command – Display Filename Once with Context and Line Numbers

I would change a few things.

find_code() { 
    # assign all arguments (not just the first ${1}) to MATCH
    # so find_code can be used with multiple arguments:
    #    find_code errorCode
    #    find_code = 1111
    #    find_code errorCode = 1111
    MATCH="$*"   # "$*" joins the arguments into one string; "$@" is meant for lists

    # For each file that has a match in it (note I use `-l` to get just the file name
    # that matches, and not the display of the matching part) I.e we get an output of:
    #
    #       srcdir/matching_file.c
    # NOT:
    #       srcdir/matching_file.c:       errorCode = 1111
    #
    grep -lr -- "$MATCH" "${SRCDIR}" | while IFS= read -r file
    do 
        # print the filename
        printf '%s\n' "${file}"
        # and grep the match in that file (this time using `-h` to suppress the 
        # display of the filename that actually matched, and `-n` to display the 
        # line numbers)
        grep -nh -A5 -B5 "$MATCH" "${file}"
    done 
}

Using empty line as context “group-separator” for grep

If you use the GREP_COLORS environment variable you can control specific colors for each type of match. man grep explains the use of the variable.

The following command will print a colored match, but nothing on the line that separates the group, just a blank line. Piped through od you'll see the color escape before and after the match, but only \n\n at the group separation.

GREP_COLORS='ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=' grep --group-separator="" --color=always -A5

Unsetting the se component will suppress the printing of color in the group separator.

Since the example above spelled out the default value for every component of GREP_COLORS except se, the following will work as well.

GREP_COLORS='se=' grep --group-separator="" --color=always -A5

If you're not using a Bash-like shell, you might need to export GREP_COLORS first.
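
The commands above leave out the pattern and the input. A complete run might look like the following sketch, which assumes GNU grep (for --group-separator); the sample input, the pattern foo, and the use of -A1 instead of -A5 are made up for illustration.

```shell
#!/bin/sh
# Two matches far enough apart that grep prints a group separator between them.
input=$(printf '%s\n' 'foo 1' 'after 1' 'skipped' 'skipped' 'foo 2' 'after 2')

# se= clears the separator color; --group-separator="" makes the separator text empty.
out=$(printf '%s\n' "$input" |
    GREP_COLORS='se=' grep --group-separator="" --color=always -A1 foo)

# od -c makes the escape sequences and the bare newline of the separator visible.
printf '%s\n' "$out" | od -c

# Count completely empty lines: only the group separator should qualify.
blank=$(printf '%s\n' "$out" | grep -c '^$')
printf 'blank lines: %s\n' "$blank"
```

The matched lines still carry color escapes, so the only truly empty line in the output should be the separator between the two groups.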
