How to edit the entire file after matching a grep pattern

awk, grep, sed, text processing

To simplify, I want to edit the whole file after a pattern match. Here is an example file:

$ cat file
ip=x.x.x.a
mask=255.0.0.0
host=a
ip=x.x.x.b
mask=255.0.0.0
host=b
ip=x.x.x.c
mask=255.0.0.0
host=c
ip=x.x.x.x
blahblah
mask=255.0.0.0
host=d

Let's suppose that I want to edit the IP of host c. Note that the IP may vary, so I don't know its value in advance. If I grep for host c and use -B2 to print the preceding lines, I still can't edit them in the original file! Also, the blocks may not all have the same structure: for host d there is some text between the ip and mask lines, so I can't assume that the ip line is always 2 lines before my search pattern.

To summarize, I can't grep for the IP directly because I don't know it. Instead I need to search for the host and then edit the ip line that precedes this match to change its value. How can I do this?

Best Answer

This will change the IP associated with host c to 1.2.3.4:

$ sed 's/^ip/\nip/' file | perl -00pe 'if(/\nhost=c\n/){s/ip=\S+/ip=1.2.3.4/} s/\n\n/\n/' 
ip=x.x.x.a
mask=255.0.0.0
host=a
ip=x.x.x.b
mask=255.0.0.0
host=b
ip=1.2.3.4
mask=255.0.0.0
host=c
ip=x.x.x.x
blahblah
mask=255.0.0.0
host=d

Explanation:

sed 's/^ip/\nip/' file : insert an extra newline (\n) before each line beginning with ip. Treating \n in the replacement as a newline is a GNU sed extension, so if your sed doesn't support it, replace the sed command with perl -pe 's/^ip/\nip/'. We need this in order to use Perl's "paragraph mode" (see below).
perl -00pe : the -00 makes perl run in "paragraph mode", where records are separated by blank lines (two or more consecutive newlines) instead of single newlines. This enables us to treat each host's block as a single "line". The -pe means "print each line after applying the script given by -e to it".
if(/\nhost=c\n/){s/ip=\S+/ip=1.2.3.4/} : if this "line" (section) matches a newline followed by the string host=c and then another newline, then replace ip= and the 1 or more non-whitespace characters (\S+) following it with ip=1.2.3.4.
s/\n\n/\n/ : replace the pair of newlines that ends each section with a single newline to get the original file's format back.
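
For comparison, here is a minimal awk sketch of a different technique from the pipeline above: rather than using paragraph mode, it holds each block in memory from its ip= line to its host= line and rewrites the held ip= line when the host matches. The names target, newip and emit_block are made up for this illustration, and the sketch assumes that every block starts with an ip= line and ends with a host= line, as in the sample file.

```shell
# Recreate the sample file from the question in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/file" <<'EOF'
ip=x.x.x.a
mask=255.0.0.0
host=a
ip=x.x.x.b
mask=255.0.0.0
host=b
ip=x.x.x.c
mask=255.0.0.0
host=c
ip=x.x.x.x
blahblah
mask=255.0.0.0
host=d
EOF

# target and newip are hypothetical variable names chosen for this sketch.
result=$(awk -v target=c -v newip=1.2.3.4 '
    # Print whatever is buffered and start a fresh, empty block.
    function emit_block(   i) {
        for (i = 1; i <= n; i++) print buf[i]
        n = 0; ipline = 0
    }
    /^ip=/   { emit_block(); ipline = 1 }   # this line will be stored in buf[1]
             { buf[++n] = $0 }              # buffer every line of the block
    /^host=/ { if ($0 == "host=" target && ipline) buf[ipline] = "ip=" newip
               emit_block() }
    END      { emit_block() }               # flush trailing lines, if any
' "$dir/file")

printf '%s\n' "$result"
rm -rf "$dir"
```

Because the block is only printed once its host= line has been seen, extra lines such as the blahblah in host d's block do not matter. To update the file itself, the output could be written to a temporary file and moved over the original.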

If you want this to change the file in place, you can use:

tmp=$(mktemp); sed 's/^ip/\nip/' file > "$tmp"
perl -00pe 'if(/\nhost=c\n/){s/ip=\S+/ip=1.2.3.4/} s/\n\n/\n/' "$tmp" > file; rm -f "$tmp"

Related Solutions

Grep – Inverse Match and Exclude Lines Before and After

don's might be better in most cases, but just in case the file is really big, and you can't get sed to handle a script file that large (which can happen at around 5000+ lines of script), here it is with plain sed:

sed -ne:t -e"/\n.*$match/D" \
    -e'$!N;//D;/'"$match/{" \
            -e"s/\n/&/$A;t" \
            -e'$q;bt' -e\}  \
    -e's/\n/&/'"$B;tP"      \
    -e'$!bt' -e:P  -e'P;D'

This is an example of what is called a sliding window on input. It works by building a look-ahead buffer of $B-count lines before ever attempting to print anything.

And actually, probably I should clarify my previous point: the primary performance limiter for both this solution and don's will be directly related to interval. This solution will slow with larger interval sizes, whereas don's will slow with larger interval frequencies. In other words, even if the input file is very large, if the actual interval occurrence is still very infrequent then his solution is probably the way to go. However, if the interval size is relatively manageable, and is likely to occur often, then this is the solution you should choose.

So here's the workflow:

- If $match is found in pattern space preceded by a \newline, sed will recursively Delete every \newline that precedes it.
  - I was clearing $match's pattern space out completely before, but to easily handle overlap, leaving a landmark seems to work far better.
  - I also tried s/.*\n.*\($match\)/\1/ to get it in one go and dodge the loop, but when $A/$B are large, the Delete loop proves considerably faster.
- Then we pull in the Next line of input preceded by a \newline delimiter and try once again to Delete a /\n.*$match/ by referring to our most recently used regular expression w/ //.
- If pattern space matches $match then it can only do so with $match at the head of the line: all $Before lines have been cleared.
  - So we start looping over $After.
  - Each run of this loop we'll attempt to s///ubstitute for &itself the $Ath \newline character in pattern space, and, if successful, test will branch us (and our whole $After buffer) out of the script entirely, to start the script over from the top with the next input line, if any.
  - If the test is not successful we'll branch back to the :top label and recurse for another line of input, possibly starting the loop over if $match occurs while gathering $After.
- If we get past a $match function loop, then we'll try to print the $last line if this is it, and if !not, try to s///ubstitute for &itself the $Bth \newline character in pattern space.
  - We'll test this, too, and if it is successful we'll branch to the :Print label.
  - If not, we'll branch back to :top and get another input line appended to the buffer.
- If we make it to :Print we'll Print then Delete up to the first \newline in pattern space and rerun the script from the top with what remains.

And so this time, if we were doing A=2 B=2 match=5; seq 5 | sed...

The pattern space for the first iteration at :Print would look like:

^1\n2\n3$

And that's how sed gathers its $Before buffer. And so sed prints to output $B-count lines behind the input it has gathered. This means that, given our previous example, sed would Print 1 to output, and then Delete that and send back to the top of the script a pattern space which looks like:

^2\n3$

...and at the top of the script the Next input line is retrieved and so the next iteration looks like:

^2\n3\n4$

And so when we find the first occurrence of 5 in input, the pattern space actually looks like:

^3\n4\n5$

Then the Delete loop kicks in and when it's through it looks like:

^5$

And when the Next input line is pulled sed hits EOF and quits. By that time it has only ever Printed lines 1 and 2.
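
The walkthrough above can be reproduced with a small wrapper. This is only a sketch: the function name exclude_context and its argument order are made up for illustration, the sed script inside it is copied unchanged from the answer, and it assumes a sed that accepts the script as written (GNU sed is the assumption here).

```shell
# Hypothetical wrapper around the sed script from the answer.
# usage: exclude_context MATCH AFTER BEFORE < input
exclude_context() {
    match=$1 A=$2 B=$3
    sed -ne:t -e"/\n.*$match/D" \
        -e'$!N;//D;/'"$match/{" \
                -e"s/\n/&/$A;t" \
                -e'$q;bt' -e\}  \
        -e's/\n/&/'"$B;tP"      \
        -e'$!bt' -e:P  -e'P;D'
}

# Same parameters as the trace: A=2 B=2 match=5 on the output of seq 5.
out=$(seq 5 | exclude_context 5 2 2)
printf '%s\n' "$out"
```

As the trace describes, only lines 1 and 2 should be printed: lines 3 and 4 fall inside the $B window before the match, and the matching line 5 is dropped when sed quits at end of input.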

Here's an example run:

A=8 B=7 match='[24689]0'
seq 100 |
sed -ne:t -e"/\n.*$match/D" \
    -e'$!N;//D;/'"$match/{" \
            -e"s/\n/&/$A;t" \
            -e'$q;bt' -e\}  \
    -e's/\n/&/'"$B;tP"      \
    -e'$!bt' -e:P  -e'P;D'

That prints:

Text Processing – Robust Way to Edit and Replace Pattern Matched

sed here is the perfect tool for the task. However note that you almost never need to pipe several sed invocations together as a sed script can be made of several commands.
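
As a small illustration of that point, the following sketch runs the same two substitutions first as a pipeline of two sed processes and then as a single invocation with two -e commands. The sample text and the substitutions are made up for the example.

```shell
# Two separate sed processes connected by a pipe...
piped=$(printf 'foo baz\n' | sed 's/foo/bar/' | sed 's/baz/qux/')

# ...versus one sed process running both commands in order on each line.
single=$(printf 'foo baz\n' | sed -e 's/foo/bar/' -e 's/baz/qux/')

printf '%s\n%s\n' "$piped" "$single"
```

Both variables end up holding the same line, since each command in a sed script sees the pattern space left by the previous one.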

If you wanted to extract the first sequence of 2 decimal digits and, if found, append it after a space to the end of the line, you'd do:

sed 's/\([[:digit:]]\{2\}\).*$/& \1/' < your-file

If you wanted to do that only if it's found in second position on the line, right after an a:

sed 's/^a\([[:digit:]]\{2\}\).*$/& \1/' < your-file

And if you don't want to do it if that sequence of 2 digits is followed by more digits:

sed 's/^a\([[:digit:]]\{2\}\)\([^[:digit:]].*\)\{0,1\}$/& \1/' < your-file
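
To see how the three commands differ, here is a sketch that runs each of them on the same four sample lines. The sample lines are made up for this illustration; the sed commands are the ones shown above, reading from a pipe instead of your-file.

```shell
# Made-up sample lines covering the cases discussed above.
input=$(printf '%s\n' 'a12xyz' 'b12xyz' 'a123' 'a12')

# 1. First sequence of 2 digits anywhere on the line.
first=$(printf '%s\n' "$input" | sed 's/\([[:digit:]]\{2\}\).*$/& \1/')

# 2. Only when the 2 digits directly follow a leading a.
second=$(printf '%s\n' "$input" | sed 's/^a\([[:digit:]]\{2\}\).*$/& \1/')

# 3. As 2, but not when more digits follow the first two.
third=$(printf '%s\n' "$input" |
    sed 's/^a\([[:digit:]]\{2\}\)\([^[:digit:]].*\)\{0,1\}$/& \1/')

printf '%s\n' '--- first' "$first" '--- second' "$second" '--- third' "$third"
```

With these inputs the first command appends 12 to every line, the second leaves b12xyz untouched, and the third additionally leaves a123 untouched because a third digit follows.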

In terms of robustness, it all boils down to answering two questions: what should be matched, and what should not be? That's why it's important to specify your requirements clearly and to understand what the input may look like (can there be digits in the lines where you don't want to find a match? can there be non-ASCII characters in the input? is the input encoded in the locale's charset? etc.).

Above, depending on the sed implementation, the input will either be decoded into text based on the locale's charmap (see the output of locale charmap), or be interpreted as if each byte corresponded to a character, with bytes 0 to 127 interpreted as per the ASCII charmap (assuming you're not on an EBCDIC-based system).

For sed implementations in the first category, it may not work properly if the file is not encoded in the right charset. For those in the second category, it could fail if there are characters in the input whose encoding contains the encoding of decimal digits.
