I am attempting to write a filter using something like `sed` or `awk` to do the following:
- If a given pattern does not exist in the input, copy the entire input to the output
- If the pattern exists in the input, copy only the lines after the first occurrence to the output
This happens to be for a "git clean" filter, but that's probably not important. The important aspect is that this needs to be implemented as a filter, because the input is provided on stdin.
I know how to use `sed` to delete lines up to a pattern, e.g. `1,/pattern/d`, but that deletes the entire input if `/pattern/` is not matched anywhere.
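For instance:

    $ printf '1\n2\n3\n' | sed '1,/2/d'
    3
    $ printf '1\n2\n3\n' | sed '1,/9/d'    # no match: everything is deleted
    $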
I can imagine writing a whole shell script that creates a temporary file, does a `grep -q` or something, and then decides how to process the input. I'd prefer to do this without messing around creating a temporary file, if possible. This needs to be efficient because git might call it frequently.
Best Answer
If your files are not too large to fit in memory, you could use perl to slurp the file:
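    perl -0777 -pe 's/.*?PAT[^\n]*\n?//s' file

(With no `file` argument it reads stdin, so it works as a pure filter too.)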
Just change `PAT` to whatever pattern you're after.
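For example, with the pattern `5` and a hypothetical input file (the sample contents here are invented purely for illustration):

    $ cat file
    1
    2
    5 five
    6
    7
    $ perl -0777 -pe 's/.*?5[^\n]*\n?//s' file
    6
    7

An input that doesn't contain the pattern is copied through unchanged.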
Explanation
- `-pe`: read the input line by line, apply the script given by `-e` to each line and print.
- `-0777`: slurp the entire file into memory.
- `s/.*?PAT[^\n]*\n?//s`: remove everything up to the 1st occurrence of `PAT` and on through the end of that line.

For larger files, I don't see any way to avoid reading the file twice. Something like this (with the input file passed twice):
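    awk -vpat=5 '{
        if(NR==FNR){
            # 1st pass over the file
            if($0~pat && !a){a++; next}
            if(a){print}
        } else {
            # 2nd pass over the same file
            if(!a){print} else{exit}
        }
    }' file file    # the same file name is given twice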
Explanation
- `awk -vpat=5`: run `awk` and set the variable `pat` to `5`.
- `if(NR==FNR){}`: if this is the 1st file (`FNR` is the line number within the current file and `NR` the overall line number, so they are only equal while the first file is being read).
- `if($0~pat && !a){a++; next}`: if this line matches the value of `pat` and `a` is not defined, increment `a` by one and skip to the next line.
- `if(a){print}`: if `a` is defined (the pattern has already matched in this file), print the line.
- `else{ }`: if this is not the 1st file (so it's the second pass).
- `if(!a){print}`: if `a` is not defined, we want the whole file, so print every line.
- `else{exit}`: if `a` is defined, we've already printed what we need in the 1st pass, so there's no need to reprocess the file.
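With the same hypothetical input file as above:

    $ awk -vpat=5 '{if(NR==FNR){if($0~pat && !a){a++;next};if(a){print}}else{if(!a){print}else{exit}}}' file file
    6
    7

and if nothing matches `pat`, the whole file is printed on the 2nd pass.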