Awk or sed command to match regex at specific line, exit true if success, false otherwise

Tags: awk, gnu-screen, puppet, sed, text-processing

I need to determine whether a file contains a certain regex at a certain line, returning true (exit 0) if it does and false otherwise. Maybe I'm overthinking this, but my attempts proved a tad unwieldy. I have a solution, but I'm looking for others I hadn't thought of. I could use perl, but I'd like to keep this as "lightweight" as possible, because it runs during a Puppet execution cycle.

The problem is common enough: in RHEL6, screen was packaged in a way that limits the terminal width to 80 characters unless you uncomment line 132 of /etc/screenrc. This command checks whether that line is still commented out (that is, not yet fixed):

 awk 'NR==132 && /^#termcapinfo[[:space:]]*xterm Z0=/ {x=1;nextfile} END {exit 1-x}' /etc/screenrc

Note: if the file has fewer than 132 lines, the command must exit with false.
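To check the awk one-liner against these requirements, here is a minimal test sketch. It generates three hypothetical sample files in a temporary directory (the names match.rc, fixed.rc and short.rc are illustrative; the real target is /etc/screenrc) and reports the exit status for each. The wrapper name `check` is also hypothetical.

```shell
#!/bin/sh
# Sketch: run the awk check against three hypothetical sample files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Line 132 still commented out: should exit 0.
{ seq 131; printf '%s\n' '#termcapinfo xterm Z0=example'; seq 10; } > "$tmp/match.rc"
# Line 132 already uncommented: should exit 1.
{ seq 131; printf '%s\n' 'termcapinfo xterm Z0=example'; seq 10; } > "$tmp/fixed.rc"
# Fewer than 132 lines: must exit 1.
seq 100 > "$tmp/short.rc"

# check: hypothetical wrapper around the awk one-liner from the question.
check() {
  awk 'NR==132 && /^#termcapinfo[[:space:]]*xterm Z0=/ {x=1;nextfile} END {exit 1-x}' "$1"
}

for f in match fixed short; do
  check "$tmp/$f.rc" && s=0 || s=$?
  echo "$f: $s"
done
```

This prints `match: 0`, `fixed: 1` and `short: 1`. In the last two cases `x` is never set, so `1-x` evaluates to 1.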

I thought sed would help here, but apparently that requires odd tricks like null substitutions and branches. Still, I'd like to see a sed solution just to learn. And maybe there is something else I overlooked.

EDIT 1: Added nextfile to my awk solution
EDIT 2: Benchmarks
EDIT 3: Different host (idle)
EDIT 4: Mistakenly used Gilles' awk time for the optimized perl run
EDIT 5: New benchmark

Benchmarks

Note: wc -l /etc/screenrc reports 216 lines.
50k iterations when the line is not present, measured in wall-clock time:

Null-op: 0.545s
My original awk solution: 58.417s
My edited awk solution (with nextfile): 58.364s
Gilles' awk solution: 57.578s
Optimized perl solution: 90.352s Doh!
Sed 132{p;q}|grep -q ... solution: 61.259s
Cuonglm's tail | head | grep -q: 70.418s Ouch!
Don_chrissti's head -nX |head -n1|grep -q: 116.9s Brrrrp!
Terdon's double-grep solution: 65.127s
John1024's sed solution: 45.764s
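Two of the entries above are named only by their pipeline shape. As a rough sketch, reconstructed from those names rather than copied from the original answers, the `sed | grep` and `tail | head | grep` variants could look like this. FILE and RE are illustrative variable names, and each function returns grep's status, which is 0 only if line 132 exists and matches.

```shell
#!/bin/sh
# Reconstructed sketches of two pipeline variants from the benchmark list
# (not the original answers' exact code). FILE and RE are illustrative names.
FILE=${FILE:-/etc/screenrc}
RE='^#termcapinfo[[:space:]]*xterm Z0='

# sed prints line 132 and quits; grep -q supplies the exit status.
via_sed() {
  sed -n '132{p;q;}' "$FILE" | grep -q "$RE"
}

# tail starts output at line 132, head keeps one line, grep -q tests it.
via_tail() {
  tail -n +132 "$FILE" | head -n 1 | grep -q "$RE"
}
```

Usage: `via_sed && echo "line 132 is still commented out"`. If the file has fewer than 132 lines, grep receives empty input and exits 1, which satisfies the "fewer than 132 lines" requirement without extra logic. Each variant starts two or three processes, though, compared with one for a single sed or awk call.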

Thank you John, and thank you sed! At first I was surprised that perl seemed on par here. Perl loads a number of shared libraries at startup, but as long as the OS has them all cached, the cost comes down to the parser and bytecode compiler. In the distant past (perl 5.2?) I found it was about 20% slower. In fact perl was slower, as I had originally expected; it only appeared competitive because of a copy/paste error on my part.

Benchmarks Part 2

The largest configuration file of practical interest is /etc/services, so I re-ran these benchmarks against it, with the line to be checked about two-thirds of the way through the file. The total line count is 1100, so I picked 7220 and modified the regex accordingly (so that in one case it fails and in the other it succeeds; for the benchmark it always fails).

John's sed solution: 121.4s
Chrissti's {head;head}|grep solution: 138.341s
Cuonglm's tail|head|grep solution: 77.948s
My awk solution: 175.5s
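The timing harness behind these numbers isn't shown. One plausible way to reproduce measurements of this kind is to run each candidate in a loop and time the whole loop. `run_n` below is a hypothetical helper, not the author's harness, and the iteration count is a parameter.

```shell
#!/bin/sh
# Hypothetical benchmark helper (not the author's harness): run a command
# N times, discarding its output and exit status, so only elapsed time matters.
run_n() {
  n=$1
  shift
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" >/dev/null 2>&1 || :   # exit 1 ("not found") is expected, not an error
    i=$((i + 1))
  done
}
```

In bash or ksh, where `time` is a reserved word that can time a shell function, a run might look like `time run_n 50000 sed -n '132 {/^#termcapinfo[[:space:]]*xterm Z0=/q; q1}; $q1' /etc/screenrc`, with `time run_n 50000 true` measuring loop overhead alone. In dash, `time` is an external command and cannot see shell functions.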

Best Answer

With GNU sed:

sed -n '132 {/^#termcapinfo[[:space:]]*xterm Z0=/q}; $q1' /etc/screenrc

How it works

132 {/^#termcapinfo[[:space:]]*xterm Z0=/q}

On line 132, check for the regex ^#termcapinfo[[:space:]]*xterm Z0=. If it is found, quit (q) with the default exit code of 0; the rest of the file is skipped.
$q1

If we reach the last line, $, then quit with exit code 1: q1.
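To see both exit paths, here is a small sketch that runs the one-liner against generated sample files. It assumes GNU sed (quitting with an exit code, as in `q1`, is a GNU extension). `sed_check` is a hypothetical wrapper name, and the file contents are made up for illustration.

```shell
#!/bin/sh
# Requires GNU sed: "q1" (quit with an exit code) is a GNU extension.
# sed_check: hypothetical wrapper around the one-liner above.
sed_check() {
  sed -n '132 {/^#termcapinfo[[:space:]]*xterm Z0=/q}; $q1' "$1"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Line 132 still commented out, followed by 50 more lines.
{ seq 131; printf '%s\n' '#termcapinfo xterm Z0=example'; seq 50; } > "$tmp/match.rc"
# Line 132 already uncommented.
{ seq 131; printf '%s\n' 'termcapinfo xterm Z0=example'; seq 50; } > "$tmp/fixed.rc"

sed_check "$tmp/match.rc" && echo "match.rc: 0" || echo "match.rc: $?"
sed_check "$tmp/fixed.rc" && echo "fixed.rc: 0" || echo "fixed.rc: $?"
```

This prints `match.rc: 0` (q fires on line 132) and `fixed.rc: 1`. For fixed.rc, sed reads on to the last line, where `$q1` fires.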

Efficiency

Since it is not necessary to read past the 132nd line of the file, this version quits as soon as we reach the 132nd line or the end of the file, whichever occurs first:

sed -n '132 {/^#termcapinfo[[:space:]]*xterm Z0=/q; q1}; $q1' /etc/screenrc
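Because this version decides at line 132, it never has to reach end of input. One way to see this is to feed it an endless stream. The sketch uses `yes` purely as an illustrative input source and assumes GNU sed, with pipefail off so that each pipeline reports sed's status. The first version would wait forever on this input for a last line that never arrives, so it is deliberately not run here.

```shell
#!/bin/sh
# GNU sed. Assumes "pipefail" is off, so each pipeline's status is sed's.
# Line 132 of the endless stream is "filler": sed quits with status 1,
# and yes is then stopped by SIGPIPE.
yes filler |
  sed -n '132 {/^#termcapinfo[[:space:]]*xterm Z0=/q; q1}; $q1' &&
  echo "status: 0" || echo "status: $?"

# Same endless stream, but with the commented-out line placed at line 132.
{ seq 131; printf '%s\n' '#termcapinfo xterm Z0=example'; yes filler; } |
  sed -n '132 {/^#termcapinfo[[:space:]]*xterm Z0=/q; q1}; $q1' &&
  echo "status: 0" || echo "status: $?"
```

Both pipelines terminate, printing `status: 1` and then `status: 0`.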

Handling empty files

The version above returns true for empty files: if the file is empty, no commands are executed and sed exits with the default exit code of 0. To avoid this:

! sed -n '132 {/^#termcapinfo[[:space:]]*xterm Z0=/q1; q}' /etc/screenrc

Here, sed exits with code 0 unless the desired string is found on line 132, in which case it exits with code 1. The preceding ! tells the shell to invert this status to get back the exit code we want; this ! pipeline negation is supported by all POSIX shells. This version works even for empty files. (Hat tip: G-Man)
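Here is a minimal side-by-side sketch of the empty-file case, assuming GNU sed. `needs_fix` is a hypothetical wrapper name (its argument defaults to /etc/screenrc), and the sample files are generated on the fly.

```shell
#!/bin/sh
# needs_fix: hypothetical wrapper around the negated GNU sed check.
# Exits 0 only if line 132 exists and still matches the commented-out pattern.
needs_fix() {
  ! sed -n '132 {/^#termcapinfo[[:space:]]*xterm Z0=/q1; q}' "${1:-/etc/screenrc}"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
: > "$tmp/empty.rc"                                   # empty file
{ seq 131; printf '%s\n' '#termcapinfo xterm Z0=example'; } > "$tmp/match.rc"

# Earlier version: an empty file gives status 0 (a false positive).
sed -n '132 {/^#termcapinfo[[:space:]]*xterm Z0=/q; q1}; $q1' "$tmp/empty.rc" &&
  echo "old, empty: 0" || echo "old, empty: $?"
needs_fix "$tmp/empty.rc" && echo "new, empty: 0" || echo "new, empty: $?"
needs_fix "$tmp/match.rc" && echo "new, match: 0" || echo "new, match: $?"
```

This prints `old, empty: 0`, `new, empty: 1` and `new, match: 0`. Negating the status also covers files shorter than 132 lines: sed simply runs out of input with status 0, which ! turns into 1.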

Related Solution: Grep Sed Awk – Extract Text Until First Blank Line

Using awk

Try:

$ awk '/Start to grab/,/^$/' prova.txt
Start to grab from here: 1
random1
random2
random3
random4

Start to grab from here: 2
random1546
random2561

Start to grab from here: 3
random45
random22131

/Start to grab/,/^$/ defines a range. It starts with any line that matches Start to grab and ends with the first empty line, ^$, that follows.
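The original input file, prova.txt, isn't shown. The sketch below uses a small hypothetical stand-in to illustrate two properties of awk ranges: the end line is included in the output, and a range whose end pattern never matches continues to end of file.

```shell
#!/bin/sh
# Hypothetical stand-in for prova.txt: noise between blocks, and a final
# block with no blank line after it.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'preamble' 'Start to grab from here: 1' 'random1' '' \
  'noise' 'Start to grab from here: 2' 'random2' > "$sample"

# Ranges are inclusive: the closing blank line is printed, and a range whose
# end pattern never matches runs to end of file.
awk '/Start to grab/,/^$/' "$sample"
```

This prints the first block including its trailing blank line, skips `preamble` and `noise`, and then prints the second block through end of file.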

Using sed

With very similar logic:

$ sed -n '/Start to grab/,/^$/p' prova.txt
Start to grab from here: 1
random1
random2
random3
random4

Start to grab from here: 2
random1546
random2561

Start to grab from here: 3
random45
random22131

-n tells sed not to print anything unless we explicitly ask it to. /Start to grab/,/^$/p tells it to print any lines in the range defined by /Start to grab/,/^$/.
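As a quick sketch on the same kind of hypothetical input, the sed range produces exactly the awk output. If the blank separator lines are unwanted, one option (a variant not taken from the original answer) is to nest a negated address inside the range.

```shell
#!/bin/sh
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'preamble' 'Start to grab from here: 1' 'random1' '' \
  'noise' 'Start to grab from here: 2' 'random2' > "$sample"

# Same output as the awk range: the closing blank line is included.
sed -n '/Start to grab/,/^$/p' "$sample"

# Variant (not from the original answer): inside the range, print only
# non-empty lines, which drops the blank separators.
sed -n '/Start to grab/,/^$/{/^$/!p;}' "$sample"
```

The `;` before `}` keeps the second command portable to sed implementations that require it.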
