Retrieving lines from a file depending on other lines

awk grep sed text-processing

Imagine the following file structure:

foo.bar.1
blabla
moreblabla
relevant=yes
foo.bar.2
relevant=no
foo.bar.3
blablabla
foo.bar.4
relevant=yes

I want to retrieve every foo.bar line whose block (the lines after it and before the next foo.bar) contains a line stating relevant=yes.

So the output should be:

foo.bar.1
foo.bar.4

I could of course write a program or script that iterates through the lines, remembers each foo.bar, and prints it when a line saying relevant=yes follows it and comes before the next foo.bar. But I thought there might be an out-of-the-box way using standard Unix utilities (grep/sed/awk)?

Thanks for any hints!

Best Answer

If the input is processed line by line, then processing needs to go like this:

if the current line is foo.bar, store it, forgetting any previous foo.bar line that wasn't enabled for output;
if the current line is relevant=yes, this enables the latest foo.bar for output.

This kind of reasoning is a job for awk. (It can also be done in sed if you like pain.)

awk '
    /^foo\.bar/ { foobar = $0 }
    /^relevant=yes$/ {if (foobar != "") {print foobar; foobar = ""}}
'
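
As a quick check, the awk program above can be wrapped in a shell function and run against the sample from the question. This is only a minimal sketch: the function name relevant_blocks is made up for illustration, and the input is simply the question's example fed in through a here-document.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this example only):
# print each foo.bar header whose block contains a "relevant=yes" line.
relevant_blocks() {
    awk '
        /^foo\.bar/      { foobar = $0 }   # remember the most recent header
        /^relevant=yes$/ { if (foobar != "") { print foobar; foobar = "" } }
    ' "$@"
}

# Sample input copied from the question.
relevant_blocks <<'EOF'
foo.bar.1
blabla
moreblabla
relevant=yes
foo.bar.2
relevant=no
foo.bar.3
blablabla
foo.bar.4
relevant=yes
EOF
```

On that sample this should print foo.bar.1 and foo.bar.4. Clearing foobar after printing keeps a header from being printed twice if its block happens to contain more than one relevant=yes line.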

Related Solutions

Delete whitespace for a set of lines in Vim editor

:%s/^\s\+
" Same thing (:le = :left = left-align given range):
:%le

Learn more here at http://vim.wikia.com/wiki/Remove_unwanted_spaces

If you want to do this for a particular range of lines:

:19,25s/^\s\+//

BTW, the best way to start learning Vim is to run the vimtutor command; it teaches you how to use Vim from inside the Vim editor itself.

Bash – Remove nearly duplicate lines

How about joining adjacent pairs of lines, and then using a backreference to find the non-unique prefix?

$ sed '$!N; /\(.*\)\n\1:FOO/D; P;D' file
red.7
green.2:FOO
blue.6
yellow.9:FOO

Explanation:

$!N - if we are not already at the last line, append the next line to the pattern space, separated by a newline
/\(.*\)\n - match everything up to the newline (i.e. the first of each pair of lines) and save it into a capture group
\1:FOO now matches whatever was captured from the first line, followed by :FOO (\1 is a backreference to the first capture group)
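/\(.*\)\n\1:FOO/D - if the second line of the pair is the same as the first followed by :FOO, then Delete the first line and restart the cycle with the second line still in the pattern space
P;D - otherwise Print the first line, then Delete it, ready to start the next cycle with the remaining line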

Or, more neatly (thanks @don_crissti):

 sed '$!N; /\(.*\)\n\1:FOO/!P;D' file

N means there are always two consecutive lines in the pattern space and sed Prints the first one of them only if the second line isn't the same as the first one plus the suffix :FOO. Then D removes the first line from the pattern space and restarts the cycle.
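
To see the cycle in action, here is a small sketch that runs the shorter command on some input. The original input file is not shown in this answer, so the six lines below are an assumption reconstructed from the output listed earlier, and the function name drop_shadowed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this example only):
# drop a line when the line right after it is the same text plus ":FOO".
drop_shadowed() {
    sed '$!N; /\(.*\)\n\1:FOO/!P;D' "$@"
}

# Assumed input: each ":FOO" line is preceded by its bare counterpart.
drop_shadowed <<'EOF'
red.7
green.2
green.2:FOO
blue.6
yellow.9
yellow.9:FOO
EOF
```

With that assumed input, the bare green.2 and yellow.9 lines are dropped and the remaining four lines match the output shown above.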
