Grep -v: How to exclude only the first (or last) N lines that match

grep text-processing

Sometimes there are a few really annoying lines in otherwise tabular data like

column name | other column name
-------------------------------

I generally prefer to remove garbage lines like these by running grep -v on a reasonably unique string. The problem with that approach is that if the string also appears in the real data by accident, grep -v silently removes those lines too.

Is there a way to limit the number of lines that grep -v can remove (say to 1)? For bonus points, is there a way to count the number of lines from the end without resorting to <some command> | tac | grep -v <some stuff> | tac ?

Best Answer

sed provides a simpler way:

... |  sed '/some stuff/ {N; s/^.*\n//; :p; N; $q; bp}' | ...

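This deletes only the first matching line. Note that with GNU sed, if the first match is the very last line of the input, that line is printed rather than deleted, because N has no next line to read.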
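To see what the one-liner does, here is a small sketch on made-up data. It assumes GNU sed; the sample table, the /^---/ pattern, and the function names sample and drop_first are illustrative and not part of the original answer. The script is the same as above, only spread over several lines:

```shell
# Hypothetical sample: a header, its separator line, and a data row
# that happens to look exactly like the separator.
sample() {
    printf '%s\n' 'name | value' '------------' 'a | 1' '------------' 'b | 2'
}

# Same script as the one-liner above, one command per line (GNU sed assumed).
drop_first() {
    sed '/^---/ {
        # Append the next line, then drop everything up to the newline,
        # which removes the matching line.
        N
        s/^.*\n//
        # Collect all remaining lines; on the last one, print and quit.
        :p
        N
        $q
        bp
    }'
}

sample | drop_first
# name | value
# a | 1
# ------------
# b | 2
```

Only the first separator is removed; the data row that looks like it is kept.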

To delete more than one matching line:

sed '1 {h; s/.*/iiii/; x}; /some stuff/ {x; s/^i//; x; td; b; :d; d}'

Here the number of i characters is the number of matching lines to delete (one or more, not zero).
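As a quick check of the counter variant above, here is a sketch with a counter of two on made-up input. It assumes GNU sed; the /^---/ pattern and the name drop_first_two are illustrative only:

```shell
# Delete only the first two lines matching /^---/ (counter "ii").
# Same commands as the one-liner above, one per line (GNU sed assumed).
drop_first_two() {
    sed '1 {
        h
        s/.*/ii/
        x
    }
    /^---/ {
        x
        s/^i//
        x
        td
        b
        :d
        d
    }'
}

printf '%s\n' 'h' '---' 'a' '---' 'b' '---' 'c' | drop_first_two
# h
# a
# b
# ---
# c
```

The third match survives because the counter is already empty when it is reached.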

Multi-line Explanation

sed '1 {
    # Copy the first line to the hold buffer, overwrite the main buffer with
    # the counter (one `i` per line to delete), then swap the buffers back
    h
    s/^.*$/iiii/
    x
}

# For each line matching the regexp we are looking for
/some stuff/ {
    # Remove one `i` from hold buffer
    x
    s/i//
    x
    # If the substitution succeeded, an `i` was left: jump to `:d` and delete the line
    td
    # If not, the counter is used up: branch to the end of the script, which prints the line
    b
    :d
    d
}'

In addition

This variant is probably faster, because once the counter is used up and another match turns up, it reads all the remaining lines and prints them in one go:

sed '1 {h; s/.*/ii/; x}; /a/ {x; s/i//; x; td; :print_all; N; $q; bprint_all; :d; d}'
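The following sketch runs that variant on made-up input to show where the read-everything loop starts. It assumes GNU sed; the /^---/ pattern and the name drop_two_fast are illustrative, and the script is the command above written one command per line:

```shell
# Counter variant with the print-the-rest loop (GNU sed assumed).
drop_two_fast() {
    sed '1 {
        h
        s/.*/ii/
        x
    }
    /^---/ {
        x
        s/i//
        x
        # Counter still had an `i`: delete this line.
        td
        # Counter empty: keep this line, append every remaining line,
        # then print the lot and quit on the last one.
        :print_all
        N
        $q
        bprint_all
        :d
        d
    }'
}

printf '%s\n' 'h' '---' 'a' '---' 'b' '---' 'c' 'd' | drop_two_fast
# h
# a
# b
# ---
# c
# d
```

The output is what the plain counter variant would give for the same input; the loop is only entered at the third match, so input with no further match is processed line by line as before.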

As a result

You can put this function in your .bashrc (or the startup file of whichever shell you use):

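dtrash() {
    # Keep helper variables out of the interactive shell's namespace
    local count="" i
    if [ $# -eq 0 ]
    then
        # No arguments: pass the input through unchanged
        cat
    elif [ $# -eq 1 ]
    then
        # One argument (pattern): delete only the first matching line
        sed "/$1/ {N; s/^.*\n//; :p; N; \$q; bp}"
    else
        # Two arguments (count, pattern): build a counter of $1 `i`s
        for i in $(seq "$1")
        do
            count="${count}i"
        done
        sed "1 {h; s/.*/$count/; x}; /$2/ {x; s/i//; x; td; :print_all; N; \$q; bprint_all; :d; d}"
    fi
}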

And use it this way:

# Remove the first matching line
cat file | dtrash 'stuff'
# Remove the first four matching lines
cat file | dtrash 4 'stuff'
# Leave the input unchanged
cat file | dtrash