Bash – Modify Non-Adjacent Lines Specified by Line Numbers

bash sed

I know the line numbers in advance, and keep them in another file:

cat linenos
2
15
42
44
... etc

As you can see, the lines are non-adjacent, so I cannot use a single range for sed.
The goal is to modify those lines of the target file by, say, prepending a marker such as MARKER to each of them.

The straightforward way is to call sed once for each line number:

for l in $(cat linenos)
do 
  sed -i "${l}s/^/MARKER/" target_file
done

which obviously calls sed once per line number.

CAUTION: Not only is this approach inefficient, it can also go wrong if the modification is anything other than inserting a marker like this. Any sed command that deletes or inserts lines, such as d, a, or r, makes the original line numbers in linenos invalid for the following sed runs in the loop.
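To make the pitfall concrete, here is a small sketch. It assumes GNU sed (for -i) and works on throwaway files in a temporary directory; the variable names loop_result and single_result are only for illustration. Deleting lines 2 and 5 in a loop removes the wrong second line, because after the first deletion the original line 5 has become line 4. A single sed invocation evaluates every address against the original numbering.

```shell
#!/usr/bin/env bash
# Scratch directory so no real file is touched (illustrative setup).
tmp=$(mktemp -d)
printf '%s\n' 'line one' 'line two' 'line three' \
              'line four' 'line five' 'line six' > "$tmp/target_file"
printf '%s\n' 2 5 > "$tmp/linenos"

# Loop: each sed run sees the file as the previous run left it.
cp "$tmp/target_file" "$tmp/loop"
while read -r l; do
  sed -i "${l}d" "$tmp/loop"      # -i as in the question (GNU sed assumed)
done < "$tmp/linenos"
loop_result=$(cat "$tmp/loop")

# Single run: both addresses refer to the original line numbers.
single_result=$(sed -e '2d' -e '5d' "$tmp/target_file")

printf 'loop:\n%s\n\nsingle run:\n%s\n' "$loop_result" "$single_result"
rm -rf "$tmp"
```

The loop ends up deleting "line two" and "line six", while the single run deletes "line two" and "line five" as intended.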

What would you suggest to improve/optimize that?

Sample linenos file

cat linenos
2
5

Sample target_file

cat target_file
line one
line two
line three
line four
line five
line six

Expected result of modified target_file

cat target_file
line one
MARKERline two
line three
line four
MARKERline five
line six

A possible approach I came up with is to build the sed script dynamically:

SEDCMD=$(for l in $(cat linenos); do echo -n "${l}s/^/MARKER/;" ; done)

sed -i -e "$SEDCMD" target_file

The approach by @steeldriver below shares the idea, but is more elegant and concise.

Best Answer

Assume fileN contains the numbers of the lines to be modified, and target_file is the text file that contains those lines. A minimal solution needs to read each file only once.

Sorted

If the file that contains the line numbers contains one number (bigger than 1) per line, is sorted and there are no repetitions, we can use:

awk 'BEGIN{ getline lineN <"fileN"} {
     if(NR==lineN){$0="MARKER " $0;getline lineN <"fileN"}
     }1' target_file

This keeps only one line of each file in memory and walks both files from start to end. However, once awk has processed a line (line 15, say), it won't go back to an earlier one (line 12, say). So fileN has to be sorted, free of repetitions, and contain only numbers greater than 1 for this to work.
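As a usage sketch, the same merge-style walk can take the name of the list file from a variable instead of hard-coding "fileN". The awk variable name list and the scratch files are assumptions made for this example; the input is the sample from the question.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf '%s\n' 'line one' 'line two' 'line three' \
              'line four' 'line five' 'line six' > "$tmp/target_file"
printf '%s\n' 2 5 > "$tmp/fileN"          # sorted, no repeats

# "list" is a hypothetical variable holding the path of the line-number file.
sorted_result=$(awk -v list="$tmp/fileN" '
  BEGIN { getline lineN < list }          # preload the first wanted line number
  NR == lineN {                           # current line is the next wanted one
    $0 = "MARKER " $0
    getline lineN < list                  # advance to the following number
  }
  1                                       # print every line, modified or not
' "$tmp/target_file")

printf '%s\n' "$sorted_result"
rm -rf "$tmp"
```

With the sample input this tags "line two" and "line five" and leaves the other four lines unchanged.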

Unsorted

Of course, the naive solution is to sort the line-number file first, with sort -nu fileN.
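A quick sketch of what that sort does, using a made-up list with repeats (the file contents and the variable name sorted_list are assumptions for illustration). The cleaned list could then be saved and used as the line-number file for the sorted approach.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf '%s\n' 44 2 15 42 15 2 > "$tmp/fileN"   # unsorted, with repeats

# -n compares numerically, -u keeps only one copy of each number.
sorted_list=$(sort -nu "$tmp/fileN")

printf '%s\n' "$sorted_list"
rm -rf "$tmp"
```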

But if the list of line numbers may be unsorted (and may contain repetitions), we can use sed, ed (the precursor of sed), or awk (shown later):

Convert each line of fileN into a sed editing command of the form <number>s/^/MARKER /. Either the shell's printf or sed can do that:

printf '%ss/^/MARKER /\n' $(<fileN) | sed -f - target_file
sed 's#$#s/^/MARKER /#' fileN       | sed -f - target_file

{ printf '%ss/^/MARKER /\n' $(<fileN); printf '%s\n' ,p Q; } | ed -Gs target_file
{ sed 's#$#s/^/MARKER /#' fileN ; echo "w"      ; } | ed target_file

Note that in the last case the editing is done in place, directly on the original file: the final w command writes the modifications back to the file. If you only need to print the result, use the third option, which prints all lines and quits without saving.
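The script-generation step can be inspected on its own. The sketch below writes the generated commands to a scratch file and passes it with -f, rather than reading the script from stdin; that is a choice made for this example, and the file name script.sed and the variable names are assumptions. The list is deliberately unsorted and repeats 5.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf '%s\n' 'line one' 'line two' 'line three' \
              'line four' 'line five' 'line six' > "$tmp/target_file"
printf '%s\n' 5 2 5 > "$tmp/fileN"        # unsorted, with 5 repeated

# Append "s/^/MARKER /" to every number, giving one sed command per line.
sed 's#$#s/^/MARKER /#' "$tmp/fileN" > "$tmp/script.sed"
generated=$(cat "$tmp/script.sed")
printf '%s\n' "$generated"

# Apply the generated script in a single pass over the target file.
sed_result=$(sed -f "$tmp/script.sed" "$tmp/target_file")
printf '%s\n' "$sed_result"
rm -rf "$tmp"
```

Because every generated command is executed, a line number listed twice gets the marker twice: line five comes out as "MARKER MARKER line five".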

awk

In awk, capture the whole of fileN in memory and then process target_file:

awk '{ if(NR==FNR){
                     a[$1]=1
                  }else{
                     if(a[FNR]==1){ printf("%s","MARKER ")};
                     print 
                  }
     }' fileN target_file

Or, with a variable to control when the list of files with line numbers has ended:

awk '{ if (dofile==1) {   if(a[FNR]==1){ printf("%s","MARKER ")};
                          print
                      }else{
                          a[$1]=1
                      }
     }' fileN fileK   dofile=1   target_file

Note that the last version allows several files with line numbers, like fileN and fileK in the example.

Also note that the awk versions do not apply the marker more than once to the same line. A line number that is repeated in the list is processed just once.
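A compact variant of the same idea, as a sketch: it uses the common NR==FNR idiom with next, and the array name want is an assumption of this example. One caveat of that idiom is that an empty list file makes NR==FNR true for the target file as well, so nothing would be printed. The list below is unsorted and repeats 5.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf '%s\n' 'line one' 'line two' 'line three' \
              'line four' 'line five' 'line six' > "$tmp/target_file"
printf '%s\n' 5 2 5 > "$tmp/fileN"        # unsorted, with 5 repeated

awk_result=$(awk '
  NR == FNR { want[$1] = 1; next }        # first file: remember each wanted number
  FNR in want { $0 = "MARKER " $0 }       # second file: tag wanted lines
  1                                       # print every line of the second file
' "$tmp/fileN" "$tmp/target_file")

printf '%s\n' "$awk_result"
rm -rf "$tmp"
```

Here line five is tagged only once, because the repeated 5 just sets the same array element again.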

Related Solutions

Pattern Search between specific lines and print line numbers

When working with sed I typically find it easiest to consistently narrow my possible outcome. This is why I sometimes lean on the ! negation operator. It is very often simpler to prune uninteresting input away than it is to pointedly select the interesting kind; at least, this is my opinion on the matter.

I find this method more in line with sed's default behavior, which is to auto-print pattern space at the script's end. For simple things such as this it can also more easily result in a robust script: one that does not depend on certain implementations' syntax extensions in order to operate (as is commonly seen with sed {functions}).

This is why I recommended you do:

sed '10,15!d;/pattern/!d;=' <input

...which first prunes any line not within the range of lines 10 & 15, and from among those that remain prunes any line which does not match pattern. If you find you'd rather have the line number sed prints on the same line as its matched line, I would probably look to paste in that case. Maybe...

sed '10,15!d;/pattern/!d;=' <input |
paste -sd:\\n -

...which will just alternate replacing input \newlines with either a : character or another \newline.
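The alternating delimiters are easier to see on a tiny input. In this sketch the four input lines are made up to stand for two "number, then line" pairs as printed by sed's = command, and the variable name paste_result is an assumption.

```shell
#!/usr/bin/env bash
# -s joins all input lines serially; -d ':\n' cycles through the two
# delimiters, so line 1 and 2 are joined by ":", 2 and 3 by a newline, etc.
paste_result=$(printf '%s\n' 15 'some text' 25 'more text' | paste -sd ':\n' -)

printf '%s\n' "$paste_result"
```

This prints "15:some text" and "25:more text" on separate lines.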

For example:

seq 150 |
sed '10,50!d;/5/!d;=' |
paste -sd:\\n -

...prints...

15:15
25:25
35:35
45:45
50:50

Find matches on adjacent lines

I'll use the same test file as thrig:

$ cat file
a
pat 1
pat 2
b
pat 3

Here is an awk solution:

$ awk '/pat/ && last {print last; print} {last=""} /pat/{last=$0}' file
pat 1
pat 2

How it works

awk implicitly loops over every line in the file. This program uses one variable, last, which contains the last line if it matched regex pat. Otherwise, it contains the empty string.

/pat/ && last {print last; print}

If pat matches this line and the previous line, last, was also a match, then print both lines.

{last=""}

Replace last with an empty string.

/pat/ {last=$0}

If this line matches pat, then set last to this line. This way it will be available when we process the next line.

Alternative for treating >2 consecutive matches as one group

Let's consider this extended test file:

$ cat file2
a
pat 1
pat 2
b
pat 3
c
pat 4
pat 5
pat 6
d

Unlike the solution above, this code treats the three consecutive matching lines as one group to be printed:

$ awk '/pat/{f++; if (f==2) print last; if (f>=2) print; last=$0; next} {f=0}' file2
pat 1
pat 2
pat 4
pat 5
pat 6

This code uses two variables. As before, last is the previous line. In addition, f counts the number of consecutive matches. So, we print matching lines when f is 2 or larger.

Adding grep-like features

To emulate the grep output shown in the question, this version prints the filename and line number before each matching line:

$ awk 'FNR==1{f=0} /pat/{f++; if (f==2) printf "%s:%s:%s\n",FILENAME,FNR-1,last; if (f>=2) printf "%s:%s:%s\n",FILENAME,FNR,$0; last=$0; next} {f=0}' file file2
file:2:pat 1
file:3:pat 2
file2:2:pat 1
file2:3:pat 2
file2:7:pat 4
file2:8:pat 5
file2:9:pat 6

Awk's FILENAME variable provides the file's name and awk's FNR provides the line number within the file.

At the beginning of each file, FNR==1, we reset f to zero. This prevents the last line of one file from being considered consecutive with the first line of the next file.

For those who like their code spread over multiple lines, the above looks like:

awk '
    FNR==1{f=0}
    /pat/ {f++
        if (f==2) printf "%s:%s:%s\n",FILENAME,FNR-1,last
        if (f>=2) printf "%s:%s:%s\n",FILENAME,FNR,$0
        last=$0
        next
    }

    {f=0}
    ' file file2