Ubuntu – Extract a string from a line between positions given by a pattern in another line

command line, text processing

I want to output the characters of a line that lie between two positions, A and B, which are marked in the preceding line. Within each pair the two lines are equal in length, but the length can vary from pair to pair. Is there an efficient way to do this (the files are huge) with grep, sed, or awk?

Example file:

xxxxxxAxxxxxxBxxxxxx
1234567890MNOPQRSTUV
xxAxxxxxxxxxxxxxxBxxxxxx
1234567890MNOPQRSTUVWXYZ

I would like to obtain the output:

7890MNOP
34567890MNOPQRST

Best Answer

Using awk:

$ awk '!seen{match($0, /A.*B/);seen=1;next} {print substr($0,RSTART,RLENGTH);seen=0}' infile
7890MNOP
34567890MNOPQRST
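
For readability, the same idea can be spelled out as a commented script. This is only a sketch under a few assumptions: the file strictly alternates pattern line and data line, the function name extract_between is made up for illustration, and a pair whose pattern line contains no A...B span is skipped (the one-liner above prints an empty line in that case). Note that A.*B is greedy, so with several B's on a pattern line the span runs to the last one.

```shell
# extract_between is a hypothetical helper name; it reads files given as
# arguments, or standard input when called without arguments.
extract_between() {
  awk '
    NR % 2 == 1 {                      # odd lines: the pattern line
      found = match($0, /A.*B/)        # 0 if there is no A...B span
      start = RSTART; len = RLENGTH    # remember position and length
      next
    }
    found { print substr($0, start, len) }   # even lines: the data line
  ' "$@"
}

printf '%s\n' 'xxxxxxAxxxxxxBxxxxxx' '1234567890MNOPQRSTUV' | extract_between
# prints: 7890MNOP
```

It makes a single pass and keeps only two numbers per pair, so memory use should not grow with file size.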

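Explanation: on each pattern line, match() records where the A...B span starts (RSTART) and how long it is (RLENGTH); on the following line, substr() prints the characters at that same position and length. From man awk: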

RSTART
          The index of the first character matched by match(); 0 if no
          match.  (This implies that character indices start at one.)

RLENGTH
          The length of the string matched by match(); -1 if no match.

match(s, r [, a])  
          Return the position in s where the regular expression r occurs, 
          or 0 if r is not present, and set the values of RSTART and RLENGTH. (...)

substr(s, i [, n])
          Return the at most n-character substring of s starting at i.
          If n is omitted, use the rest of s.
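
To see what match() actually stores, a small sketch can print RSTART and RLENGTH for a line. The helper name show_match is hypothetical and exists only for this illustration.

```shell
# show_match is a hypothetical helper: for each input line, print the
# RSTART and RLENGTH values that match() sets for the A...B span.
show_match() {
  awk '{ match($0, /A.*B/); print RSTART, RLENGTH }' "$@"
}

echo 'xxxxxxAxxxxxxBxxxxxx' | show_match   # prints: 7 8
echo 'no markers here'      | show_match   # prints: 0 -1
```

For the first line, A is the 7th character and B the 14th, so the span is 8 characters long, which is exactly what substr() then takes from the data line.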