How can I perform multiline matching and substitution using awk?

awk, gawk, text processing

In a text file, ignoring any trailing whitespace at the end of each line, I assume that if a line does not end with a digit, there is a spurious line break between that line and the next one. I would like to find these line breaks and join the affected lines into one. For example:

line 1
li
ne 2

There is a line break between the second and third lines, so the file should become:

line 1
line 2

To find such line breaks I need multiline matching. I tried to do it by changing the record separator, but the following doesn't work:

$ awk 'BEGIN{RS="";}; { if (match($0, /[^[:digit:] ] *\n/)) print $0;} ' inputfile

I also haven't figured out how to concatenate two lines that are separated by such a line break.

Thanks.

Best Answer

You could run something along the lines of

awk 'BEGIN{RS=SUBSEP; ORS="" } {print gensub(/([^0-9])\n/,"\\1","g",$0)}' ex
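  • RS=SUBSEP sets the Record Separator to SUBSEP (by default "\034"), a control character that practically never appears in a text file, so the whole input file is slurped into $0 as a single record; ORS="" keeps print from appending an extra newline
  • then apply your favorite multiline transformation; here gensub (a gawk extension) deletes every newline that follows a non-digit character, keeping that character through the \1 back-reference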
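The one-liner above relies on gensub, which only exists in gawk, and it does not skip trailing whitespace as the question asks: a line such as "li  " keeps its blanks when joined, and "line 1 " with a trailing space would be joined to the next line because the space is a non-digit. As a minimal sketch, assuming any POSIX awk is available, the same result can be had line by line, avoiding multiline matching altogether: buffer each piece until a line ends with a digit, then print the buffer. The function name join_broken_lines is hypothetical, chosen only for illustration.

```shell
# join_broken_lines: hypothetical helper name used for illustration.
# Reads the files given as arguments (or stdin) and glues every line that
# does not end with a digit to the line that follows it.
join_broken_lines() {
    awk '
    {
        sub(/[ \t]+$/, "")        # ignore (strip) trailing spaces and tabs
        buf = buf $0              # append this piece to the pending line
        if ($0 ~ /[0-9]$/) {      # ends with a digit: a real end of line
            print buf
            buf = ""
        }
    }
    END {
        if (buf != "") print buf  # flush a final line that had no digit
    }' "$@"
}

# Example from the question, with trailing blanks after "li":
printf 'line 1\nli  \nne 2\n' | join_broken_lines
```

Note that trailing blanks are removed from every line, not only from the joined ones, and an empty line counts as a line that does not end with a digit, so it is merged into its neighbours.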