Multiline grepping – What’s wrong with this expression

Tags: grep, regular expression

Consider this sample file (line numbers are for reference only):

1 Reference duiarneutdigane uditraenturida enudtiar.
2
3 Reference uiae uiaetrtdnsu iatdne uiatrdenu diaren uidtae
4 on line 23.
5
6 uiae
7
8 uaiernd Reference uriadne udtiraeb unledut iaeru uilaedr
9 uiarnde line 234.

I was hoping to match every string beginning with “Reference” and ending with a period (i.e. ll. 1, 3–4, and 8–9) using this grep command (tst is the sample file):

grep -P '(?s)Reference.*?\.' tst

However, it only matches the first line. What I was thinking:

  • (?s), so . matches all characters, including newlines
  • .*? should make the star non-greedy, so the match stops at the first period instead of running on to the last period in the file.
  • The expression should end with a literal period \..

I’ve also tried awk and grep’s -z flag, but in both cases I get either every line or only some of the lines I expect.

Best Answer

You can use this:

grep -Pzo '(?s)Reference.*?\.' tst.txt

where tst.txt is your input file. It is the same regex as yours, but with two new flags.

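I added the -z flag, which makes grep treat the input as records terminated by a null character instead of a newline. Without it, grep applies the pattern to each line separately, so the pattern never sees a newline and a match cannot continue onto the next line. Since the file contains no null characters, with -z grep sees the whole input, newlines included, as one big line.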

The -o flag means that grep prints only the matched parts rather than the whole record, which with -z would be the entire file.

I got the following output:

Reference duiarneutdigane uditraenturida enudtiar.
Reference uiae uiaetrtdnsu iatdne uiatrdenu diaren uidtae
on line 23.
Reference uriadne udtiraeb unledut iaeru uilaedr
uiarnde line 234.
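
To reproduce this end to end, here is a minimal sketch assuming GNU grep built with PCRE support (the -P flag). The temporary file held in the `sample` variable is only an illustration, not part of the original question. One detail to be aware of: with -z, GNU grep also terminates each printed match with a null character instead of a newline, so the `tr` step (my addition) converts those to newlines to keep the matches visually separate.

```shell
# Recreate the sample file from the question, without the reference line numbers.
# "sample" is a hypothetical temporary file name used only for this sketch.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Reference duiarneutdigane uditraenturida enudtiar.

Reference uiae uiaetrtdnsu iatdne uiatrdenu diaren uidtae
on line 23.

uiae

uaiernd Reference uriadne udtiraeb unledut iaeru uilaedr
uiarnde line 234.
EOF

# -P: Perl-compatible regex, needed for (?s) and the non-greedy .*?
# -z: records end at NUL, so the whole file is one record and . can cross newlines
# -o: print only the matched text; each match is followed by a NUL under -z,
#     which tr turns into a newline for display
grep -Pzo '(?s)Reference.*?\.' "$sample" | tr '\0' '\n'
```

If your grep lacks -P (for example, some non-GNU builds), this command will not work as written and a different tool would be needed.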