Linux – grep with regex containing pipe character

bash, command line, grep, linux, regex

I am trying to grep with a regex that contains the pipe character |. However, it doesn't work as expected: the | in the regex is not matched as a literal character, so lines I don't want are matched as well.

This is my bash command:

cat data | grep "{{flag\|[a-z|A-Z\s]+}}"

The sample data is as follows:

| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
|{{flagicon|Kosovo}} ''[[Kosovo]]'' <ref name="KOS" group=Note>{{Kosovo-note}}</ref>
|{{flagicon|Somaliland}} [[Somaliland|Somaliland region]]
|{{flagicon|Palestine}} ''[[Palestinian Territories]]''{{refn|See the following on statehood criteria:

The expected output is:

| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066

However, when I tested the same regex on Regex101.com, it matched as expected.

Best Answer

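By default grep uses basic regular expressions, where GNU grep treats \| as the separator between alternative search expressions and a plain | as a literal pipe. This is the reverse of egrep (or grep -E), where | separates alternatives and \| matches a literal |. Your pattern is therefore read as "{{flag" or "[a-z|A-Z\s]+}}", so every line containing {{flag matches, including the {{flagicon lines.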
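
A minimal sketch of that difference, assuming GNU grep; the two-line sample in the variable is a shortened, hypothetical version of your data:

```shell
# Two shortened sample lines: one with {{flag|...}}, one with {{flagicon|...}}.
sample='| 155||NA||{{flag|Central African Republic}}||2.693
|{{flagicon|Kosovo}} [[Kosovo]]'

# Basic regex: \| is alternation (a GNU extension), so this pattern means
# "{{flag" OR "[a-z|A-Z\s]+}}". Both lines contain "{{flag".
bre_alt_count=$(printf '%s\n' "$sample" | grep -c '{{flag\|[a-z|A-Z\s]+}}')

# Basic regex: a bare | is a literal pipe, so only the {{flag| line matches.
bre_literal_count=$(printf '%s\n' "$sample" | grep -c '{{flag|')

echo "with \\|: $bre_alt_count line(s), with |: $bre_literal_count line(s)"
```

With GNU grep the first count should be 2 and the second 1, which is the over-matching you are seeing.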

Apart from that, your expression has other problems:-

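  • An unescaped + is a repetition operator in egrep (or grep -E) only; in plain grep it matches a literal +. GNU grep also accepts \+ in basic mode.
  • \s is not supported within a [] character group; there it is just a literal backslash and a literal s. Use a space or [:space:] instead.
  • I don't see the need for | in the character group; within [] it is only a literal |, not a separator.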
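
The \s point can be checked with a quick sketch like the one below. It assumes GNU grep (for \+), and the variable names are only illustrative:

```shell
# One hypothetical input line whose name contains spaces.
line='{{flag|Central African Republic}}'

# Inside [...], "\s" is the literal characters "\" and "s", not whitespace,
# so the spaces in the name are not covered and the pattern does not match.
if printf '%s\n' "$line" | grep -q '{{flag|[a-zA-Z\s]\+}}'; then
  with_backslash_s=match
else
  with_backslash_s=no-match
fi

# [:space:] is the POSIX whitespace class for use inside a bracket expression.
if printf '%s\n' "$line" | grep -q '{{flag|[a-zA-Z[:space:]]\+}}'; then
  with_posix_class=match
else
  with_posix_class=no-match
fi

echo "[a-zA-Z\\s]: $with_backslash_s, [a-zA-Z[:space:]]: $with_posix_class"
```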

So the following works for grep:-

grep "{{flag|[a-zA-Z ][a-zA-Z ]*}}" <temp

Or (thanks to Glenn Jackman's input):-

grep "{{flag|[a-zA-Z ]\+}}" <temp

In egrep (equivalent to grep -E, which recent GNU grep releases prefer) a { can start a repetition count such as {2,3}, so the braces should be escaped:-

egrep "\{\{flag\|[a-zA-Z ]+\}\}" <temp

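Note that I have removed the unnecessary use of cat: grep can read the file directly, either through redirection as above or as an argument (grep PATTERN temp). Replace temp with the name of your data file.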
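
To check the three variants side by side, something like the following sketch can be used. It assumes GNU grep, writes your four sample lines to a temporary file created with mktemp, and counts matching lines instead of printing them; the variable names are my own:

```shell
# Write the question's four sample lines to a temporary file.
sample_file=$(mktemp)
cat >"$sample_file" <<'EOF'
| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
|{{flagicon|Kosovo}} ''[[Kosovo]]'' <ref name="KOS" group=Note>{{Kosovo-note}}</ref>
|{{flagicon|Somaliland}} [[Somaliland|Somaliland region]]
|{{flagicon|Palestine}} ''[[Palestinian Territories]]''{{refn|See the following on statehood criteria:
EOF

# Basic regex, portable repetition: one character, then zero or more.
bre_star=$(grep -c '{{flag|[a-zA-Z ][a-zA-Z ]*}}' "$sample_file")

# Basic regex with \+ (a GNU extension).
bre_plus=$(grep -c '{{flag|[a-zA-Z ]\+}}' "$sample_file")

# Extended regex: braces and pipe escaped, + needs no backslash.
ere_plus=$(grep -cE '\{\{flag\|[a-zA-Z ]+\}\}' "$sample_file")

echo "matching lines: $bre_star $bre_plus $ere_plus"
rm -f "$sample_file"
```

Each variant should report exactly one matching line, the Central African Republic row.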
