Grep command to display all lines that begin and end with the same character

grep, text-processing

I want to know how to use grep in order to display all lines that begin and end with the same character.

Best Answer

POSIXly:

pattern='\(.\).*\1
.'
grep -x -- "$pattern" file

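This won't work if a line starts or ends with a byte sequence that is not a valid character in the current locale. To cover that case, you can run grep with LC_ALL=C, but in the C locale every byte is treated as a character, so the result is only correct for single-byte character data.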
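
As a minimal sketch of the above, the snippet below wraps the same two-line pattern in a helper function (the name `same_ends` is made up for illustration, not part of grep) and forces the C locale. The first pattern handles lines of two or more characters through a back-reference; the second, a lone `.`, handles one-character lines; `-x` makes each pattern match the whole line.

```shell
# Two newline-separated patterns, as in the answer above:
#   \(.\).*\1   first character captured, then anything, then the same character
#   .           a line consisting of exactly one character
pattern='\(.\).*\1
.'

# Hypothetical helper: forces the C locale so every byte counts as a character.
# With no file arguments, grep reads standard input.
same_ends() {
  LC_ALL=C grep -x -- "$pattern" "$@"
}

# Prints abca, x and 121; "abc" and the empty line are not selected.
printf 'abca\nabc\nx\n\n121\n' | same_ends
```

Because the C locale compares bytes, a UTF-8 line such as `é blah é` is not selected by this version: its first byte (0xC3) differs from its last byte (0xA9). Dropping `LC_ALL=C` in a UTF-8 locale restores character-wise matching for valid input.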


perl6 seems to be the best tool, if you have it on your machine:

$ printf '\ue7\u301 blah \u107\u327\n121\n1\n123\n' |
  perl6 -ne '.say if m/^(.).*$0$/ || /^.$/'
ḉ blah ḉ
121
1

It still chokes on invalid characters, though.


Note that perl6 will alter your text by converting it to NFC:

$ printf '\u0044\u0323\u0307\n' |
  perl6 -pe ''                  |
  perl -CI -ne 'printf "U+%04x\n", ord for split //'
U+1e0c
U+0307
U+000a

$ printf '\u0044\u0323\u0307\n' |
  perl -pe ''                   |
  perl -CI -ne 'printf "U+%04x\n", ord for split //'
U+0044
U+0323
U+0307
U+000a

Internally, perl6 stores strings in NFG (Normalization Form Grapheme), a form perl6 introduced to handle graphemes that have no precomposed code point properly:

$ printf '\u0044\u0323\u0307\n' | perl6 -ne '.chars.say'
1
$ printf '\u0044\u0323\u0307\n' | perl6 -ne '.codes.say'
2