Egrep regular expression – same word in the beginning and end

grep, regular expression, text processing

I want to find all the lines that have the same word at the beginning and at the end of the line.

For example:

goodword         fgdlakj 3t sfkl 43lk fkl goodword
bad sfa;lk3t   dgk;gs    34;kl bad334
singleword

Desired output

goodword         fgdlakj 3t sfkl 43lk fkl goodword
singleword

My code is:

egrep "(^.+)([ ]+.*\1)$"

It works if the line has more than one word, but I want a line containing a single word to match too.

So I tried:

egrep "(^.+)($|([ ]+.*\1)$)"

But this no longer works, and I don't know why.

Best Answer

I suggest using awk instead:

awk '$1==$NF' file

This solution has two advantages: it is much simpler to read, and you can easily change the field separator (with the -F option), so that, e.g., lines with the same number of spaces at the beginning and at the end also match.
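
To see the one-liner in action, here is a small sketch that runs it on the sample lines from the question. The file name input.txt is only an assumption for illustration; any file works.

```shell
# Sample input taken from the question (input.txt is a hypothetical file name).
cat > input.txt <<'EOF'
goodword         fgdlakj 3t sfkl 43lk fkl goodword
bad sfa;lk3t   dgk;gs    34;kl bad334
singleword
EOF

# Print every line whose first field equals its last field.
# On a one-word line NF is 1, so $1 and $NF refer to the same field.
awk '$1==$NF' input.txt
```

This prints the goodword line and singleword, and skips the line ending in bad334. Note that a completely empty line also satisfies the condition (both sides are empty strings); if that is unwanted, `awk 'NF && $1==$NF' input.txt` should exclude it.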
