Grep – How to Match Pattern Exactly from File and Search Only in First Column

command line, grep, regular expression, shell

I have a bigfile like this:

denovo1 xxx yyyy oggugu ddddd
denovo11 ggg hhhh bbbb gggg
denovo22 hhhh yyyy kkkk iiii
denovo2 yyyyy rrrr fffff jjjj
denovo33 hhh yyy eeeee fffff

then my pattern file is:

denovo1
denovo3
denovo22

I'm trying to use fgrep to extract only the lines that exactly match a pattern from my pattern file (so I want denovo1 but not denovo11).
I tried -x for an exact match, but then I got an empty file.
I tried:

fgrep -x --file="pattern" bigfile.txt > clusters.blast.uniq

Is there a way to make grep search only in the first column?

Best Answer

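The -x switch requires a pattern to match the entire line, which is why your output was empty. You probably want the -w flag instead. From man grep: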

   -w, --word-regexp
          Select  only  those  lines  containing  matches  that form whole
          words.  The test is that the matching substring must  either  be
          at  the  beginning  of  the  line,  or  preceded  by  a non-word
          constituent character.  Similarly, it must be either at the  end
          of  the  line  or  followed by a non-word constituent character.
          Word-constituent  characters  are  letters,  digits,   and   the
          underscore.

i.e.

grep -wFf patfile file
denovo1 xxx yyyy oggugu ddddd
denovo22 hhhh yyyy kkkk iiii
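
The difference between -x and -w can be checked on the sample data from the question. The following is a minimal sketch, assuming a POSIX shell and a grep that supports -w (GNU grep does); the scratch directory and the variable names are only for illustration.

```shell
# Recreate the question's sample data in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/bigfile.txt" <<'EOF'
denovo1 xxx yyyy oggugu ddddd
denovo11 ggg hhhh bbbb gggg
denovo22 hhhh yyyy kkkk iiii
denovo2 yyyyy rrrr fffff jjjj
denovo33 hhh yyy eeeee fffff
EOF
printf '%s\n' denovo1 denovo3 denovo22 > "$workdir/pattern"

# -x: a pattern must match the ENTIRE line, so no line is selected here.
# (grep exits with status 1 when nothing matches, hence the "|| true".)
exact_lines=$(grep -xF -f "$workdir/pattern" "$workdir/bigfile.txt" || true)

# -w: a pattern must match a whole word, anywhere on the line.
word_lines=$(grep -wF -f "$workdir/pattern" "$workdir/bigfile.txt" || true)

printf 'with -x: %s\n' "${exact_lines:-<no lines>}"
printf 'with -w:\n%s\n' "$word_lines"
```

Note that -w accepts a whole-word match in any column, so it does not by itself restrict the search to the first field.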

To restrict matching to the first column, you would need to add a line anchor (^) to each entry in the pattern file. You could also use the \b word-boundary anchor (a GNU grep extension) in place of the command-line -w switch. For example, in patfile:

^denovo1\b
^denovo3\b
^denovo22\b

then

grep -f patfile file
denovo1 xxx yyyy oggugu ddddd
denovo22 hhhh yyyy kkkk iiii

Note that you must drop the -F switch if the file contains regular expressions instead of simple fixed strings.
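
If you would rather not edit the pattern file by hand, the anchored version can be generated from it. The sketch below assumes GNU grep (for \b) and IDs that contain no regular-expression metacharacters; the file name patfile.anchored is hypothetical. It also shows an awk alternative, not part of the approach above, that compares the first whitespace-separated field against the IDs as plain strings.

```shell
# Recreate the sample data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > file <<'EOF'
denovo1 xxx yyyy oggugu ddddd
denovo11 ggg hhhh bbbb gggg
denovo22 hhhh yyyy kkkk iiii
denovo2 yyyyy rrrr fffff jjjj
denovo33 hhh yyy eeeee fffff
EOF
printf '%s\n' denovo1 denovo3 denovo22 > patfile

# Turn each ID into ^ID\b: prefix a line anchor, append a word boundary.
sed 's/^/^/; s/$/\\b/' patfile > patfile.anchored

# No -F here: the generated entries are regular expressions.
grep_out=$(grep -f patfile.anchored file || true)

# Alternative: load the IDs into an awk array (first file), then print
# the lines of the second file whose first field is one of those IDs.
awk_out=$(awk 'NR == FNR { ids[$1]; next } $1 in ids' patfile file)

printf 'grep:\n%s\n' "$grep_out"
printf 'awk:\n%s\n' "$awk_out"
```

Both commands print the denovo1 and denovo22 lines for this data. The awk variant needs no anchors or escaping, because it tests the first field for string equality instead of matching a regular expression.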
