Extract lines that match a list of words in another file

Tags: awk, bioinformatics, grep, sed

I have file 1, which contains these lines:

ATM 1434.972183
BMPR2 10762.78192
BMPR2 10762.78192
BMPR2 1469.14535
BMPR2 1469.14535
BMPR2 1738.479639
BMS1 4907.841667
BMS1 4907.841667
BMS1 880.4532628
BMS1 880.4532628
BMS1P17 1249.75
BMS1P17 1249.75
BMS1P17 1606.821429
BMS1P17 1606.821429
BMS1P17 1666.333333
BMS1P17 1666.333333
BMS1P17 2108.460317
BMS1P17 2108

File 2 contains a list of words:

ATM
BMS1

So the expected output is:

ATM 1434.972183
BMS1 4907.841667
BMS1 4907.841667
BMS1 880.4532628
BMS1 880.4532628

I know this is probably a duplicate question, but I have tried every variant of grep, sed and awk I could find. They may well work for you on this tiny example, but my real file is very large (more than 1M lines) and none of the previous approaches help.

They return only some of the lines containing those words, even though other words in file 2 also match lines in file 1.

Best Answer

grep -Fw -f words myfile

This extracts the lines in myfile that contain any of the words listed in the file words, anywhere on the line.

The strings in words are treated as fixed strings (not regular expressions) because of the -F option, and the -w option ensures that only whole-word matches are reported: a string from words will not match as a substring of a longer word, so BMS1 does not match BMS1P17. A word is a consecutive sequence of characters from the set of alphanumeric characters and the underscore character.
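The effect of -w can be seen on a trimmed copy of the sample data from the question. This is a minimal sketch: it writes the two files into a temporary directory (the names words.txt and myfile.txt are illustrative, not from the original) and runs grep with and without -w.

```shell
#!/bin/sh
# Sketch: compare grep -F with and without -w on a trimmed copy of the sample data.
# The temporary directory and the file names words.txt / myfile.txt are illustrative.
dir=$(mktemp -d)

printf '%s\n' \
  'ATM 1434.972183' \
  'BMS1 4907.841667' \
  'BMS1 880.4532628' \
  'BMS1P17 1249.75' > "$dir/myfile.txt"

printf '%s\n' ATM BMS1 > "$dir/words.txt"

# Without -w, "BMS1" also matches as a substring of "BMS1P17" (4 lines).
no_w=$(grep -F -f "$dir/words.txt" "$dir/myfile.txt")

# With -w, a match must be bounded by non-word characters or the line start/end,
# so the "BMS1P17" line is no longer matched (3 lines).
with_w=$(grep -Fw -f "$dir/words.txt" "$dir/myfile.txt")

printf 'without -w:\n%s\n\nwith -w:\n%s\n' "$no_w" "$with_w"

rm -rf "$dir"
```

Only the BMS1P17 line differs between the two runs, which is exactly the substring match that -w suppresses.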

The words in the file words must be listed one per line.
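If the gene name is always the first whitespace-separated field, a common alternative (not part of the answer above) is to load the words into an awk associative array and test only field 1 of each line. Each line then costs a single hash lookup, which scales well to files with millions of lines. A minimal sketch, again assuming the illustrative file names words.txt and myfile.txt:

```shell
#!/bin/sh
# Sketch: exact match on the first column using an awk associative array.
# File names are illustrative; substitute your own.
dir=$(mktemp -d)

printf '%s\n' \
  'ATM 1434.972183' \
  'BMPR2 10762.78192' \
  'BMS1 4907.841667' \
  'BMS1 880.4532628' \
  'BMS1P17 1249.75' > "$dir/myfile.txt"

printf '%s\n' ATM BMS1 > "$dir/words.txt"

# While reading the first file (NR == FNR), store each word as a key in array w.
# For the second file, print the line only if its first field is one of those keys.
result=$(awk 'NR == FNR { w[$1]; next } $1 in w' "$dir/words.txt" "$dir/myfile.txt")

printf '%s\n' "$result"

rm -rf "$dir"
```

Unlike grep -w, this only compares the first column, so a word can never match somewhere else on the line, and BMS1P17 is excluded because the comparison is on the whole field.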
