Ubuntu – Use a list of words to grep in another list

bash grep

I've got a list with 250 lines in it. I have to run all of them through a web server to get a list of output. This output, however, contains many more lines than I'm interested in. Say my list.txt is:

a.1
b.1
etc

and the output, output.txt, is:

a.1 a b c
a.2 b a b
a.3 d k o
b.1 b o p
b.2 o i y
b.3 p i y
etc

Is it possible to use the grep command to search output.txt for all the words in list.txt and then generate the "wanted" list, wanted.txt? I need the entire matching line from output.txt.
I'm new to scripting, but what I'd like is something like:

grep list.txt output.txt > wanted.txt

I haven't been able to find any examples of this.

Best Answer

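I'd ignore grep for this one. It's good for regular expressions, but it doesn't look like you really need that here. comm can compare two files and show you their intersection. Be aware that comm compares whole lines, so it only reports lines that are identical in both files. If output.txt held just the first column of your example:

$ comm -12 list.txt output.txt
a.1
b.1
etc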

This is typically faster than grep, but it relies (heavily) on both files being sorted. If they aren't, you can sort them on the fly, but then the output will be sorted too.

$ comm -12 <(sort list.txt) <(sort output.txt)
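Because comm works on complete lines, it is worth checking how it behaves when output.txt carries extra columns. The following is only a sketch: the temporary directory, the sample rows, and the file names ending in .sorted are made up for illustration, and it assumes the key is the first space-separated column.

```shell
# Sketch only: sample data and file names are invented for this demo.
export LC_ALL=C                      # keep sort and comm on the same collation
tmp=$(mktemp -d)
printf '%s\n' a.1 b.1 | sort > "$tmp/list.sorted"
printf '%s\n' 'a.1 a b c' 'a.2 b a b' 'b.1 b o p' | sort > "$tmp/output.sorted"

# Whole lines: "a.1" is not the same line as "a.1 a b c", so nothing is common.
whole=$(comm -12 "$tmp/list.sorted" "$tmp/output.sorted")

# First column only: cut keeps just the key, and now the lines can match.
cut -d' ' -f1 "$tmp/output.sorted" | sort > "$tmp/keys.sorted"
keys=$(comm -12 "$tmp/list.sorted" "$tmp/keys.sorted")

printf 'whole lines:  %s\n' "${whole:-<nothing>}"
printf 'first column: %s\n' "$keys"
rm -rf "$tmp"
```

The first comparison finds nothing, while the second finds the keys but loses the rest of each line, which is why comm on its own suits data where the wanted lines are identical in both files.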

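Alternatively, this answer from iiSeymour will let you do it with grep. The flags read the patterns from a file (-f), treat them as fixed strings rather than regular expressions (-F), and match whole words only (-w). This doesn't rely on sorting, and the matching lines come out in output.txt order. Swapping the two files would give list.txt order instead, but only when the matching lines are identical in both files, because the patterns would then be the full lines of output.txt.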

$ grep -wFf list.txt output.txt
a.1 a b c
b.1 b o p
etc
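To see what the -F and -w flags each guard against, here is a small sketch. The rows "a.10 x y z" and "ax1 q r s" are made-up near-misses added for illustration; they are not in the original data.

```shell
# Sketch only: the near-miss rows are invented to show false positives.
tmp=$(mktemp -d)
printf '%s\n' 'a.1' > "$tmp/list.txt"
printf '%s\n' 'a.1 a b c' 'a.10 x y z' 'ax1 q r s' > "$tmp/output.txt"

# -c counts matching lines instead of printing them.
regex=$(grep -c -f "$tmp/list.txt" "$tmp/output.txt")     # "." matches any character
fixed=$(grep -c -Ff "$tmp/list.txt" "$tmp/output.txt")    # literal, but still a substring match
strict=$(grep -c -wFf "$tmp/list.txt" "$tmp/output.txt")  # literal and whole word

echo "regex=$regex fixed=$fixed strict=$strict"
rm -rf "$tmp"
```

This prints regex=3 fixed=2 strict=1: without -F the dot in "a.1" also matches "ax1", and without -w the pattern also matches inside "a.10".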

If your list.txt is really big, you might have to tackle this a little more iteratively and pass each line to grep separately. This will massively increase processing time: in the above you'd be reading output.txt once, but this way you'd read and process it once for every line in list.txt. It's horrible... but it might be your only choice. On the upside, the results then come out in list.txt order.

$ while IFS= read -r line; do grep -wF -- "$line" output.txt; done < list.txt
a.1 a b c
b.1 b o p
etc
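If list.txt order matters but rereading output.txt for every word is too slow, one possible alternative is awk, which is not part of the original answer. This is a minimal sketch under two assumptions: the word to match is always the first whitespace-separated field of an output.txt line, and each key appears only once in output.txt. The sample data and temporary directory are made up for illustration.

```shell
# Sketch only: assumes the key is field 1 and is unique in output.txt.
tmp=$(mktemp -d)
printf '%s\n' b.1 a.1 > "$tmp/list.txt"      # deliberately not in output.txt order
printf '%s\n' 'a.1 a b c' 'a.2 b a b' 'b.1 b o p' > "$tmp/output.txt"

wanted=$(awk '
    # First file (output.txt): remember each full line under its first field.
    NR == FNR { rows[$1] = $0; next }
    # Second file (list.txt): print the remembered line, in list order.
    $1 in rows { print rows[$1] }
' "$tmp/output.txt" "$tmp/list.txt")

printf '%s\n' "$wanted"
rm -rf "$tmp"
```

Each file is read once, at the cost of holding output.txt in memory.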