Ubuntu – Use a list of words to grep in another list


I've got a list with 250 lines in it. I have to run all of them through a web server to get a list of output. This list, however, returns many more lines than I'm interested in. Say my list.txt is:

a.1
b.1
etc

Then the output, output.txt, is:

a.1 a b c
a.2 b a b
a.3 d k o
b.1 b o p
b.2 o i y
b.3 p i y
etc

Is it possible to use the grep command to search output.txt for every word in list.txt and write the matching lines to a new list, wanted.txt? I need the entire line from output.txt.
I'm new to scripting, but what I'd like is something such as:

grep list.txt output.txt > wanted.txt

I haven't been able to find any examples of this.

Best Answer

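If the lines in both files were identical, I'd ignore grep for this one. It's good for regular expressions, but you don't need that to find exact matches. comm can compare two files and show you the lines they have in common:

$ comm -12 list.txt output.txt

comm compares whole lines, though, so with your example files, where each output.txt line has extra fields after the key, the only shared line is the literal etc. Where it does apply, comm makes a single pass over each file, so it's fast, but it relies (heavily) on the files being sorted. If they aren't, you can pre-sort them, but that will alter the output so it's sorted too.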

comm -12 <(sort list.txt) <(sort output.txt)
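A small self-contained sketch of comm's whole-line matching, assuming two hypothetical files, keys.txt and seen.txt, in a temporary directory. It sorts into temporary files instead of using <(sort ...), which does the same job but also runs under plain sh, and it sets LC_ALL=C so sort and comm agree on ordering:

```shell
# Hypothetical inputs: keys.txt and seen.txt both hold bare keys.
export LC_ALL=C   # byte-wise ordering, so sort and comm agree
dir=$(mktemp -d)
printf '%s\n' b.1 a.1 c.1 > "$dir/keys.txt"
printf '%s\n' c.1 a.2 a.1 > "$dir/seen.txt"

# comm needs sorted input; sort each file first (same idea as <(sort ...)).
sort "$dir/keys.txt" > "$dir/keys.sorted"
sort "$dir/seen.txt" > "$dir/seen.sorted"
common=$(comm -12 "$dir/keys.sorted" "$dir/seen.sorted")
printf '%s\n' "$common"   # a.1 and c.1: the lines present in both files

# comm compares whole lines, so the key a.1 does not match "a.1 a b c".
printf '%s\n' 'a.1 a b c' > "$dir/full.sorted"
none=$(comm -12 "$dir/keys.sorted" "$dir/full.sorted")
printf 'common with full lines: %s\n' "${none:-<none>}"

rm -rf "$dir"
```

This is why comm only fits when list.txt and output.txt contain literally identical lines.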

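Alternatively, this answer from iiSeymour will let you do it with grep. The flags read the patterns from a file (-f), treat them as fixed strings rather than regular expressions (-F), and match whole words only (-w). This doesn't rely on the files being sorted, and the matching lines come out in output.txt order. Swapping the two files won't give you list.txt order here, because the patterns would then be whole output.txt lines, which don't appear in list.txt.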

$ grep -wFf list.txt output.txt
a.1 a b c
b.1 b o p
etc
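To produce the wanted.txt file from the question, redirect grep's output. The sketch below recreates the example data in a temporary directory (an assumption for illustration) and adds one hypothetical line, a.10 x y z, to show why -w matters: without it, the fixed string a.1 also matches inside a.10.

```shell
dir=$(mktemp -d)
printf '%s\n' a.1 b.1 > "$dir/list.txt"
printf '%s\n' 'a.1 a b c' 'a.2 b a b' 'a.3 d k o' 'a.10 x y z' \
              'b.1 b o p' 'b.2 o i y' 'b.3 p i y' > "$dir/output.txt"

# -f: read one pattern per line from list.txt
# -F: fixed strings, so the . in a.1 is a literal dot
# -w: whole words only, so a.1 does not match a.10
grep -wFf "$dir/list.txt" "$dir/output.txt" > "$dir/wanted.txt"
wanted=$(cat "$dir/wanted.txt")
printf '%s\n' "$wanted"

# Without -w the plain substring match also picks up the a.10 line.
loose=$(grep -Ff "$dir/list.txt" "$dir/output.txt")

rm -rf "$dir"
```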

If your list.txt is really big and grep can't cope with it as a pattern file, you might have to tackle this a little more iteratively and pass each line to grep separately. This will massively increase processing time: the command above reads output.txt once, but this way you read and process it again for every line of list.txt. It's horrible, but it might be your only choice. On the upside, the results then come out in list.txt order (and a line that matches several patterns is printed once for each).

$ while IFS= read -r line; do grep -wF -- "$line" output.txt; done < list.txt
a.1 a b c
b.1 b o p
etc
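A sketch comparing the two orderings, using hypothetical data where list.txt is deliberately not in the same order as output.txt. The single grep -wFf pass returns matches in output.txt order, while the loop returns them in list.txt order. IFS= and -r keep each pattern exactly as written, and -- stops a pattern that begins with - from being taken as an option:

```shell
dir=$(mktemp -d)
printf '%s\n' b.1 a.1 > "$dir/list.txt"
printf '%s\n' 'a.1 a b c' 'a.2 b a b' 'b.1 b o p' 'b.2 o i y' > "$dir/output.txt"

# One pass over output.txt: results follow output.txt order.
single=$(grep -wFf "$dir/list.txt" "$dir/output.txt")

# One grep per pattern: results follow list.txt order, but output.txt
# is re-read once for every line of list.txt.
ordered=$(while IFS= read -r line; do
    grep -wF -- "$line" "$dir/output.txt"
done < "$dir/list.txt")

printf 'single pass:\n%s\nloop:\n%s\n' "$single" "$ordered"
rm -rf "$dir"
```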

Related Solutions

Ubuntu – How to use grep on all files non-recursively in a directory

In Bash, a glob will not expand into hidden files, so if you want to search all the files in a directory, you need to specify both the hidden files (.*) and the non-hidden ones (*).

To avoid the "Is a directory" errors (from subdirectories, and from . and .., which .* can also match), you could use -d skip, but on my system I also get an error grep: .gvfs: Permission denied†, so I suggest using -s, which suppresses error messages about nonexistent or unreadable files.

So the command you are looking for is:

grep -s "string" * .*

If you are searching files in another dir:

grep -s "string" /path/to/dir/{*,.*}
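A quick check of the brace-and-glob form, assuming a throwaway directory with one hidden file, one regular file and a subdirectory (all hypothetical names). {*,.*} is a Bash brace expansion, so the sketch runs it through bash -c; -l prints only the names of matching files to keep the output short:

```shell
dir=$(mktemp -d)
printf 'needle in hidden file\n' > "$dir/.hidden"
printf 'needle in plain file\n' > "$dir/visible.txt"
mkdir "$dir/sub"   # would trigger "Is a directory" without -s

# Brace expansion yields "$1"/* and "$1"/.*; -s hides the directory errors.
found=$(bash -c 'grep -sl needle "$1"/{*,.*}' _ "$dir")
printf '%s\n' "$found"
rm -rf "$dir"
```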

Another option is to use the dotglob shell option, which will make a glob include hidden files.

shopt -s dotglob
grep -s "string" *

For files in another dir:

grep -s "string" /path/to/dir/*

† Someone mentioned that I shouldn't get this error. They may be right; I did some reading but couldn't make head or tail of it myself.
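The same kind of throwaway check works for the dotglob approach (hypothetical file names again). Because shopt is Bash-only and changes the current shell, the sketch enables it inside bash -c so it doesn't leak into the caller; -l again lists only the matching file names:

```shell
dir=$(mktemp -d)
printf 'needle in hidden file\n' > "$dir/.hidden"
printf 'needle in plain file\n' > "$dir/visible.txt"

# Default globbing: * skips dot files, so .hidden is not searched.
plain=$(bash -c 'grep -sl needle "$1"/*' _ "$dir")
# With dotglob, * also matches .hidden (but still not . or ..).
dotted=$(bash -c 'shopt -s dotglob; grep -sl needle "$1"/*' _ "$dir")

printf 'without dotglob:\n%s\nwith dotglob:\n%s\n' "$plain" "$dotted"
rm -rf "$dir"
```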

Ubuntu – How to include a space character with grep

Make sure you quote your expression.

$ grep ' \.pdf' example
grep .pdf

Or, if there might be multiple spaces (we can't use * here, because it also matches zero spaces, i.e. lines with no space before .pdf at all):

grep ' \+\.pdf' example

+ means "one or more of the preceding character". In a basic regular expression (BRE), GNU grep needs it escaped as \+ to get this special meaning, but you can switch to extended regular expressions (ERE) with -E to avoid the backslash:

grep -E ' +\.pdf' example
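A small sketch with a hypothetical three-line test file shows that the BRE and ERE forms match the same lines, and why * is no good here: ' *\.pdf' also accepts grep.pdf, which has no space at all. \+ in a BRE is a GNU grep extension, so this assumes GNU grep:

```shell
f=$(mktemp)
printf '%s\n' 'grep .pdf' 'grep   .pdf' 'grep.pdf' > "$f"

bre=$(grep ' \+\.pdf' "$f")        # GNU BRE: \+ means one or more spaces
ere=$(grep -E ' +\.pdf' "$f")      # ERE: + works without a backslash
star_n=$(grep -c ' *\.pdf' "$f")   # * allows zero spaces: all 3 lines match
bre_n=$(grep -c ' \+\.pdf' "$f")   # only the 2 lines with a space match

printf 'BRE:\n%s\nERE:\n%s\n' "$bre" "$ere"
rm -f "$f"
```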

In GNU grep you can also use \s, which matches any whitespace character (a space, a tab, and so on), not just a space:

grep '\s\+\.pdf' example

We should escape the literal . because in a regex . matches any character, unless it's inside a bracket expression (character class) such as [.].
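One more hypothetical test file, assuming GNU grep (where \s is supported), illustrates both points: \s also matches a tab, unlike a literal space, and an unescaped . happily matches the x in xpdf:

```shell
f=$(mktemp)
printf 'grep\t.pdf\n' > "$f"    # a tab before .pdf
printf 'grep xpdf\n' >> "$f"    # no dot at all
printf 'grep .pdf\n' >> "$f"    # a single space before .pdf

ws_n=$(grep -c '\s\+\.pdf' "$f")     # 2: the tab line and the space line
space_n=$(grep -c ' \+\.pdf' "$f")   # 1: a literal space does not match a tab
loose_n=$(grep -c ' .pdf' "$f")      # 2: unescaped . also matches the x
strict_n=$(grep -c ' \.pdf' "$f")    # 1: \. only matches a real dot

printf '%s %s %s %s\n' "$ws_n" "$space_n" "$loose_n" "$strict_n"
rm -f "$f"
```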
