Shell – Find files that have words in common

shell, shell-script, text processing

What would be the best way to create a list of files that have words in common with a given file? For example, if I had:

$ ls
  mainFile  file1  file2  file file4
$ cat mainFile
  exquisite malicious sentient pulsating
  perspicacious one
  tawdry fumigate Baryshnikov O'connor

and I wanted to list any of the files in the cwd that contained any one of the words in mainFile. What would be the best way to go about this?

Since the number of words per line in mainFile is not constant, I found solutions using cut a little tricky. I was trying to build a string of the words separated by | and pass it to a grep -l "exquisite|malicious|etc" * command. I'm open to any better method, though.

Best Answer

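First, build a word list from mainFile, one word per line, with duplicates and empty lines removed. The empty lines have to go because an empty pattern matches every line:

tr -s '[:space:]' '\n' < mainFile | grep . | sort -u > mainFile.idx

Then search the other files for those words as fixed strings. -l prints only the names of the matching files, and -w (available in GNU and BSD grep) matches whole words, so that "one" does not match "someone":

grep -lFw -f mainFile.idx file*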
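
The two steps can also be wrapped into one function that cleans up its temporary word list. This is only a sketch: the function name list_common and the sample file contents are made up for illustration, and it assumes a grep that supports -w and a tr that pads the second set (GNU and BSD both do).

```shell
#!/bin/sh
# list_common (hypothetical name): print the names of the files that
# contain at least one whole word from WORDFILE.
# Usage: list_common WORDFILE FILE...
list_common() {
    wordfile=$1
    shift
    idx=$(mktemp) || return 1
    # One word per line; "grep ." drops empty lines, which would
    # otherwise act as a pattern that matches everything.
    tr -s '[:space:]' '\n' < "$wordfile" | grep . | sort -u > "$idx"
    # -l names only, -F fixed strings, -w whole words, -f patterns from file
    grep -lFw -f "$idx" -- "$@"
    rc=$?
    rm -f "$idx"
    return $rc
}

# Demo on made-up files, run in a subshell so the caller's cwd is unchanged.
(
    demo=$(mktemp -d) && cd "$demo" || exit 1
    printf '  exquisite malicious sentient pulsating\n  perspicacious one\n' > mainFile
    printf 'nothing shared here\n' > file1
    printf 'a sentient machine\n'  > file2
    printf 'someone called\n'      > file4
    list_common mainFile file*     # prints: file2
    cd / && rm -rf "$demo"
)
```

file4 is not listed because "one" appears there only inside "someone". Dropping -w would list it as well.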
