I have a words.txt with 10,000 words (one per line) and 5,000 documents. I want to see which documents contain which of those words (with a regex pattern around each word). I have a script.sh that greps the documents and outputs hits. I want to (1) split my input file into smaller files, (2) feed each of those files to script.sh as a parameter, and (3) run all of this in parallel.
My attempt, based on the tutorial, hits an error:

```
$ parallel ./script.sh ::: split words.txt
./script.sh: line 22: split: No such file or directory
```
My script.sh looks like this:
```bash
#!/usr/bin/env bash
while read line   # line 1
do                # line 2
    # ... some stuff: grep the documents for $line ...
done < $1         # line 22
```
I guess I could write the split output to a directory, loop through the files in that directory, and launch grep commands for each, but how can I do this elegantly and concisely (using parallel)?
Best Answer
You can use the `split` tool.
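A minimal sketch, assuming GNU coreutils `split`, where `-l` sets the number of lines per output file and `words-` is the output-file prefix:

```bash
split -l 1000 words.txt words-
```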
This will split your words.txt file into files of no more than 1000 lines each.
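The resulting names follow `split`'s default two-letter suffix scheme:

```
words-aa
words-ab
...
```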
If you omit the prefix (`words-` in the above example), `split` uses `x` as the default prefix.

For using the generated files with `parallel`, you can make use of a glob:
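A sketch, assuming the chunk prefix `words-` from above; the shell expands `words-*` to every chunk file, and `parallel` runs `./script.sh` once per chunk:

```bash
parallel ./script.sh ::: words-*
```

This also explains the original error: `parallel ./script.sh ::: split words.txt` passed the literal strings `split` and `words.txt` as two separate arguments, so the script's `done < $1` (line 22) tried to redirect input from a file named `split`, which does not exist.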