I have a script, dataProcessing.pl, that accepts a tab-delimited .txt file and performs extensive processing tasks on the contained data. Multiple input files exist (file1.txt, file2.txt, file3.txt), which are currently looped over as part of a bash script that invokes Perl during each iteration (i.e. input files are processed one at a time).
However, I wish to run multiple instances of Perl (if possible) and process all input files simultaneously via `xargs`. I'm aware that you can run something akin to:

```
perl -e 'print "Test" x 100' | xargs -P 100
```

However, I want to pass a different file to each parallel instance of Perl (one instance works on file1.txt, one works on file2.txt, and so forth). A file handle or file path can be passed to Perl as an argument. How can I do this? I am not sure how I would pass the file names to `xargs`, for example.
Best Answer
Use `xargs` with `-n 1`, meaning "only pass one single argument to each invocation of the utility". This assumes that the filenames don't contain literal newlines.
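A minimal sketch of such a pipeline, assuming the script is invoked as `perl dataProcessing.pl FILE` (that invocation is inferred from the question, not stated outright). Here `wc -l` stands in for the Perl script so the sketch can be run as-is:

```shell
# Sample inputs standing in for the real tab-delimited data files.
printf 'a\tb\n' > file1.txt
printf 'c\td\n' > file2.txt
printf 'e\tf\n' > file3.txt

# One filename per invocation (-n 1), up to 100 invocations in parallel (-P 100).
# The real command would be:
#   printf '%s\n' file*.txt | xargs -n 1 -P 100 perl dataProcessing.pl
printf '%s\n' file*.txt | xargs -n 1 -P 100 wc -l
```

Because the instances run in parallel, their output lines may appear in any order.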
If you have GNU `xargs`, or an implementation of `xargs` that understands `-0` (for reading nul-delimited arguments, which allows for filenames with newlines) and `-r` (for not running the utility with an empty argument list, when `file*.txt` doesn't match anything and `nullglob` is in effect), you can feed it nul-delimited filenames instead.

Note that both of these variations may start up to 100 parallel instances of the script, which may not be what you want. You may want to limit it to a reasonable number related to the number of CPUs on your machine (or to the total amount of available RAM divided by the expected memory usage per task, if the job is memory-bound).
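A sketch of the nul-delimited variant, again assuming the script is invoked as `perl dataProcessing.pl FILE`, with `wc -l` standing in so the sketch runs as-is:

```shell
# Sample input standing in for the real data files.
printf 'a\tb\n' > file1.txt

# -0 reads NUL-separated names (safe for any filename, including ones
# containing newlines); -r skips running the command when no names arrive.
# The real command would be:
#   printf '%s\0' file*.txt | xargs -0 -r -n 1 -P 100 perl dataProcessing.pl
printf '%s\0' file*.txt | xargs -0 -r -n 1 -P 100 wc -l
```

With `-r`, an empty input produces no invocations at all, which is what you want when the glob matches nothing under `nullglob`.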
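To cap parallelism at the CPU count rather than a fixed 100, GNU systems provide `nproc` (part of coreutils; assuming it is available on your machine). A sketch, with `wc -l` again standing in for the Perl script:

```shell
# Sample input standing in for the real data files.
printf 'a\tb\n' > file1.txt

# nproc prints the number of available processing units, so at most one
# instance runs per CPU.  The real command would be:
#   printf '%s\0' file*.txt | xargs -0 -r -n 1 -P "$(nproc)" perl dataProcessing.pl
printf '%s\0' file*.txt | xargs -0 -r -n 1 -P "$(nproc)" wc -l
```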