Using parallel to process unique input files to unique output files

gnu-parallel parallelism scripting

I have a shell scripting problem: I'm given a directory full of input files (each containing many input lines), and I need to process each one individually, redirecting each file's output to a unique file (i.e., file_1.input needs to be captured in file_1.output, and so on).

Pre-parallel, I would just iterate over each file in the directory and run my command, using some sort of timer/counting technique to avoid overwhelming the processors (assuming that each process had a constant runtime). However, I know that won't always be the case, so a "parallel"-like solution seems the best way to get shell-script multi-threading without writing custom code.
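For context, here is a minimal sketch of that pre-parallel loop, assuming the customScript invocation shown further down, the /home/me directory layout used later in this post, and an arbitrary batch size of 4:

#!/bin/bash
# Pre-parallel approach: launch jobs in the background and wait after every
# batch of 4 so the machine is not overwhelmed.
i=0
for f in /home/me/input_files/*.txt; do
    customScript -c 33 -I -file "$f" -a -v 55 > "/home/me/output_files/$(basename "$f" .txt).output" &
    i=$((i + 1))
    if [ $((i % 4)) -eq 0 ]; then
        wait    # crude throttle: block until the current batch finishes
    fi
done
wait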

While I have thought of some ways to whip up parallel to process each of these files (while letting me manage my cores efficiently), they all seem hacky. I have what I think is a pretty simple use case, so I would prefer to keep it as clean as possible (and nothing in the parallel examples seems to jump out as matching my problem).

Any help would be appreciated!

input directory example:

> ls -l input_files/
total 13355
location1.txt
location2.txt
location3.txt
location4.txt
location5.txt

Script:

> cat proces_script.sh
#!/bin/sh

customScript -c 33 -I -file [inputFile] -a -v 55 > [outputFile]

Update:
After reading Ole's answer below, I was able to put together the missing pieces for my own parallel implementation. While his answer is great, here is my additional research and the notes I took:

Instead of running my full process, I decided to start with a proof-of-concept command to prove out his solution in my environment. See my two different implementations (and notes):

find /home/me/input_files -type f -name '*.txt' | parallel cat {} '>' /home/me/output_files/{/.}.out

This uses find (not ls, which can cause issues) to locate all applicable files within my input files directory, and then redirects their contents to a separate directory and file. My issue from above was reading and redirecting (the actual script was simple), so replacing the script with cat was a fine proof of concept.
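Once the cat proof of concept worked, swapping the real script back in should look something like the following (if I understand parallel's replacement strings right, {/.} is the input file's basename with its extension stripped):

find /home/me/input_files -type f -name '*.txt' | parallel customScript -c 33 -I -file {} -a -v 55 '>' /home/me/output_files/{/.}.output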

parallel cat {} '>' /home/me/output_files/{/.}.out ::: /home/me/input_files/*

This second solution uses parallel's input-argument paradigm (:::) to read the files in; however, for a novice this was much more confusing. For me, using find and a pipe met my needs just fine.
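One thing that helped me as a novice: parallel's --dry-run option prints the commands it would run without executing them, which makes it easy to sanity-check the replacement strings before committing to a run:

parallel --dry-run cat {} '>' /home/me/output_files/{/.}.out ::: /home/me/input_files/*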

Best Answer

GNU Parallel is designed for this kind of task:

parallel customScript -c 33 -I -file {} -a -v 55 '>' {.}.output ::: *.input

or:

ls | parallel customScript -c 33 -I -file {} -a -v 55 '>' {.}.output

It will run one job per CPU core.
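If you want to cap or scale the number of simultaneous jobs yourself, the -j option controls it; for example, to run at most 4 jobs, or to use half the cores:

parallel -j 4 customScript -c 33 -I -file {} -a -v 55 '>' {.}.output ::: *.input
parallel -j 50% customScript -c 33 -I -file {} -a -v 55 '>' {.}.output ::: *.input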

You can install GNU Parallel simply by:

wget https://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
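A quick sanity check that the downloaded copy works (run from the directory you saved it in):

./parallel --version
./parallel echo ::: a b c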

Watch the intro videos for GNU Parallel to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
