I'm quite new to doing things on Unix and am looking to make a script that does the following things in order:
- Take main .tsv file, split into X number of files with Y lines each
- Run each split file through a program, which outputs a new .tsv file upon completion
- Wait until ALL split files have completed processing, then stitch output files together into one.
I know about using split and sed for splitting files, and I can't imagine getting the split files to run through a Python script is hard either, but the problem is finding out when all executions of the parallelized programs are complete, and THEN stitching their outputs together into one.
I know that split auto-increments the output file names and that you can mass-parallelize the processing, as seen in this SO question, so I could figure that part out. Is there a way to check the execution status of a group of parallelized Python scripts? How could I accomplish what I'd like to do?
Best Answer
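wait is a bash builtin. Called with no arguments, it blocks until every background job started by the current shell has finished, so you can launch each split file's job with & and then call wait before stitching the outputs together. Given a PID, it waits for that one process and returns its exit status. Run help wait, or see the SHELL BUILTIN COMMANDS section of the bash man page, for details.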
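A rough sketch of the whole split, process, wait, merge flow might look like the script below. It is only an illustration under some assumptions: the file names, the chunk size, and the process_chunk function are all made up here, and process_chunk is a stand-in for whatever program you actually run on each split file (for example, a Python script taking an input and an output path).

```shell
#!/usr/bin/env bash
# Sketch only: split a TSV, process chunks in parallel, wait, then merge.

workdir=$(mktemp -d)           # scratch directory (assumption)
input="$workdir/main.tsv"      # hypothetical name for the main .tsv file
lines_per_chunk=3              # the "Y lines each" from the question

# Demo data so the sketch runs on its own; use your real file instead.
for i in 1 2 3 4 5 6 7; do
    printf 'row%d\t%d\n' "$i" "$((i * 10))"
done > "$input"

# Hypothetical stand-in for the real program: copies each row and
# appends a "done" column. Replace the body with your own command.
process_chunk() {
    awk -F'\t' -v OFS='\t' '{ print $1, $2, "done" }' "$1" > "$2"
}

# 1. Split into chunk_aa, chunk_ab, ... with at most Y lines each.
split -l "$lines_per_chunk" "$input" "$workdir/chunk_"

# 2. Start one background job per chunk and remember each PID.
pids=()
for chunk in "$workdir"/chunk_??; do
    process_chunk "$chunk" "$chunk.out" &
    pids+=("$!")
done

# 3. Wait for every job; "wait PID" returns that job's exit status.
failed=0
for pid in "${pids[@]}"; do
    wait "$pid" || failed=1
done

# 4. Merge only if all jobs succeeded. The glob expands in sorted
#    order, so the original row order is preserved.
if [ "$failed" -eq 0 ]; then
    cat "$workdir"/chunk_??.out > "$workdir/merged.tsv"
else
    echo "at least one chunk failed; not merging" >&2
fi
```

If you don't care about individual exit codes, a single bare wait after the loop is enough. Waiting on each PID just makes it possible to skip the merge when one of the jobs fails.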