How to run parallel processes and combine their outputs when both have finished

parallelism, scripting

I have a bash shell script in which I pipe some data through about 5 or 6 different programs and then write the final results to a tab-delimited file.

I then do the same for a separate, similar dataset and write the output to a second file.

Finally, both files are passed to another program for comparative analysis.
A simplified example:

Data1 | this | that | theother | grep | sed | awk | whatever > Data1Res.csv
Data2 | this | that | theother | grep | sed | awk | whatever > Data2Res.csv
AnalysisProg -i Data1Res.csv Data2Res.csv

My question is: how can I make steps 1 and 2 run at the same time (e.g. using &) but launch step 3 (AnalysisProg) only when both are complete?

Thanks.

P.S. AnalysisProg will not work on a stream or FIFO, so it needs the results as regular files.

Best Answer

Use the shell builtin wait, which blocks until background jobs have finished. For example:

Data1 ... > Data1Res.csv &
Data2 ... > Data2Res.csv &
wait
AnalysisProg -i Data1Res.csv Data2Res.csv

will:

  • run the Data1 and Data2 pipes as background jobs
  • wait for them both to finish
  • run AnalysisProg.
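Here is a minimal, self-contained sketch of that pattern, assuming bash. Because Data1, Data2 and AnalysisProg are placeholders in the question, the two pipelines below use printf, sort and awk as hypothetical stand-ins, and paste stands in for AnalysisProg. The point is only the shape: two backgrounded pipelines, one wait, then a step that reads both finished files.

```shell
#!/usr/bin/env bash
# Sketch of "run two pipelines in parallel, then combine".
# printf | sort | awk are hypothetical stand-ins for "Data1 | this | that | ...";
# paste is a hypothetical stand-in for AnalysisProg.

# Step 1: first pipeline, sent to the background with '&'.
printf '3\tc\n1\ta\n2\tb\n' | sort -n | awk -F'\t' '{print $1 "\t" toupper($2)}' > Data1Res.csv &

# Step 2: second pipeline, also in the background, running concurrently.
printf '6\tf\n4\td\n5\te\n' | sort -n | awk -F'\t' '{print $1 "\t" toupper($2)}' > Data2Res.csv &

# Block until every background job started by this shell has exited.
wait

# Step 3: both outputs are now complete regular files on disk, so a program
# that cannot read from a pipe or FIFO can safely open them.
paste Data1Res.csv Data2Res.csv > Combined.tsv
cat Combined.tsv
```

Note that a bare wait waits for all background children of the current shell, so if the script started other background jobs earlier, step 3 will also wait for those.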

See, e.g., this question.
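One refinement not covered above: in bash, a wait with no arguments returns 0 even if one of the jobs failed, so AnalysisProg would still run on a half-written file. A sketch, assuming bash, that records each job's PID with $! and waits on each one individually (wait PID returns that job's exit status). Each pipeline runs in a subshell with pipefail so that a failure in any stage, not just the last, is reported. The pipelines are hypothetical stand-ins, and false deliberately simulates a failing stage.

```shell
#!/usr/bin/env bash
# Sketch: run the analysis step only if BOTH background pipelines succeeded.
# The pipelines are hypothetical stand-ins; 'false' simulates a failing stage.

# Each pipeline runs in its own background subshell; pipefail inside it makes
# the subshell's exit status nonzero if ANY stage of the pipeline fails.
( set -o pipefail; printf 'b\na\n' | sort > Data1Res.csv ) &
pid1=$!

( set -o pipefail; printf 'd\nc\n' | false | sort > Data2Res.csv ) &
pid2=$!

status=0
# 'wait PID' returns the exit status of that particular background job.
wait "$pid1" || { echo "pipeline 1 failed" >&2; status=1; }
wait "$pid2" || { echo "pipeline 2 failed" >&2; status=1; }

if [ "$status" -eq 0 ]; then
    # Real script: AnalysisProg -i Data1Res.csv Data2Res.csv
    echo "running analysis"
else
    echo "skipping analysis"
fi
```

With the simulated failure in the second pipeline, this reports "pipeline 2 failed" and skips the analysis step; remove the false stage and both jobs succeed, so the analysis step runs.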