xargs – How to Preserve Order of Outputs of Commands Executed in Parallel

When I run a command with xargs -n 1 -P 0 for parallel execution, the output is all jumbled. Is there a way to do parallel execution, but make sure that the entire output of the first execution is written to stdout before the output of the second execution starts, the entire output of the second execution is written to stdout before the output of the third execution starts, etc.?

For example, when wanting to hash many files containing a lot of data, it can be done like this:

printf "%s\0" * | xargs -r0 -n 1 -P 0 sha256sum

I tested this on a small amount of data (9 GB) and it was done in 5.7 seconds. Hashing the same data using

sha256sum *

took 34.1 seconds. I often need to hash large amounts of data (which can take hours), so processing this in parallel can get things done a lot faster.

The problem here is that the order of the output lines is wrong. In this case, it can be fixed by simply sorting the lines by the second column. But it's not always this easy. For example, a plain sort already fails within the hashing example itself if the files are numbered and should be listed in numeric order:

printf "%s\0" {1..10000} | xargs -r0 -n 1 -P 0 sha256sum

This requires more advanced sorting. If we leave the hashing example altogether, things get more complicated still.
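
To make the two sorting workarounds concrete, here is a minimal sketch. It assumes GNU xargs, sha256sum and sort, and uses a hypothetical scratch directory with three small files named 1, 2 and 10 (none of this is from the original test). Sorting on the second column alone orders the names lexically, so 10 lands before 2; a numeric key on that column restores the intended order, but only because the names are purely numeric.

```shell
# Hypothetical scratch directory so the sketch is self-contained.
demo=$(mktemp -d)
cd "$demo" || exit 1
for i in 1 2 10; do printf 'data %s\n' "$i" > "$i"; done

# Sort by file name (field 2 onwards). Lexical order puts 10 before 2.
by_name=$(printf '%s\0' * | xargs -r0 -n 1 -P 0 sha256sum | sort -k 2)

# Numeric key on field 2 gives 1, 2, 10 for purely numeric names.
by_number=$(printf '%s\0' * | xargs -r0 -n 1 -P 0 sha256sum | sort -k 2,2n)

printf '%s\n' "$by_number"
```

Either variant depends on the output containing something sortable that reflects the input order, which is exactly what cannot be relied on in general.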

In the comments, I was asked whether I merely want to prevent interleaving of output. This is not the case. I want order to be preserved.

Best Answer

You can do it with GNU Parallel (--keep-order):

printf "%s\0" {1..10000} | parallel --keep-order -r0 -n 1 -P 0 sha256sum
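
If GNU Parallel is not available, the same ordering guarantee can be approximated in plain shell. The following is only a sketch of the general idea (buffer each job's output separately, then emit the buffers in input order), not a description of how GNU Parallel works internally. The function name `ordered_parallel` is hypothetical. Unlike `--keep-order`, it prints nothing until every job has finished, it starts all jobs at once with no limit, and it captures stdout only.

```shell
# ordered_parallel CMD ARG...
# Hypothetical helper: runs CMD once per ARG in the background and
# prints each job's stdout in argument order.
ordered_parallel() {
    cmd=$1; shift
    tmp=$(mktemp -d) || return 1
    i=0
    for arg in "$@"; do
        i=$((i + 1))
        # Each job writes to its own numbered file, so nothing interleaves.
        "$cmd" "$arg" > "$tmp/$i" &
    done
    wait    # block until every background job has finished
    j=1
    while [ "$j" -le "$i" ]; do
        cat "$tmp/$j"    # replay the buffered outputs in input order
        j=$((j + 1))
    done
    rm -rf "$tmp"
}
```

It could be called as `ordered_parallel sha256sum *`. Note that the bare `wait` also waits for any other background jobs of the calling shell.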