Xargs – How to Get the Index of the Parallel Processor

parallelism, xargs

Suppose I have two resources, named 0 and 1, that can only be accessed exclusively.

Is there any way to recover the "index" of the "parallel processor" that xargs launches in order to use it as a free mutual exclusion service? E.g., consider the following parallelized computation:

$ echo {1..8} | xargs -d " " -P 2 -I {} echo "consuming task {}"
consuming task 1
consuming task 2
consuming task 3
consuming task 4
consuming task 5
consuming task 6
consuming task 7
consuming task 8

My question is whether there exists a magic word, say index, such that the output would look like

$ echo {1..8} | xargs -d " " -P 2 -I {} echo "consuming task {} with resource index"
consuming task 1 with resource 0
consuming task 2 with resource 1
consuming task 3 with resource 1
consuming task 4 with resource 1
consuming task 5 with resource 0
consuming task 6 with resource 1
consuming task 7 with resource 0
consuming task 8 with resource 0

where the only guarantee is that at most one process is using resource 0 at any time, and likewise for resource 1. Basically, I'd like to pass this index down to the child process, which would then respect the rule of using only the resource it was told to use.

Of course, it'd be preferable to extend this to more than two resources. From what I can see in the docs, xargs probably can't do this. Is there a minimal equivalent solution? I'd rather not create and clean up files as fake locks.

Best Answer

If you're using GNU xargs, there's --process-slot-var:

--process-slot-var=environment-variable-name
Set the environment variable environment-variable-name to a unique value in each running child process. Each value is a decimal integer. Values are reused once child processes exit. This can be used in a rudimentary load distribution scheme, for example.

So, for example:

$ echo {1..9} | xargs -n2 -P2 --process-slot-var=index sh -c 'echo "$index" "$@" "$$"' _
0 1 2 10475
1 3 4 10476
1 5 6 10477
0 7 8 10478
1 9 10479
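
Applied to the example from the question, a minimal sketch might look like the following. It assumes GNU xargs (neither -P nor --process-slot-var is part of POSIX xargs), and the variable name RESOURCE_INDEX is only an illustrative choice. The order of the output lines, and which task lands on which slot, will vary between runs.

```shell
#!/bin/sh
# Sketch: tell each task which xargs worker slot is running it.
# Assumes GNU xargs; RESOURCE_INDEX is an arbitrary, illustrative name.
# With -P 2, at most two children run at once, so a slot value is never
# held by two running children at the same time.
out=$(printf '%s\n' 1 2 3 4 5 6 7 8 |
  xargs -n 1 -P 2 --process-slot-var=RESOURCE_INDEX \
    sh -c 'echo "consuming task $1 with resource $RESOURCE_INDEX"' _)
printf '%s\n' "$out"
```

As far as I can tell, GNU xargs hands out the lowest free slot number, so raising -P to N should give slot values 0 through N-1, which covers the case of more than two resources without any lock files.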