Xargs – How to Get the Index of the Parallel Processor


Suppose I have two resources, named 0 and 1, that can only be accessed exclusively.

Is there any way to recover the "index" of the "parallel processor" that xargs launches in order to use it as a free mutual exclusion service? E.g., consider the following parallelized computation:

$ echo {1..8} | xargs -d " " -P 2 -I {} echo "consuming task {}"
consuming task 1
consuming task 2
consuming task 3
consuming task 4
consuming task 5
consuming task 6
consuming task 7
consuming task 8

My question is whether there exists a magic word, say index, where the output would look like

$ echo {1..8} | xargs -d " " -P 2 -I {} echo "consuming task {} with resource index"
consuming task 1 with resource 0
consuming task 2 with resource 1
consuming task 3 with resource 1
consuming task 4 with resource 1
consuming task 5 with resource 0
consuming task 6 with resource 1
consuming task 7 with resource 0
consuming task 8 with resource 0

where the only guarantee is that at most one process uses resource 0 at any time, and likewise for resource 1. Basically, I'd like to pass this index down to the child process, which would then respect the rule and use only the resource it was told to.

Of course, it'd be preferable to extend this to more than two resources. Inspecting the docs, xargs probably can't do this. Is there a minimal equivalent solution? Using/cleaning files as fake locks is not preferable.

Best Answer

If you're using GNU xargs, there's --process-slot-var:

--process-slot-var=environment-variable-name
Set the environment variable environment-variable-name to a unique value in each running child process. Each value is a decimal integer. Values are reused once child processes exit. This can be used in a rudimentary load distribution scheme, for example.

So, for example:

$ echo {1..9} | xargs -n2 -P2 --process-slot-var=index sh -c 'echo "$index" "$@" "$$"' _
0 1 2 10475
1 3 4 10476
1 5 6 10477
0 7 8 10478
1 9 10479
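
To tie this back to the form used in the question, here is a minimal sketch that combines -I with --process-slot-var. The variable name SLOT is arbitrary (not something xargs defines on its own), and the sketch assumes a GNU xargs that supports --process-slot-var. Which task lands on which slot, and the order of the lines, will vary between runs; with -P 2 the slot values were 0 and 1 in the run above.

```shell
#!/bin/sh
# Sketch only: requires GNU xargs with --process-slot-var.
# SLOT is an arbitrary name chosen for this example.
# The task is passed as a positional argument ($1) instead of being
# substituted into the script text, which avoids quoting problems.
out=$(printf '%s\n' 1 2 3 4 5 6 7 8 |
  xargs -P 2 -I {} --process-slot-var=SLOT \
    sh -c 'echo "consuming task $1 with resource $SLOT"' _ {})

# Print the collected lines (order depends on scheduling).
printf '%s\n' "$out"
```

Because a slot value is only reused after the child holding it has exited, two children running at the same time should never see the same value, which is the exclusivity property asked for.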

Related Solutions

Bash – How to get the current line count from xargs

If the filenames do not contain whitespace:

Dry run:

find *.jpg | xargs -n 5 | awk '{OFS=" ";}{print "convert",$1,$2,$3,$4,$5,"-append",NR".png\n";}'

If everything looks okay, append | sh.
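
As a rough illustration of what the dry run produces, the sketch below feeds seven made-up filenames through the same xargs -n 5 grouping. The names a.jpg through g.jpg are placeholders, and the awk program is a variant that prints the whole line ($0) instead of $1 to $5, so a short final group does not leave empty fields.

```shell
#!/bin/sh
# Sketch only: the filenames are hypothetical stand-ins for real *.jpg files.
# xargs -n 5 with no command runs echo, emitting up to five names per line;
# awk then turns each line into a convert command numbered by NR.
cmds=$(printf '%s\n' a.jpg b.jpg c.jpg d.jpg e.jpg f.jpg g.jpg |
  xargs -n 5 |
  awk '{ print "convert", $0, "-append", NR ".png" }')

# Dry run: show the commands without executing them.
printf '%s\n' "$cmds"
```

This prints one convert command per group of five names; nothing is executed until the output is piped to sh.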

Output garbled when running “xargs ls” in parallel

This is to do with writes to pipes. With -L16 you are running one process for each 16 files, which produces about a thousand characters of output, depending on how long the filenames are. With -L64 you get about four thousand. The ls program almost certainly uses the stdio library, and almost certainly uses a 4kB buffer for its output to reduce the number of write calls.

So find produces a load of filenames, then (for the -L64 case) xargs chops them into bundles of 64 and starts up 4 ls processes to handle them. Each ls will generate its first 4k of output and write it to the pipe to sort. Note that this 4k will typically not end with a newline. So say the third ls gets its first 4kB ready first, and it ends

 lrwxrwxrwx 1 root root       6 Oct 21  2013 bzegrep -> bzgrep
 -rwxr-xr-x 1 root root    4877 Oct 21  2013 bzexe
 lrwxrwxrwx 1 root root       6 Oct 2

and then the first ls outputs something, e.g.

 total 123459

then the input to sort will include lrwxrwxrwx 1 root root 6 Oct 2total 123459

In the -L16 case, the ls processes will (usually) only output a complete set of results in one go.

Of course, in this case you are just wasting time and resources by using xargs and ls. You should let find output the information it already has rather than running extra programs to discover that information again.
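
The original command is not shown here, so the following is only a sketch of the idea under assumed details: a helper named list_with_find (hypothetical) that asks find itself for an ls-style listing via -ls, run against a throwaway directory. Both -maxdepth and -ls are supported by GNU and BSD find but are not in POSIX.

```shell
#!/bin/sh
# Sketch only: list_with_find is a hypothetical helper, and the directory
# and options are assumptions, since the original command is not shown.
list_with_find() {
  # A single find process writes the long listing, so no parallel
  # ls processes compete for the same pipe.
  find "$1" -maxdepth 1 -type f -ls
}

# Demonstrate on a temporary directory with three empty files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/c"
listing=$(list_with_find "$demo_dir")
printf '%s\n' "$listing" | sort
rm -rf "$demo_dir"
```

Since only one process writes to the pipe, its output cannot be interleaved with that of another writer.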
