GNU parallel excessively slow

gnu-parallel grep xargs

I need to run grep on a couple of million files, so I tried to speed it up using the two approaches mentioned here: xargs -P -n and GNU parallel. I tested both on a subset of my files (9026 files), with these results:

  1. With xargs -P 8 -n 1000, very fast:

    $ time find tex -maxdepth 1 -name "*.json" | \
                    xargs -P 8 -n 1000 grep -ohP "'pattern'" > /dev/null
    
    real    0m0.085s
    user    0m0.333s
    sys     0m0.058s
    
  2. With parallel, very slow:

    $ time find tex -maxdepth 1 -name "*.json" | \
                    parallel -j 8 grep -ohP "'pattern'" > /dev/null
    
    real    0m21.566s
    user    0m22.021s
    sys     0m18.505s
    
  3. Even sequential xargs is faster than parallel:

    $ time find tex -maxdepth 1 -name "*.json" | \
                    xargs grep -ohP 'pattern' > /dev/null
    
    real    0m0.242s
    user    0m0.209s
    sys     0m0.040s
    

xargs -P n does not work for me because the output from all the processes gets interleaved, which does not happen with parallel. So I would like to use parallel without incurring this huge slowdown.

Any ideas?

UPDATE

  1. Following the answer by Ole Tange, I tried parallel -X. The results, for completeness:

    $ time find tex -maxdepth 1 -name "*.json" | \
        parallel -X -j 8 grep -ohP "'pattern'" > /dev/null
    
    real    0m0.563s
    user    0m0.583s
    sys     0m0.110s
    
  2. Fastest solution: following the comment by @cas, I ran grep with the -H option (to force printing the filenames), sorted the output by filename, and then cut the filenames off again. Results:

    $ time find tex -maxdepth 1 -name '*.json' -print0 | \
        xargs -0r -P 9 -n 500 grep --line-buffered -oHP 'pattern' | \
        sort -t: -k1 | cut -d: -f2- > /dev/null
    
    real    0m0.144s
    user    0m0.417s
    sys     0m0.095s
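
A note on the sort step above: `sort -t: -k1` uses a key that runs from the first field to the end of the line, so matches from the same file come out in alphabetical order rather than in the order they appear in the file. If the original order within each file matters, a stable sort restricted to the filename field is one option. The following is a minimal sketch on made-up data: the temporary directory, the file names, and the `foo[0-9]` pattern are assumptions for illustration, and it assumes file names that contain no colons.

```shell
#!/bin/sh
# Hypothetical demo data: two small files whose matches are not in alphabetical order.
export LC_ALL=C
demo=$(mktemp -d)
printf 'foo2\nfoo1\n' > "$demo/a.json"
printf 'foo3\n' > "$demo/b.json"

# Run one grep per file, two at a time; -H prefixes every match with its file name.
# sort -s -t: -k1,1 compares only the file name field and keeps lines with equal
# keys in input order, so each file's matches stay together and in file order.
grouped=$(find "$demo" -maxdepth 1 -name '*.json' -print0 |
    xargs -0 -P 2 -n 1 grep --line-buffered -oH 'foo[0-9]' |
    sort -s -t: -k1,1 | cut -d: -f2-)

printf '%s\n' "$grouped"
rm -rf "$demo"
```

With this data the sketch prints foo2, foo1, foo3: the two matches from a.json stay in file order, followed by the match from b.json.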
    

Best Answer

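Try parallel -X, which passes as many arguments to each grep invocation as the command line length permits instead of starting one job per file. As noted in the comments, the slowdown is probably caused by the overhead of starting a new shell and opening files for buffering for every single argument.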

Be aware that GNU Parallel will never be as fast as xargs because of that. Expect an overhead of 10 ms per job. With -X this overhead matters less, because each job processes many arguments.
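
To make the per-job cost concrete, here is a small back-of-the-envelope sketch that compares the total startup overhead for the 9026 files in the question when every file gets its own job versus when files are batched. The batch size of 1000 is an assumption for illustration only (parallel -X chooses its own batch size from the command line length limit and the number of job slots), and the 10 ms figure is the estimate given above, not a measurement.

```shell
#!/bin/sh
inputs=9026        # files in the question's test subset
overhead_ms=10     # per-job overhead estimate from the answer
batch=1000         # assumed arguments per job (illustrative only)

# One job per file versus one job per batch (ceiling division).
per_arg_jobs=$inputs
batched_jobs=$(( (inputs + batch - 1) / batch ))

# Total startup overhead summed over all jobs, not wall-clock time.
per_arg_ms=$(( per_arg_jobs * overhead_ms ))
batched_ms=$(( batched_jobs * overhead_ms ))

echo "one file per job:    $per_arg_jobs jobs, about $per_arg_ms ms of startup overhead"
echo "$batch files per job: $batched_jobs jobs, about $batched_ms ms of startup overhead"
```

Under these assumptions the unbatched run pays for 9026 job startups and the batched run for only 10. For comparison, the 21.566 s measured in the question works out to roughly 2.4 ms of wall time per file, so the 10 ms figure may be on the conservative side for that machine.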
