Shell – How to use GNU parallel to calculate sha256 hash

gnu-parallel, hashsum, shell-script

Based on this:
Simultaneously calculate multiple digests (md5, sha256)?

I have a folder containing a large number of files, and I want to compute the SHA256 hash of each one.

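I currently use this code segment to compute the SHA256 hashes in parallel:

#!/bin/bash
for file in *; do
    sha256sum "$file" > "$file".sha &   # one background job per file
done
wait   # wait for all background hashes to finish

However, this starts a job for every file at once, and my computer only has 16 physical cores.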

So my question is: how can I use GNU parallel to run this on only the 16 physical cores available on my system, so that as soon as one hash finishes, the next file is picked up automatically?

Best Answer

Using xargs (and assuming that you have an implementation of this utility that supports -0 and -P):

printf '%s\0' * | xargs -0 -L 1 -P 16 sh -c 'sha256sum "$1" > "$1".sha' sh

This passes all names in the current directory to xargs as a NUL-terminated list. xargs calls an in-line sh script for each name, running at most 16 processes at a time and starting a new one whenever one finishes. The trailing sh on the command line becomes $0 of the in-line script, so each file name arrives as $1; the script runs sha256sum on it and writes the result to a file with the same name plus a .sha suffix.
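The pipeline can be tried end to end in a scratch directory. The sketch below is a self-contained bash script, assuming GNU coreutils sha256sum and an xargs that supports -0 and -P; the scratch directory and the sample file names (one with spaces, one starting with a dash) are made up for illustration. It makes two small changes to the answer's command: -n 1 (the POSIX way of asking for one argument per invocation) instead of -L 1, and sha256sum -- so a name starting with "-" is not read as an option.

```shell
#!/bin/bash
# Sketch: hash every file in a scratch directory, at most 16 jobs at a time.
# Assumes GNU coreutils sha256sum and an xargs that supports -0 and -P.
set -eu

workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Made-up sample inputs, including awkward names.
printf 'alpha\n' > a.txt
printf 'beta\n'  > 'file with spaces.txt'
printf 'gamma\n' > -dash.txt

# NUL-separate the names so spaces survive; xargs keeps at most 16 sh
# processes running, each hashing one file. The trailing "sh" becomes $0
# of the in-line script, so the file name is $1. "--" stops sha256sum
# from treating "-dash.txt" as an option.
printf '%s\0' * | xargs -0 -n 1 -P 16 sh -c 'sha256sum -- "$1" > "$1".sha' sh

# List the generated hash files.
ls -- *.sha
```

Each .sha file holds a single line in sha256sum's usual format, so it can later be checked with sha256sum -c.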

Note that this would also pick up any .sha files created by a previous run of the same pipeline and hash them too (producing names such as file.sha.sha). To avoid this, use a more specific glob than * that matches only the names you want to process. For example, in bash, with extended globbing enabled:

shopt -s extglob
printf '%s\0' !(*.sha) | xargs ...as above...
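Spelled out in full, the extended-glob variant can be re-run safely. This is a sketch assuming bash, GNU sha256sum and an xargs with -0 and -P; the scratch directory, the sample .dat files and the helper name hash_all are hypothetical. Running it twice shows that the second pass overwrites the existing .sha files instead of hashing them.

```shell
#!/bin/bash
# Sketch: the !(*.sha) variant, run twice to show that earlier output is skipped.
# Assumes bash, GNU sha256sum, and an xargs with -0 and -P.
set -eu
shopt -s extglob   # enables the !(pattern) syntax; must be on before it is parsed

workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"
printf 'one\n' > one.dat
printf 'two\n' > two.dat

# hash_all is a hypothetical helper name used only for this illustration.
hash_all() {
    # !(*.sha) matches every name in the directory except those ending in .sha
    printf '%s\0' !(*.sha) |
        xargs -0 -n 1 -P 16 sh -c 'sha256sum -- "$1" > "$1".sha' sh
}

hash_all   # first run creates one.dat.sha and two.dat.sha
hash_all   # second run rewrites them; no *.sha.sha files appear
```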

Note also that hashing large files in parallel is likely to be disk-bound (limited by read throughput) rather than CPU-bound, so you may well see a similar speed with fewer parallel tasks.
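Whether fewer jobs are enough can be checked by timing the same directory at several job counts. The bash function below is a rough sketch; the name time_jobs and the list of job counts are hypothetical, and it assumes GNU sha256sum, an xargs with -0 and -P, and a directory that holds only regular files. The hashes themselves are discarded so that only the elapsed time is compared.

```shell
#!/bin/bash
# Rough timing sketch: hash one directory with different numbers of parallel jobs.
# Assumes bash (for the `time` keyword and TIMEFORMAT), GNU sha256sum,
# an xargs with -0 and -P, and a directory containing only regular files.
time_jobs() {
    local dir=$1                      # directory to hash
    local jobs
    local TIMEFORMAT='%R s elapsed'   # bash: report wall-clock seconds only
    for jobs in 1 2 4 8 16; do
        echo "jobs=$jobs"
        # Hash every file with at most $jobs concurrent sha256sum processes;
        # the output is thrown away, the timing goes to stderr.
        time (cd "$dir" && printf '%s\0' * |
            xargs -0 -n 1 -P "$jobs" sha256sum -- > /dev/null)
    done
}
```

Usage would be something like time_jobs /path/to/folder (a placeholder path). Because later passes may be served from the page cache rather than the disk, the numbers are only a rough indication unless the cache is cold for every pass.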


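For a GNU parallel equivalent, replace xargs with parallel -q. The -q option matters here: GNU parallel joins the command words into one string and runs it through a shell, which would otherwise strip the quotes around the sh -c script. GNU parallel accepts -0, -L and -P as well:

printf '%s\0' * | parallel -q -0 -L 1 -P 16 sh -c 'sha256sum "$1" > "$1".sha' sh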
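GNU parallel also has its own replacement string, {}, which it substitutes with the shell-quoted input item, so the sh -c wrapper can be dropped altogether. The script below is a sketch assuming bash and GNU parallel; the scratch directory and sample file names are made up. It checks for GNU parallel first, because some systems ship an unrelated parallel command (from moreutils) with different options.

```shell
#!/bin/bash
# Sketch: the same job with GNU parallel and its {} replacement string.
# Assumes bash and GNU parallel; {} is replaced by the shell-quoted file name.
set -eu
shopt -s extglob   # for !(*.sha), as in the bash example above

workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"
printf 'alpha\n' > a.txt
printf 'beta\n'  > 'file with spaces.txt'

if parallel --version 2>/dev/null | grep -q '^GNU parallel'; then
    # -0: NUL-separated input; -j 16: at most 16 jobs at once.
    # A new job starts as soon as a running one finishes.
    printf '%s\0' !(*.sha) | parallel -0 -j 16 'sha256sum -- {} > {}.sha'
else
    echo 'GNU parallel not found' >&2
fi
```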


In the zsh shell, you can do the same with the zargs function:

autoload -U zargs
setopt EXTENDED_GLOB

zargs -P 16 -L 1 -- (^(*.sha)) -- sh -c 'sha256sum "$1" > "$1".sha' sh