Simultaneously calculate multiple digests (md5, sha256)?
I have a folder containing a large number of files for which I want to compute the SHA256 hash.
I currently use this code segment:
#!/bin/bash
for file in *; do
sha256sum "$file" > "$file".sha &
done
to compute the SHA256 hashes in parallel, except that my computer only has 16 physical cores.
So, the question I have is: how can I use GNU parallel to run this using only the 16 physical cores available on my system, so that once a hash has been completed, the next file to hash is automatically picked up?
Best Answer
Using xargs (and assuming that you have an implementation of this utility that supports -0 and -P):
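A minimal sketch of such a pipeline (the exact command is not preserved in this copy, so this is a reconstruction from the description that follows):

# one NUL-terminated name per file, at most 16 sha256sum jobs at a time
printf '%s\0' * | xargs -0 -P 16 -n 1 sh -c 'sha256sum "$1" > "$1".sha' sh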
This would pass all names in the current directory as a nul-terminated list to xargs. The xargs utility would call an in-line sh script for each one of these names, starting at most 16 concurrent processes. The in-line script takes the argument and runs sha256sum on it, outputting the result to a file of a similar name.

Note that this would also possibly pick up .sha files created in a previous run of the same pipeline. To avoid this, use a slightly more sophisticated glob than * to match the particular names that you'd want to process. For example, in bash:
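One way to write such a glob, assuming bash's extglob option (this particular pattern is my reconstruction, not necessarily the answer's original):

# enable extended globbing so the !(...) pattern is available
shopt -s extglob
# hash every name except the .sha output files themselves
printf '%s\0' !(*.sha) | xargs -0 -P 16 -n 1 sh -c 'sha256sum "$1" > "$1".sha' sh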
Note also that running sha256sum on large files in parallel is likely to be disk bound rather than CPU bound, and you may possibly see a similar speed of operation with a smaller number of parallel tasks.

For a GNU parallel equivalent, replace xargs with parallel.
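Taken literally, that substitution gives the following sketch; GNU parallel accepts -0, -P, and -n with the same meaning as xargs here:

printf '%s\0' * | parallel -0 -P 16 -n 1 sh -c 'sha256sum "$1" > "$1".sha' sh

Since GNU parallel runs its command template through a shell, the redirection can also be written directly in the template, for example: parallel -0 -P 16 'sha256sum {} > {}.sha'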
In the zsh shell, you can do it like this:
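The answer is truncated at this point; a plausible sketch uses the zargs helper that ships with zsh (the original code is not shown, so treat this as an assumption):

# load the xargs-like zargs function bundled with zsh
autoload -U zargs
# at most 16 parallel jobs, one filename per sh invocation
zargs -P 16 -n 1 -- * -- sh -c 'sha256sum "$1" > "$1".sha' sh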