Returning Randomized Items from Glob Match in ZSH


With the glob qualifiers in zsh one can sort the results of a filename globbing pattern match in various ways. For example, the pattern *(om) would match all non-hidden names in the current directory, ordered by modification timestamp.

However, I have at times wanted a randomised ordering (for example, to get a random sample of files). As far as I can tell, there is no qualifier that does this directly.

Question: How may I get a randomised list of pathnames back from a zsh filename globbing pattern?

Best Answer

Use a random sort key (glob qualifier oe):

*(Noe\''REPLY=$RANDOM,$RANDOM'\')

Explanation:

  • oe is followed by a one-character delimiter, a chunk of code, and another delimiter. The chunk of code may not contain the delimiter. Special characters need to be escaped so that they are not parsed while parsing the glob qualifiers themselves.
  • I use ' as the delimiter character (with a backslash because it needs to be escaped), and I wrap the code with ' to protect special characters that may be present. This way I can write arbitrary code as long as it doesn't contain '.
  • This chunk of code is executed for each matching file name in turn.
  • REPLY is initially set to the file name, and whatever the code sets REPLY to is used as the sort key.

To sample $n elements randomly, add the […] qualifier:

*(Noe\''REPLY=$RANDOM,$RANDOM'\'[1,$n])

Occasionally some elements will get the same sort key, so not all permutations are equally likely: there is a slight preference for keeping whatever results from applying the sort function to a list in directory order¹, but the bias is small. I use $RANDOM,$RANDOM as the sort key rather than $RANDOM to reduce the bias: $RANDOM is a 15-bit number and the bias would be noticeable as the number of files approaches 2^15.

Note that $RANDOM is good enough for sampling if the slight bias isn't a concern. It isn't suitable for anything that involves security. If you want a secure random permutation, use GNU coreutils's shuf. (If your favorite OS lacks a native shuf and you don't want to install GNU coreutils for some reason, you can try ibara's reimplementation instead.)

securely_permuted=("${(0)$(printf '%s\0' *(N) | shuf -z)}")

or a simpler version that may run into a command line length limit:

securely_permuted=("${(0)$(shuf -z -e -- *(N))}")

¹ Experimentally the sort is stable (e.g. *(omoe\''REPLY=1'\') is equivalent to *(om)), but the order from just *(oe\''REPLY=1'\') doesn't match *(oN). In any case, it's a small bias in favor of some particular order.
