Multi-line file shuffle

awk, grep, sed, text processing

I have a text file with empty lines separating blocks of text. I would like to use *NIX command-line tools to shuffle this file while respecting the block structure. In other words, the output should contain the blocks in a different order, while the lines and their order inside each block remain the same.

Input file example:

line 1
line 2

line 10
line 20
line 30

line 100
line 200

The output file (after shuffle):

line 10
line 20
line 30

line 1
line 2

line 100
line 200

Of course, running it repeatedly should give a different order of blocks.

The first line of the file is always non-empty. There are no double blank lines. The last line of the file is always empty.

I wrote a very simple Python script that reads all lines into a list of lists, shuffles it, and prints the result. I am curious whether I could do this with standard *NIX tools.

Best Answer

POSIXly, you could do something like:

<file awk '
  BEGIN{srand(); n=rand()}          # one random prefix per paragraph
  {print n, NR, $0}                 # prefix: paragraph number, then line number
  !NF {n=rand()}                    # blank line: draw a new number for the next paragraph
  END {if (NF) print n, NR+1, ""}   # re-add a separator if the file did not end with one
  ' |
  sort -nk1 -k2 |
  cut -d' ' -f3-

That is, prefix each line with a random number that changes with each paragraph, followed by the line number; sort numerically on the first number and then on the second, so that paragraphs are reordered while the line order within each paragraph is preserved; finally remove the two extra fields.
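To illustrate with the sample input above, the stream fed to sort could look like this (the random numbers are made up; srand()/rand() will produce different values each run):

0.84 1 line 1
0.84 2 line 2
0.84 3
0.29 4 line 10
0.29 5 line 20
0.29 6 line 30
0.29 7
0.61 8 line 100
0.61 9 line 200
0.61 10

Each separator line carries the same random number as the paragraph it closes and a higher line number, so after sorting it stays attached to that paragraph; cut then strips the two prefix fields.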

One may want to pipe to sed '$d' to remove the trailing blank line.
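For example, by extending the end of the pipeline:

  sort -nk1 -k2 |
  cut -d' ' -f3- |
  sed '$d'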

Beware that with most awk implementations, srand() uses the Unix epoch time (in seconds) to seed the pseudo-random number generator, so you may get the same result if you run the command twice within the same second (a historical bug now engraved in the POSIX spec, despite my efforts, unfortunately).
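If that matters, one workaround is to pass awk an explicit seed instead of relying on srand()'s default. A sketch, assuming a readable /dev/urandom (common on modern systems, but not required by POSIX):

  seed=$(od -An -N2 -tu2 /dev/urandom)   # 16-bit unsigned integer from the system RNG
  <file awk -v seed="$seed" '
    BEGIN{srand(seed); n=rand()}
    {print n, NR, $0}
    !NF {n=rand()}
    END {if (NF) print n, NR+1, ""}' |
    sort -nk1 -k2 |
    cut -d' ' -f3-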
