Multi-line file shuffle

awk, grep, sed, text processing

I have a text file with empty lines separating blocks of text. I would like to use *NIX command-line tools to shuffle this file while respecting the block structure. In other words, the output should contain the blocks in a different order, while the lines and their order inside each block remain the same.

Input file example:

line 1
line 2

line 10
line 20
line 30

line 100
line 200

The output file (after shuffle):

line 10
line 20
line 30

line 1
line 2

line 100
line 200

Of course, running it repeatedly should give a different order of blocks.

The first line of the file is always non-empty. There are no double blank lines. The last line of the file is always empty.

I wrote a very simple Python script that reads all lines into a list of lists, shuffles it, and prints the result. I am curious whether I could do this with standard *NIX tools.

Best Answer

POSIXly, you could do something like:

<file awk '
  BEGIN{srand(); n=rand()}          # one random prefix per paragraph
  {print n, NR, $0}                 # prefix: paragraph number, then line number
  !NF {n=rand()}                    # blank line: draw a new number for the next paragraph
  END {if (NF) print n, NR+1, ""}   # re-add a separator if the file did not end with one
  ' |
  sort -nk1 -k2 |
  cut -d' ' -f3-

That is, prefix each line with a random number that changes with each paragraph, followed by the line number; sort numerically on the first number and then on the second, so that paragraphs are reordered while the line order within each paragraph is preserved; finally remove the two extra fields.
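To illustrate with the sample input above, the stream fed to sort could look like this (the random numbers are made up; srand()/rand() will produce different values each run):

0.84 1 line 1
0.84 2 line 2
0.84 3
0.29 4 line 10
0.29 5 line 20
0.29 6 line 30
0.29 7
0.61 8 line 100
0.61 9 line 200
0.61 10

Each separator line carries the same random number as the paragraph it closes and a higher line number, so after sorting it stays attached to that paragraph; cut then strips the two prefix fields.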

One may want to pipe to sed '$d' to remove the trailing blank line.
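For example, by extending the end of the pipeline:

  sort -nk1 -k2 |
  cut -d' ' -f3- |
  sed '$d'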

Beware that with most awk implementations, srand() uses the Unix epoch time (in seconds) to seed the pseudo-random number generator, so you may get the same result if you run the command twice within the same second (a historical bug now engraved in the POSIX spec, despite my efforts, unfortunately).
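If that matters, one workaround is to pass awk an explicit seed instead of relying on srand()'s default. A sketch, assuming a readable /dev/urandom (common on modern systems, but not required by POSIX):

  seed=$(od -An -N2 -tu2 /dev/urandom)   # 16-bit unsigned integer from the system RNG
  <file awk -v seed="$seed" '
    BEGIN{srand(seed); n=rand()}
    {print n, NR, $0}
    !NF {n=rand()}
    END {if (NF) print n, NR+1, ""}' |
    sort -nk1 -k2 |
    cut -d' ' -f3-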
