Linux – Sort text files with multiple lines as a row

linux, sort, text processing

I have a text file in this format:

####################################
KEY2
VAL21
VAL22
VAL23
VAL24
####################################
KEY1
VAL11
VAL12
VAL13
VAL14
####################################
KEY3
VAL31
VAL32
VAL33
VAL34

I want to sort this file by the KEY line and keep the next 4 lines with it, so the sorted result should be:

####################################
KEY1
VAL11
VAL12
VAL13
VAL14
####################################
KEY2
VAL21
VAL22
VAL23
VAL24
####################################
KEY3
VAL31
VAL32
VAL33
VAL34

Is there a way to do this?

Best Answer

msort(1) was designed to sort files with multi-line records. It has an optional GUI, as well as a normal, usable-for-humans command-line version. (At least, for humans who like to read manuals carefully and look for examples...)

AFAICT, you can't use an arbitrary pattern to delimit records, so unless your records are fixed-size (in bytes, not characters or lines), you have to reshape the input first. msort does have a -b option for records that are blocks of lines separated by blank lines.

You can transform your input into a format that will work with -b pretty easily, by putting a blank line before every ###... (except the first one).
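
As a rough sketch of that transformation on its own (before msort is involved), the sed stage can be wrapped in a small function. The name add_blank_separators is made up for this illustration, and the expression assumes GNU sed, since \+ in the pattern and \n in the replacement are GNU extensions.

```shell
# Hypothetical helper (name invented here): put an empty line before every
# line that starts with '#', except on line 1, so that each record becomes
# a block of lines separated by a blank line.
# Assumes GNU sed: \+ and \n in the replacement are GNU extensions.
add_blank_separators() {
    sed '2,$ s/^#\+/\n&/'
}

# Small demo with two shortened records.
printf '%s\n' '####' 'KEY2' 'VAL21' '####' 'KEY1' 'VAL11' | add_blank_separators
```

With GNU sed, the demo should print the two records with one empty line between them, which is the layout -b expects.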

By default, msort prints statistics on stderr, so at least it's easy to tell when it didn't sort because it treated the entire input as a single record.


msort works on your data. The sed command prepends a newline to every line that starts with one or more # characters, except on line 1. -w sorts on the whole record (lexicographically). There are options for picking what part of a record to use as a key, but I didn't need them.

I also left out stripping the extra blank lines from the output.

$ sed '2,$ s/^#\+/\n&/' unsorted.records | msort -b -w 2>/dev/null 
####################################
KEY1
VAL11
VAL12
VAL13
VAL14

####################################
KEY2
VAL21
VAL22
VAL23
VAL24

####################################
KEY3
VAL31
VAL32
VAL33
VAL34
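
If the blank lines between records are unwanted in the final output, one more sed stage can drop them. This is only a sketch: strip_blank_lines is a name invented here, and it assumes the records themselves never contain empty lines, because those would be deleted too.

```shell
# Hypothetical helper (name invented here): delete every empty line.
# Assumption: records contain no empty lines of their own.
strip_blank_lines() {
    sed '/^$/d'
}

# Small demo: two records separated by a blank line.
printf '%s\n' '####' 'KEY1' 'VAL11' '' '####' 'KEY2' 'VAL21' | strip_blank_lines

# Appended to the pipeline from this answer, it would look like:
#   sed '2,$ s/^#\+/\n&/' unsorted.records | msort -b -w 2>/dev/null | strip_blank_lines
```

The deletion runs after sorting, so msort still sees the blank-line-separated blocks it needs for -b.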

I didn't have any luck using -r '#' to make # the record separator. msort still treated the whole file as one record.
