Sorting file by first and then second column

sort

How can I sort a two-column, tab-separated text file by the first element of the second column, but only among lines whose first column is the same?

Example:

Input File 1

A   1-2
A   6-8
A   3-4
B   7-10
B   5-9

Expected output: File 2

A   1-2
A   3-4
A   6-8
B   5-9
B   7-10

Best Answer

Use sort's -k option to sort by (multiple) columns at once:

$ sort -k1,1 -k2n input
A   1-2
A   3-4
A   6-8
B   5-9
B   7-10

-k1,1 sorts by the first column first, then -k2n by the second¹ numerically when the first column was tied, so you get your output in the order you want: sorting by the first element of the second column, only if the first column element is the same.

When sorting numerically, sort only examines the field up to the point where it stops looking like a number, so 7-10 is compared as 7 and 5-9 as 5. That gives you a comparison on just the first element of the second column.
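To see the effect of the n flag in isolation, here is a small sketch that sorts the same lines with and without it. The sample data and the variable names are made up for illustration, and LC_ALL=C is only set here to keep the lexical comparison independent of your locale.

```shell
# Hypothetical sample data: one group "x", with second-column values chosen
# so that numeric and lexical order disagree. Columns are separated by a tab.
data=$(printf 'x\t10-1\nx\t9-5\nx\t2-30\n')

# LC_ALL=C is an assumption made here for reproducible results.
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2n)
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2)

printf 'numeric (-k2n):\n%s\n' "$numeric"   # 2-30, 9-5, 10-1 (compared as 2, 9, 10)
printf 'lexical (-k2):\n%s\n' "$lexical"    # 10-1, 2-30, 9-5 (compared as strings)
```

With n, only the leading 2, 9 and 10 are compared; without it, the second key is compared character by character, so 10-1 comes first.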

When both keys compare equal, sort falls back to comparing the full lines lexically as a last-resort comparison. For instance, with A 1-10 vs A 1-2, the first keys are identical (the string A), and so are the second keys (both are treated as the number 1), so sort then compares A 1-10 vs A 1-2 lexically, and the latter is greater because 2 sorts after 1.

The GNU implementation of sort has a -V option (or V key flag) to perform a version sort, which is like a lexical comparison except that sequences of decimal digits within the strings are compared numerically. So sort -k1,1 -k2V would sort A 1-10 after A 1-2, because 10 as a number is greater than 2.
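The tie case can be reproduced with a two-line input. This is only a sketch: the variable names are arbitrary, LC_ALL=C is set so the last-resort comparison is byte-wise, and the V flag assumes GNU sort (or another implementation that supports version sort).

```shell
# The two lines from the example above, tab-separated.
data=$(printf 'A\t1-10\nA\t1-2\n')

# Both second keys are read as the number 1, so the last-resort
# whole-line comparison decides the order.
by_number=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2n)

# Version sort compares the digit runs 10 and 2 as numbers (GNU extension).
by_version=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2V)

printf 'with -k2n:\n%s\n' "$by_number"    # A 1-10, then A 1-2
printf 'with -k2V:\n%s\n' "$by_version"   # A 1-2, then A 1-10
```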


¹ technically, -k2 means the portion of the line starting with the second field (after the first transition from a non-blank to a blank) and ending at the end of the line, but with the n flag, that's equivalent to -k2,2n as only the leading part that constitutes a number is considered.
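A quick way to check that equivalence is to add a third column and compare both forms. The extra column and the variable names below are invented for this sketch.

```shell
# Hypothetical three-column input (tab-separated); the third column is
# only there so that -k2 and -k2,2 cover different portions of the line.
data=$(printf 'A\t6-8\t1\nA\t1-2\t9\nA\t3-4\t5\n')

open_key=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2n)
closed_key=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2,2n)

# Both forms read only the leading number of the second field (6, 1, 3).
if [ "$open_key" = "$closed_key" ]; then
    echo "same order"
fi
printf '%s\n' "$closed_key"
```

Here both commands order the lines as 1-2, 3-4, 6-8, regardless of what the third column holds.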
