How can I manipulate a two-column, tab-separated text file by sorting on the first element of the second column (only if the first-column elements are the same)?
Example:
Input: File 1
A 1-2
A 6-8
A 3-4
B 7-10
B 5-9
Expected output: File 2
A 1-2
A 3-4
A 6-8
B 5-9
B 7-10
Best Answer
Use `sort`'s `-k` option to sort by (multiple) columns at once: `-k1,1` sorts by the first column first, then `-k2n` by the second¹ numerically when the first column is tied. That gives you the order you want: sorted by the first element of the second column, only among lines whose first column is the same. When sorting numerically, `sort` only examines the field until it stops being a number, so you get a comparison of just the first element of it (the `1` in `1-2`).
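A minimal sketch of the full command, assuming the question's data lives in a file named `file1` (a hypothetical name, as is `file2`) with a tab between the two columns and no blanks inside either column. The default field separator (a non-blank to blank transition) already splits on the tab, so no `-t` is needed here.

```shell
# Hypothetical input file reproducing the question's File 1 (tab-separated)
printf 'A\t1-2\nA\t6-8\nA\t3-4\nB\t7-10\nB\t5-9\n' > file1

# Key 1: the first field only (-k1,1).
# Key 2: from the second field on, compared numerically (-k2n), which only
# reads the leading number (1 in "1-2") and is used only when key 1 ties.
sort -k1,1 -k2n file1 > file2

cat file2
```

With this input, `file2` should hold the expected output from the question: `A 1-2`, `A 3-4`, `A 6-8`, `B 5-9`, `B 7-10`.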
When the two keys compare the same, `sort` compares the full lines lexically as a last-resort comparison. For instance, for `A 1-10` vs `A 1-2`, the first keys are identical (the `A` string), and so are the second keys (both are treated as the number `1`), so `sort` then compares `A 1-10` vs `A 1-2` lexically, and the latter is greater as `2` sorts after `1`.

The GNU implementation of `sort` has a `-V` option, or `V` key flag, to perform a version sort, which is like a lexical comparison except that sequences of decimal digits within the strings are compared numerically. So `sort -k1,1 -k2V` would sort `A 1-10` after `A 1-2`, because `10` as a number is greater than `2`.

¹ Technically, `-k2` means the portion of the line starting with the second field (after the first transition from a non-blank to a blank) and ending at the end of the line, but with the `n` flag, that's equivalent to `-k2,2n`, as only the leading part that constitutes a number is considered.