Sort Command: How to Sort Numerical Column

I'm trying to sort a file based on a particular character position, but that does not work. Here is the data and the output:

~/scratch$ cat  id_researchers_2018_sample 
id - 884209 , researchers - 1
id - 896781 , researchers - 4
id - 901026 , researchers - 15
id - 904091 , researchers - 1
id - 905525 , researchers - 1
id - 908660 , researchers - 5
id - 908876 , researchers - 7
id - 910480 , researchers - 10
id - 916197 , researchers - 1
~/scratch$ sort  -k 28,5 id_researchers_2018_sample 
id - 884209 , researchers - 1
id - 896781 , researchers - 4
id - 901026 , researchers - 15
id - 904091 , researchers - 1
id - 905525 , researchers - 1
id - 908660 , researchers - 5
id - 908876 , researchers - 7
id - 910480 , researchers - 10
id - 916197 , researchers - 1

I want to sort this by the numbers in the last column, like this:

id - 884209 , researchers - 1
id - 904091 , researchers - 1
id - 905525 , researchers - 1
id - 916197 , researchers - 1
id - 896781 , researchers - 4
id - 908660 , researchers - 5
id - 908876 , researchers - 7
id - 910480 , researchers - 10
id - 901026 , researchers - 15

Best Answer

You want to sort numerically on column 7.

This can be done with either

$ sort -n -k 7 file
id - 884209 , researchers - 1
id - 904091 , researchers - 1
id - 905525 , researchers - 1
id - 916197 , researchers - 1
id - 896781 , researchers - 4
id - 908660 , researchers - 5
id - 908876 , researchers - 7
id - 910480 , researchers - 10
id - 901026 , researchers - 15

or with

$ sort -k 7n file
id - 884209 , researchers - 1
id - 904091 , researchers - 1
id - 905525 , researchers - 1
id - 916197 , researchers - 1
id - 896781 , researchers - 4
id - 908660 , researchers - 5
id - 908876 , researchers - 7
id - 910480 , researchers - 10
id - 901026 , researchers - 15

These are equivalent.

The -n option specifies numerical sorting (as opposed to lexicographical sorting). In the second example above, the n is added as a specifier/modifier to the 7th column specifically.
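To see the difference between lexicographical and numerical ordering in isolation, here is a small sketch using three made-up values (10, 9 and 2) that are not taken from your file. LC_ALL=C is set only to make the lexicographical result predictable across locales.

```shell
# Three made-up numbers, one per line.
lexical=$(printf '%s\n' 10 9 2 | LC_ALL=C sort)
numeric=$(printf '%s\n' 10 9 2 | LC_ALL=C sort -n)

# Without -n the lines are compared character by character: 10, 2, 9.
printf 'without -n:\n%s\n' "$lexical"

# With -n the lines are compared as numbers: 2, 9, 10.
printf 'with -n:\n%s\n' "$numeric"
```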

The sort key specification -k 7 makes sort use the line from column 7 onwards (from column 7 to the end of the line) as the key. In this case, since column 7 is the last column, that means just this column. Had this mattered, you would have used -k 7,7 instead ("from column 7 to column 7").
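A case where the end of the key does matter can be sketched with two made-up lines (not from your file) that have a letter in column 1 and a number in column 2. With an open-ended first key, the second key is never consulted:

```shell
# Two made-up lines: a letter in column 1, a number in column 2.
data=$(printf '%s\n' 'a 10' 'a 9')

# -k 1 runs from column 1 to the end of the line, so "a 10" and "a 9" are
# compared as whole strings; "1" sorts before "9" and the numeric second
# key is never reached.
open_ended=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1 -k 2n)

# -k 1,1 stops at the end of column 1; both lines tie on "a", so the
# second key decides the order numerically (9 before 10).
bounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1,1 -k 2,2n)

printf 'with -k 1 -k 2n:\n%s\n' "$open_ended"
printf 'with -k 1,1 -k 2,2n:\n%s\n' "$bounded"
```

The first command keeps "a 10" before "a 9", while the second puts "a 9" first.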

If two keys compare equal, sort will use the complete line as the sorting key, which is why the first four lines of the output in your example are ordered the way they are. If you had wanted to do a secondary sort on the second column, you would have used sort -n -k 7,7 -k 2,2, or sort -k 7,7n -k 2,2n (specifying the type of comparison separately for each column). Again, if both the 7th and the 2nd columns compare equal between two lines, sort would fall back to a lexicographical comparison of the complete lines.
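As a sketch of a secondary key on three lines of your sample data: the id is the third blank-separated field here (the second field is the dash), so this example uses field 3 as the secondary key and reverses it with r so that its effect is visible. Both choices are only for illustration and are not something the question asked for.

```shell
# Three lines taken from the sample data in the question.
sample=$(printf '%s\n' \
    'id - 884209 , researchers - 1' \
    'id - 896781 , researchers - 4' \
    'id - 904091 , researchers - 1')

# Primary key: field 7, numeric, ascending.
# Secondary key: field 3 (the id), numeric, descending.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 7,7n -k 3,3nr)

printf '%s\n' "$sorted"
```

The two lines with 1 researcher come first, with id 904091 ahead of 884209 because of the reversed secondary key.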


To sort numerically on character position 29, which corresponds to the first digit of the numerical values at the end of each line in your example data:

$ sort -k 1.29n file
id - 884209 , researchers - 1
id - 904091 , researchers - 1
id - 905525 , researchers - 1
id - 916197 , researchers - 1
id - 896781 , researchers - 4
id - 908660 , researchers - 5
id - 908876 , researchers - 7
id - 910480 , researchers - 10
id - 901026 , researchers - 15

The -k 1.29n means "sort on the key given by the 29th character of the 1st field (onwards, to the end of the line), numerically".
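A quick way to check that the character-position key and the column key agree is to run both on a few lines of your data and compare the results. This is a sketch that assumes every id has the same width, so that the number always starts at character 29; with ids of differing widths the character-position key would point at the wrong place.

```shell
# Three lines taken from the sample data in the question.
sample=$(printf '%s\n' \
    'id - 901026 , researchers - 15' \
    'id - 896781 , researchers - 4' \
    'id - 884209 , researchers - 1')

# Sort by blank-separated field 7, numerically.
by_field=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 7n)

# Sort by character 29 of the line onwards, numerically.
by_char=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 1.29n)

printf 'by field:\n%s\n' "$by_field"
printf 'by character position:\n%s\n' "$by_char"
```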

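The -k 7,7n used in the text above is equivalent to -k 7.1,7.0n: a missing character position at the start of a key means the first character of the field, and a missing (or zero) character position at the end of a key means the last character of the field.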