I am trying to sort on multiple columns. The results are not as expected.
Here's my data (people.txt):
Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56
The following works correctly:
bash-3.2$ sort -k2 -k3 <people.txt
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Pete Brown 37
Mark Brown 46
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Ana Villamor 44
Alice Villamor 50
But, the following does not work as expected:
bash-3.2$ sort -k2 -k1 <people.txt
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Pete Brown 37
Mark Brown 46
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Ana Villamor 44
Alice Villamor 50
I was trying to sort by surname and then by first name, but you will see the Villamors are not in the correct order. I was hoping to sort by surname, and then when surnames matched, to sort by first name.
It seems there is something about how this should work I don't understand. I could do this another way of course (using awk), but I want to understand sort.
I am using the standard Bash shell on Mac OS X.
Best Answer
A key specification like -k2 means to take all the fields from 2 to the end of the line into account. So Villamor 44 ends up before Villamor 50. Since these two are not equal, the first comparison in sort -k2 -k1 is enough to discriminate these two lines, and the second sort key -k1 is not invoked. If the two Villamors had had the same age, -k1 would have caused them to be sorted by first name.
To sort by a single column, use -k2,2 as the key specification. This means to use the fields from #2 to #2, i.e. only the second field.
sort -k2 -k3 <people.txt is redundant: it's equivalent to sort -k2 <people.txt. To sort by last names, then first names, then age, run the following command:
sort -k2,2 -k1,1 -k3,3 <people.txt
or equivalently
sort -k2,2 -k1 <people.txt
since there are only these three fields and the separators are the same. In fact, you will get the same effect from sort -k2,2 <people.txt, because sort uses the whole line as a last resort when all the keys in a subset of lines are identical.
Also note that the default field separator is the transition between a non-blank and a blank, so the keys will include the leading blanks (in your example, for the line Emily Bedford 18, the first key will be "Emily", but the second key " Bedford"). Add the -b option to strip those blanks:
sort -b -k2,2 -k1,1 <people.txt
It can also be done on a per-key basis by adding the b flag at the end of the key start specification:
sort -k2b,2 -k1,1 <people.txt
But something to bear in mind: as soon as you add one such flag to a key specification, the global flags (like -n, -r...) no longer apply to that key, so it's better to avoid mixing per-key flags and global flags.
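If you want to see the difference between an open-ended key and a bounded key side by side, here is a minimal sketch. It assumes a POSIX shell and sets LC_ALL=C so that collation is byte-wise and does not vary with your locale. The variable names (people, whole_tail, by_name) are made up for this illustration, and the data is held in a variable rather than in people.txt so the snippet is self-contained.

```shell
#!/bin/sh
# Assumption: byte-wise collation, so the order does not depend on the locale.
export LC_ALL=C

# The sample data from the question, kept in a variable instead of people.txt.
people='Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56'

# -k2 runs from field 2 to the end of the line, so the age breaks
# surname ties before -k1 is ever consulted.
whole_tail=$(printf '%s\n' "$people" | sort -k2 -k1)

# -k2,2 stops at the end of field 2, so surname ties fall through to -k1,1.
# -b ignores the leading blanks that the default separator leaves in each key.
by_name=$(printf '%s\n' "$people" | sort -b -k2,2 -k1,1)

echo "sort -k2 -k1:"
printf '%s\n' "$whole_tail"
echo
echo "sort -b -k2,2 -k1,1:"
printf '%s\n' "$by_name"
```

In the first listing the Browns and Villamors come out ordered by age; in the second they come out ordered by first name, which is what the question was after.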