Sort – How to Sort on Two Fields, Second Then First

sort

I am trying to sort on multiple columns. The results are not as expected.

Here's my data (people.txt):

Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56

The following works correctly:

bash-3.2$ sort -k2 -k3 <people.txt
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Pete Brown 37
Mark Brown 46
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Ana Villamor 44
Alice Villamor 50

But, the following does not work as expected:

bash-3.2$ sort -k2 -k1 <people.txt
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Pete Brown 37
Mark Brown 46
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Ana Villamor 44
Alice Villamor 50

I was trying to sort by surname and, where surnames match, by first name, but as you can see the Villamors are not in the correct order.

It seems there is something about how this should work I don't understand. I could do this another way of course (using awk), but I want to understand sort.

I am using the standard Bash shell on Mac OS X.

Best Answer

A key specification like -k2 means to take all the fields from 2 to the end of the line into account. So Villamor 44 ends up before Villamor 50. Since these two are not equal, the first comparison in sort -k2 -k1 is enough to discriminate these two lines, and the second sort key -k1 is not invoked. If the two Villamors had had the same age, -k1 would have caused them to be sorted by first name.

To sort by a single column, use -k2,2 as the key specification. This means to use the fields from #2 to #2, i.e. only the second field.
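As a rough illustration of the difference, here is a minimal sketch using only the two Villamor lines. The variable names are made up for this example, and LC_ALL=C is an assumption added to keep the comparison byte-wise and reproducible; it is not part of the commands in the question.

```shell
# Two lines that share a surname but differ in first name and age.
data='Ana Villamor 44
Alice Villamor 50'

# -k2 runs from field 2 to the end of the line, so "Villamor 44" is
# compared against "Villamor 50" and the age decides the order.
open_ended=$(printf '%s\n' "$data" | LC_ALL=C sort -k2 -k1)

# -k2,2 stops at the end of field 2, so both keys are equal and the
# tie falls through to the first name in -k1,1.
bounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2 -k1,1)

printf 'open-ended key:\n%s\n\nbounded key:\n%s\n' "$open_ended" "$bounded"
```

With the open-ended key, Ana (44) stays ahead of Alice (50); with the bounded key, Alice comes first.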

sort -k2 -k3 <people.txt is redundant: it's equivalent to sort -k2 <people.txt, because the first key already covers everything from field 2 to the end of the line. To sort by last name and then by first name, run the following command:

sort -k2,2 -k1,1 <people.txt

or equivalently sort -k2,2 -k1 <people.txt since there are only these three fields and the separators are the same. In fact, you will get the same effect from sort -k2,2 <people.txt, because sort uses the whole line as a last resort when all the keys in a subset of lines are identical.
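The equivalence can be checked with a short sketch on the data from the question. The variable names here are hypothetical, and LC_ALL=C is an assumption added so the result does not depend on the locale.

```shell
# The sample data from the question, held in a variable instead of people.txt.
people='Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56'

# Surname, then first name, as two explicit bounded keys.
two_keys=$(printf '%s\n' "$people" | LC_ALL=C sort -k2,2 -k1,1)

# Surname only; ties are broken by the last-resort whole-line comparison.
one_key=$(printf '%s\n' "$people" | LC_ALL=C sort -k2,2)

if [ "$two_keys" = "$one_key" ]; then
    echo "same order"
else
    echo "different order"
fi
```

This only holds because the first name is at the start of the line; if the tie-breaking field were somewhere else, the explicit second key would be needed.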

Also note that the default field separator is the transition between a non-blank and a blank, so every key after the first includes its leading blanks. In your example, for the line Emily Bedford 18, the first key will be "Emily", but the second key will be " Bedford". Add the -b option to strip those blanks:

sort -b -k2,2 -k1,1 <people.txt

It can also be done on a per-key basis by adding the b flag at the end of the key start specification:

sort -k2b,2 -k1,1 <people.txt
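The leading blanks only change the result when the spacing between fields is uneven. A minimal sketch, assuming a made-up two-line input where one line has extra spaces before the surname (again with LC_ALL=C as an added assumption for reproducibility):

```shell
# Hypothetical input: the first line has three spaces before the surname.
uneven='Ana   Villamor 44
Tony Bedford 50'

# Without -b the keys are "   Villamor" and " Bedford"; the second
# character is a space in one and "B" in the other, so Villamor sorts first.
without_b=$(printf '%s\n' "$uneven" | LC_ALL=C sort -k2,2 -k1,1)

# With -b (globally or on the key) the keys are "Villamor" and "Bedford".
with_global_b=$(printf '%s\n' "$uneven" | LC_ALL=C sort -b -k2,2 -k1,1)
with_key_b=$(printf '%s\n' "$uneven" | LC_ALL=C sort -k2b,2 -k1,1)

printf 'without -b:\n%s\n\nwith -b:\n%s\n' "$without_b" "$with_global_b"
```

In people.txt every field is separated by exactly one space, so all second keys start with the same single blank and the order is unaffected there.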

But bear in mind that as soon as you add any such flag to a key specification, the global flags (like -n, -r...) no longer apply to that key, so it's better to avoid mixing per-key flags and global flags.
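A small sketch of that pitfall, using made-up two-line input and hypothetical variable names. The behaviour shown is what GNU sort does; other implementations are assumed to follow the same rule but may differ.

```shell
# Hypothetical input with a numeric second field.
nums='a 10
b 9'

# Global -n applies to the flagless key: 9 sorts before 10.
global_n=$(printf '%s\n' "$nums" | LC_ALL=C sort -n -k2,2)

# The key now carries its own flag (b), so the global -n is ignored for it
# and the field is compared as text: "10" sorts before "9".
mixed=$(printf '%s\n' "$nums" | LC_ALL=C sort -n -k2b,2)

# Putting every flag on the key itself restores the numeric order.
per_key=$(printf '%s\n' "$nums" | LC_ALL=C sort -k2bn,2)

printf 'global -n:\n%s\n\nmixed:\n%s\n\nper-key:\n%s\n' "$global_n" "$mixed" "$per_key"
```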
