A key specification like -k2 means to take all the fields from 2 to the end of the line into account. So Villamor 44 ends up before Villamor 50. Since these two are not equal, the first comparison in sort -k2 -k1 is enough to discriminate these two lines, and the second sort key -k1 is not invoked. If the two Villamors had had the same age, -k1 would have caused them to be sorted by first name.
To sort by a single column, use -k2,2 as the key specification. This means to use the fields from #2 to #2, i.e. only the second field.
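To make the difference between -k2 and -k2,2 concrete, here is a small sketch. The three sample records and the temporary file are assumptions made up for illustration (they are not the asker's actual people.txt), and LC_ALL=C is set only so the ordering is predictable.

```shell
# Hypothetical sample data: first name, last name, age, single-space separated.
people=$(mktemp)
printf '%s\n' 'Zoe Villamor 44' 'Emily Bedford 24' 'Adam Villamor 50' > "$people"

# -k2 runs from field 2 to the end of the line, so the age already
# separates the two Villamors and -k1 is never consulted for them.
by_rest=$(LC_ALL=C sort -k2 -k1 < "$people")

# -k2,2 stops at the end of field 2, so the tie between the two
# Villamors is broken by the first name instead.
by_last=$(LC_ALL=C sort -k2,2 -k1,1 < "$people")

printf '%s\n\n%s\n' "$by_rest" "$by_last"
rm -f "$people"
```

With this data, the first command lists Zoe Villamor 44 before Adam Villamor 50, while the second lists Adam before Zoe.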
sort -k2 -k3 <people.txt is redundant: it's equivalent to sort -k2 <people.txt. To sort by last names, then first names, then age, run the following command:
sort -k2,2 -k1,1 <people.txt
or equivalently sort -k2,2 -k1 <people.txt, since there are only these three fields and the separators are the same. In fact, you will get the same effect from sort -k2,2 <people.txt, because sort uses the whole line as a last resort when all the keys in a subset of lines are identical.
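The equivalence of those three commands can be checked on a small input. The sketch below uses made-up records (an assumption, not the real people.txt) that include two people with the same first and last name, so that the last-resort whole-line comparison actually comes into play.

```shell
# Hypothetical sample data with a full tie on first and last name.
people=$(mktemp)
printf '%s\n' 'Zoe Villamor 44' 'Emily Bedford 24' \
              'Adam Villamor 50' 'Adam Villamor 31' > "$people"

explicit=$(LC_ALL=C sort -k2,2 -k1,1 < "$people")   # last name, then first name
open_end=$(LC_ALL=C sort -k2,2 -k1   < "$people")   # second key runs to end of line
fallback=$(LC_ALL=C sort -k2,2       < "$people")   # relies on the last-resort comparison

printf '%s\n' "$explicit"
rm -f "$people"
```

All three variables end up holding the same four lines in the same order for this input.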
Also note that the default field separator is the transition between a non-blank and a blank, so the keys will include the leading blanks (in your example, for the first line, the first key will be "Emily", but the second key " Bedford"). Add the -b option to strip those blanks:
sort -b -k2,2 -k1,1
It can also be done on a per-key basis by adding the b flag at the end of the key start specification:
sort -k2b,2 -k1,1 <people.txt
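The leading blanks only change the result when the columns are padded unevenly. As a rough illustration, the sketch below uses two invented records (an assumption, not the asker's data) where one first name is padded with extra spaces, and sorts them in the C locale so that a space compares lower than any letter.

```shell
# Hypothetical column-aligned data: "Al" is padded with four spaces.
people=$(mktemp)
printf '%s\n' 'Emily Bedford 24' 'Al    Young 30' > "$people"

# Without -b the keys are " Bedford" and "    Young"; the extra
# leading spaces make "Young" sort first.
raw=$(LC_ALL=C sort -k2,2 < "$people")

# With the blanks stripped, "Bedford" sorts before "Young".
global_b=$(LC_ALL=C sort -b -k2,2 < "$people")
per_key_b=$(LC_ALL=C sort -k2b,2 < "$people")

printf '%s\n\n%s\n' "$raw" "$global_b"
rm -f "$people"
```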
But something to bear in mind: as soon as you add one such flag to a key specification, the global flags (like -n, -r...) no longer apply to that key, so it's better to avoid mixing per-key flags and global flags.
To sort, you can also use a pipe inside an awk command, as in:
awk '{ print ... | "sort ..." }'
With this syntax, every printed line is passed to the same instance of sort.
Of course, you can do the equivalent at the shell level:
awk '{ print ... }' | sort ...
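As a concrete sketch of both forms, the following prints the last name and first name of each record and sorts the result. The input records and the choice of fields are assumptions for illustration only.

```shell
# Hypothetical input records.
data=$(printf '%s\n' 'Zoe Villamor 44' 'Emily Bedford 24' 'Adam Villamor 50')

# Pipe inside awk: every print goes to one shared "sort" process,
# which emits its output when awk exits.
inside=$(printf '%s\n' "$data" | LC_ALL=C awk '{ print $2, $1 | "sort" }')

# Same thing with the pipe at the shell level.
outside=$(printf '%s\n' "$data" | awk '{ print $2, $1 }' | LC_ALL=C sort)

printf '%s\n' "$inside"
```

The in-awk form is mostly useful when only part of the output should be sorted; for a whole stream, the shell pipe is usually easier to read.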
Or you can use GNU awk, which has a couple of sort functions natively defined (asort and asorti).
In awk, the equivalent of uniq is typically accomplished by saving the "unique data element or key" in an associative array and checking whether new data need to be memorized. One example to illustrate:
awk '!a[$0]++'
This means: if the current line is not yet in the array, the condition is true and the default action (printing the line) is triggered. Subsequent lines with the same data make the condition false, so they are not printed.
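A quick way to see this in action is to feed it a few repeated lines. The input below is invented for illustration; note that, unlike uniq, this idiom does not need the duplicates to be adjacent, and it keeps the first occurrence of each line in its original position.

```shell
# Hypothetical input with non-adjacent duplicates.
deduped=$(printf '%s\n' a b a c b | awk '!a[$0]++')

printf '%s\n' "$deduped"
```

For this input, only the first a, b and c survive, in that order.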
Best Answer
As the comments indicate, the problem seems likely to be blanks or carriage returns. Either of the following should do the trick:
Some flavors of GNU sed use -r instead to get Extended Regular Expressions.
tr is certainly simpler but also more brutal, in that it removes the characters whether or not they're trailing.
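The commands this answer refers to are not reproduced above, so the following is only a sketch of what a sed-based and a tr-based cleanup might look like, not the answer's actual commands. The input text, the -E flag, and the exact character sets are all assumptions chosen to illustrate the difference described.

```shell
# sed with Extended Regular Expressions: remove whitespace (including
# carriage returns) only where it trails at the end of a line.
trimmed=$(printf 'some text  \r\nnext line\r\n' | sed -E 's/[[:space:]]+$//')

# tr: delete every space and carriage return, trailing or not.
squashed=$(printf 'some text  \r\nnext line\r\n' | tr -d ' \r')

printf '%s\n\n%s\n' "$trimmed" "$squashed"
```

The sed variant keeps the spaces between words, while the tr variant also removes them, which is the "more brutal" behavior mentioned above.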