Sort data in descending order of the first column; for equal values, use the second column in ascending order

Allow me to clarify:

Assume I have some keywords with frequency of their usage:

12 Hi
7  Hash
7  C++  
9  Superuser
17 Stackoverflow
9  LaTeX  
42 Life
9  Ubuntu

What I want is to sort this data by frequency in descending order, and where frequencies are equal, by the second column in ascending order.

sort -n -r foo.txt

This does the first part, but the second column is also reversed:

42 Life
17 Stackoverflow
12 Hi
9  Ubuntu
9  Superuser
9  LaTeX  
7  Hash
7  C++

How can I achieve the following results?

42 Life
17 Stackoverflow
12 Hi
9  LaTeX  
9  Superuser
9  Ubuntu
7  C++ 
7  Hash

I think I have to use the -k option, but I can't figure out how!

I want to know how this can be done using only the sort command in bash. However, if it can't be done with sort alone, any other commands should be Bourne shell compatible.

Best Answer

Specify the sort keys separately with the criteria:

sort -k1,1nr -k2,2 inputfile

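Here -k1,1nr restricts the first key to field 1 and sorts it numerically in reverse order, while -k2,2 restricts the second key to field 2 and sorts it in the default (ascending) order. Because the r modifier is attached to the first key only, it does not affect the second key.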
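As a quick check, here is a minimal sketch that feeds the sample data from the question through the same command. The data is piped in rather than read from inputfile, and LC_ALL=C is an assumption added here so that the second key is compared byte by byte regardless of the locale in use; neither is part of the original command.

```shell
# Sample data from the question, one record per line.
# LC_ALL=C is an assumption for this sketch: it forces plain byte-order
# comparison for the second key so the result does not depend on the locale.
result=$(printf '%s\n' \
    '12 Hi' \
    '7  Hash' \
    '7  C++' \
    '9  Superuser' \
    '17 Stackoverflow' \
    '9  LaTeX' \
    '42 Life' \
    '9  Ubuntu' |
    LC_ALL=C sort -k1,1nr -k2,2)

# Print the sorted records.
printf '%s\n' "$result"
```

In a locale other than C the relative order of mixed-case words in the second column may differ, since the default comparison follows the locale's collation rules.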

Quoting from POSIX sort:

-k keydef

The keydef argument is a restricted sort key field definition. The format of this definition is:

field_start[type][,field_end[type]]

where field_start and field_end define a key field restricted to a portion of the line (see the EXTENDED DESCRIPTION section), and type is a modifier from the list of characters 'b', 'd', 'f', 'i', 'n', 'r'. The 'b' modifier shall behave like the -b option, but shall apply only to the field_start or field_end to which it is attached. The other modifiers shall behave like the corresponding options, but shall apply only to the key field to which they are attached; they shall have this effect if specified with field_start, field_end, or both. If any modifier is attached to a field_start or to a field_end, no option shall apply to either. Implementations shall support at least nine occurrences of the -k option, which shall be significant in command line order. If no -k option is specified, a default sort key of the entire line shall be used.

When there are multiple key fields, later keys shall be compared only after all earlier keys compare equal. Except when the -u option is specified, lines that otherwise compare equal shall be ordered as if none of the options -d, -f, -i, -n, or -k were present (but with -r still in effect, if it was specified) and with all bytes in the lines significant to the comparison. The order in which lines that still compare equal are written is unspecified.

For the sample input, sort -k1,1nr -k2,2 would produce:

42 Life
17 Stackoverflow
12 Hi
9  LaTeX
9  Superuser
9  Ubuntu
7  C++
7  Hash
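
The last-resort rule quoted from POSIX also explains the output seen in the question: with sort -n -r the numeric key ties for the three lines starting with 9, so they fall back to a whole-line comparison with the global -r still in effect. The sketch below contrasts the two commands on just those tied lines; as before, LC_ALL=C and the variable names are assumptions made for illustration.

```shell
# Only the lines that tie on the numeric key.
ties='9  Superuser
9  LaTeX
9  Ubuntu'

# Global -r: the tie is broken by comparing whole lines, in reverse.
global_r=$(printf '%s\n' "$ties" | LC_ALL=C sort -n -r)

# Per-key r: only the first key is reversed; the second key stays ascending.
per_key_r=$(printf '%s\n' "$ties" | LC_ALL=C sort -k1,1nr -k2,2)

printf 'sort -n -r:\n%s\n\n' "$global_r"
printf 'sort -k1,1nr -k2,2:\n%s\n' "$per_key_r"
```

The first command lists Ubuntu, Superuser, LaTeX, while the second lists LaTeX, Superuser, Ubuntu.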