Why does a non-numeric record sort right after “0”?

sort

I want to sort files according to the number in the filename.
Here are the files:

$ ls *.f
0.f  13.f  1.f  22.f  4.f  abc.f

The sorting result:

$ ls *.f | sort -t. -k1n
0.f
abc.f # note this file!
1.f
4.f
13.f
22.f

What I had expected was:

$ ls *.f | sort -t. -k1n
abc.f
0.f
1.f
4.f
13.f
22.f

Why was abc.f shown just after 0.f and before 1.f? Is it because 0 is not treated as a number by sort? I searched the web and didn't find any reference.

Best Answer

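This happens because a numeric sort treats a key with no leading digits as zero, so abc.f compares equal to 0.f. The tie is then broken by comparing the whole lines, and "0" collates before "a", which is why abc.f lands right after 0.f. GNU sort gets the behavior right, but its man page makes no comment as to why. The illumos man page for the SunOS sort does provide an explanation: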

-n
Restricts the sort key to an initial numeric string, consisting of optional blank characters, optional minus sign, and zero or more digits with an optional radix character and thousands separators (as defined in the current locale), which is sorted by arithmetic value. An empty digit string is treated as zero. Leading zeros and signs on zeros do not affect ordering.

This behavior is also specified in SUSv4 and POSIX.1-2008 (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html), using the same verbiage as the illumos man page.
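A quick way to see the tie for yourself is sketched below. It feeds a few of the filenames from the question through printf instead of ls, pins the locale with LC_ALL=C so the tie-break is plain byte order, and uses -s (stable sort, supported by GNU sort but not required by POSIX) to switch off the whole-line tie-break. The variable names are just for illustration.

```shell
# "abc" has no leading digits, so -n gives it the value 0, the same as "0".
# Without -s, the tie is broken by comparing the whole lines byte by byte.
tie_default=$(printf '%s\n' abc.f 0.f 1.f | LC_ALL=C sort -t. -k1,1n)

# With -s, tied lines keep their input order, so abc.f stays ahead of 0.f.
tie_stable=$(printf '%s\n' abc.f 0.f 1.f | LC_ALL=C sort -s -t. -k1,1n)

printf '%s\n' "$tie_default"   # 0.f, abc.f, 1.f
printf '%s\n' "$tie_stable"    # abc.f, 0.f, 1.f
```

If the two lines swap places depending on -s, they really do have the same numeric key.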

GNU sort also has -g, "general numeric sort", which converts each key to a floating-point number (so it also accepts forms such as 1e3) and sorts lines that do not start with a number before zero. The GNU coreutils manual lists this ordering explicitly, so it appears to be intentional rather than a side effect. However, -g comes with a warning: it is significantly slower than -n and can lose precision in the conversion. If you're sorting a large dataset or doing anything that users are waiting on, you should avoid -g.
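For a small listing like the one in the question, -g gives the order that was expected. This is only a sketch and assumes GNU sort, since -g is not part of POSIX. The filenames are fed in with printf so the example does not depend on the files existing.

```shell
# -g places lines whose key does not start with a number ahead of all numbers.
expected=$(printf '%s\n' 0.f 13.f 1.f 22.f 4.f abc.f | LC_ALL=C sort -t. -k1,1g)

printf '%s\n' "$expected"   # abc.f, 0.f, 1.f, 4.f, 13.f, 22.f
```

The key is written as -k1,1 so that only the part before the first dot is compared.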
