Sort – Why Does ‘sort’ Produce Output in Weird Order?

bash, coreutils, sort

Consider the following input to sort:

cat > foo <<EOM
D,,5014978
DD,,25
D,I,1972765530
D,Y,4223624
-,Y,71285059
YA,I,2
EOM

Now try running sort foo.

The output is not sorted when I try this on any of my Linux boxes (GNU coreutils versions 6.9-7.4). The output is sorted when run under Cygwin (GNU coreutils 8.5). Comments?

Best Answer

Sorting depends on the locale; specifically, it depends on $LC_COLLATE, which $LC_ALL overrides when it is set, with $LANG as the fallback when neither is set. The locale command shows the values currently in effect. See man 3 strcoll, man 3 setlocale, etc.
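To check which locale sort will collate with, something like the following may help. It is only a sketch: effective_collate is a hypothetical helper name, not a standard command, and it mirrors the precedence described above (LC_ALL, then LC_COLLATE, then LANG, otherwise the C/POSIX default).

```shell
# Show the three variables that decide collation, as the current shell sees them.
locale | grep -E '^(LANG|LC_COLLATE|LC_ALL)='

# Hypothetical helper (not part of coreutils): print the locale name that
# wins for collation. An empty variable is treated the same as an unset one.
effective_collate() {
    printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

effective_collate
```

If this prints something like en_US.utf8 rather than C or POSIX, sort is using that locale's collation rules rather than raw byte values.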

LC_COLLATE=C (or POSIX or no locale at all) results in a strict byte-by-byte comparison.
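As a quick check of the byte-by-byte behaviour, the sample input from the question can be sorted with the locale forced to C for that one command. This is a minimal sketch; it uses LC_ALL rather than LC_COLLATE so that any LC_ALL already set in the environment cannot override it.

```shell
# Recreate the sample input from the question.
cat > foo <<'EOM'
D,,5014978
DD,,25
D,I,1972765530
D,Y,4223624
-,Y,71285059
YA,I,2
EOM

# Force byte-wise collation for this invocation only.
LC_ALL=C sort foo
# Output ('-' is 0x2D and ',' is 0x2C, both below the letters):
# -,Y,71285059
# D,,5014978
# D,I,1972765530
# D,Y,4223624
# DD,,25
# YA,I,2
```

Setting the variable on the command line like this leaves the rest of the shell session on its usual locale.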

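LC_COLLATE=en_US.utf8 results in an alphabetical-equivalence sort, with punctuation ignored and characters within the same equivalence class treated equally. With the commas and the hyphen ignored, the sample lines compare roughly as D5014978, DD25, DI1972765530, DY4223624, Y71285059, YAI2, so the order shown in the question is a valid sorted order under that locale.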
