Command-Line Sort – Wrong Behavior of Sort Command

I tried to sort the contents of a file on Ubuntu Desktop 14.04 (Trusty Tahr). In my case, I expected the result to be the same as the original content, but it is not. Why?

# cat test.txt
a++-a
a++-b
a++-c
ab
ac
# cat test.txt | sort
a++-a
ab
a++-b
ac
a++-c

Best Answer

You can set the LC_ALL environment variable to C for the sort call:

$ LC_ALL=C sort test.txt
a++-a
a++-b
a++-c
ab
ac

See this answer if you want to know what the magic LC_ALL=C does. Here is a short summary:

The C locale is a special locale that is meant to be the simplest locale. You could also say that while the other locales are for humans, the C locale is for computers. In the C locale, characters are single bytes, the charset is ASCII, the sorting order is based on the byte values.
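To see why byte order puts the a++- lines first, it can help to look at the byte values involved. This is a minimal sketch, assuming a POSIX shell with od available; the helper name byte_values is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: print the hex value of each byte in a string.
byte_values() {
    # Unquoted $(...) on purpose: word splitting drops the padding od adds.
    set -- $(printf '%s' "$1" | od -An -tx1)
    echo "$*"
}

byte_values 'a++-a'   # 61 2b 2b 2d 61
byte_values 'ab'      # 61 62
```

The second byte decides the comparison: + is 0x2b and b is 0x62, so in the C locale every a++- line sorts before ab.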

Also, as @KenMollerup pointed out, man sort says:

   ***  WARNING  ***  The locale specified by the environment affects sort
   order.  Set LC_ALL=C to get the traditional sort order that uses native
   byte values.

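So with LC_ALL=C, sort compares lines byte by byte, and + (0x2B) and - (0x2D) sort before any letter. Otherwise, sort follows the collation rules of the current locale; in many of them (en_US.UTF-8 on glibc, for example), punctuation is ignored on the first comparison pass and only used to break ties, so the lines above are ordered as if they read aa, ab, ab, ac, ac.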
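The two behaviours can be checked side by side. The sketch below assumes a POSIX shell with mktemp and sort available, and uses a temporary file as a stand-in for test.txt. The second sort runs under whatever locale the environment already has, so its output depends on the system.

```shell
# Recreate the sample input in a temporary file (stand-in for test.txt).
tmp=$(mktemp)
printf '%s\n' 'a++-a' 'a++-b' 'a++-c' 'ab' 'ac' > "$tmp"

# Bytewise order: the assignment applies to this one command only.
c_sorted=$(LC_ALL=C sort "$tmp")
printf 'LC_ALL=C:\n%s\n' "$c_sorted"

# Order under the current environment's locale, shown for comparison.
# LC_ALL takes precedence over LC_COLLATE, which takes precedence over LANG.
env_sorted=$(sort "$tmp")
printf 'Current locale (%s):\n%s\n' \
    "${LC_ALL:-${LC_COLLATE:-${LANG:-unset}}}" "$env_sorted"

rm -f "$tmp"
```

Prefixing the command keeps the change local to that one sort call. To get byte order for every command in a script, export LC_ALL=C near the top has the same effect.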
