“sort -g” does not work as expected on data in scientific notation

locale numeric-data sort

I am trying to sort a data file in descending order. The file has three tab-delimited columns, and I want to sort on the third column, whose values are written in scientific (exponential) notation:

cat eII_surf.txt | sort -gr -k3

This worked on my previous machine, but on my new one the output is not sorted correctly.

Here is a simple example:

cat test.txt:

6.7 2.3e-12
5.0 3.4e-18
4.5 5.6e-16
4.2 2.1e-15
4.0 2.9e-17
2.4 2.5e-15
1.0 1.0e-17
0.5 1.0e-18

and cat test.txt | sort -gr -k2:

4.5 5.6e-16
5.0 3.4e-18
6.7 2.3e-12
4.2 2.1e-15
4.0 2.9e-17
2.4 2.5e-15
1.0 1.0e-17
0.5 1.0e-18

This is the output of locale:

LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC=de_DE.utf8
LC_TIME=de_DE.utf8
LC_COLLATE="en_US.utf8"
LC_MONETARY=de_DE.utf8
LC_MESSAGES="en_US.utf8"
LC_PAPER=de_DE.utf8
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT=de_DE.utf8
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=

Best Answer

2.3e-12 is read as 2 in a locale where the decimal radix character is , (as it is in most of the non-English-speaking world, including your LC_NUMERIC=de_DE.utf8). There, the number would need to be written 2,3e-12. Parsing stops at the unrecognized ., so only the integer part of each value is compared.
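To check which radix character a given locale uses, the locale utility can be asked for the decimal_point keyword of LC_NUMERIC. This is a minimal sketch; the variable names are only illustrative, and the second command assumes the de_DE.utf8 locale is installed (if it is not, glibc typically falls back to the C locale and reports .).

```shell
# Decimal radix in the C locale: always "."
c_radix=$(LC_ALL=C locale decimal_point)
echo "C: $c_radix"

# Decimal radix in the German locale: "," when de_DE.utf8 is installed.
# The warning printed for a missing locale is discarded.
de_radix=$(LC_ALL=de_DE.utf8 locale decimal_point 2>/dev/null)
echo "de_DE.utf8: $de_radix"
```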

You could do:

LC_ALL=C sort -grk2 < your-file

This forces numbers to be interpreted in the English style.

In the C locale (the only one you would be guaranteed to find on any system), the decimal radix is . (conveniently for your input).
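As a sketch of the fix applied to the sample data from the question, the following recreates test.txt in a temporary file (an assumption made so that nothing in the current directory is overwritten) and sorts it with GNU sort in the C locale.

```shell
# Recreate the sample file from the question (two blank-separated columns).
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
6.7 2.3e-12
5.0 3.4e-18
4.5 5.6e-16
4.2 2.1e-15
4.0 2.9e-17
2.4 2.5e-15
1.0 1.0e-17
0.5 1.0e-18
EOF

# LC_ALL=C applies to this one sort invocation only, so the rest of the
# session keeps its locale settings. -g requires GNU sort.
sorted=$(LC_ALL=C sort -gr -k2 < "$tmpfile")
printf '%s\n' "$sorted"

rm -f "$tmpfile"
```

With GNU sort this should print the rows in the order 2.3e-12, 2.5e-15, 2.1e-15, 5.6e-16, 2.9e-17, 1.0e-17, 3.4e-18, 1.0e-18. Because LC_ALL is empty in the locale output shown in the question, setting only LC_NUMERIC=C should have the same effect there; LC_ALL is the safer choice because it overrides every category when set.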

Note that sort has nothing to do with bash; it is a separate command. The -g option is a non-standard GNU extension to sort.
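One way to see that sort is an external program rather than part of the shell is to ask the shell how it resolves the name. A small sketch, assuming no alias or function named sort is defined; the variable names are only illustrative:

```shell
# For an external program, command -v prints its path
# (for example /usr/bin/sort; the exact path varies by system).
sort_path=$(command -v sort)
echo "$sort_path"

# For a shell builtin, command -v prints just the name.
cd_kind=$(command -v cd)
echo "$cd_kind"
```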
