Unexpected result from the sort command in Linux bash

bash, bash-scripting, linux, sorting

I have a file foo.txt with this content:

chr1    15
chr11   5
chr11   8
chr1    7
chr2    23
chr1    35

I tried to sort it by the first column, breaking ties numerically by the second column, with the following command in a Linux shell:

sort -k 1,1 -k 2,2n foo.txt

But the result is strange:

chr1    7
chr1    15
chr11   5
chr11   8
chr1    35
chr2    23

What I expected was this:

chr1    7
chr1    15
chr1    35
chr11   5
chr11   8
chr2    23

EDIT
I checked the characters in the file with od -fc foo.txt, as suggested in the comments, and there were no strange characters. Here is the result:

0000000   3.5274972e-09   8.7240555e-33   3.5274972e-09    8.716562e-33
          c   h   r   1  \t   1   5  \n   c   h   r   1   1  \t   5  \n
0000020   3.5274972e-09   8.8610065e-33   3.5274972e-09   2.5496164e+21
          c   h   r   1   1  \t   8  \n   c   h   r   1  \t   7  \n   c
0000040   2.1479764e-33   2.5493397e+21   2.1359394e-33     9.37439e-40
          h   r   2  \t   2   3  \n   c   h   r   1  \t   3   5  \n
0000057

I am using sort (GNU coreutils) 8.21

Any ideas?

Best Answer

It appears that your locale's collation rules were the issue. You can set the collation locale in your environment, and any command that honors it (including sort) will then use it:

export LC_COLLATE=C
sort -k 1,1 -k 2,2n foo.txt

Or you can set that value just for the duration of the sort command itself:

LC_COLLATE=C sort -k 1,1 -k 2,2n foo.txt       # or
env LC_COLLATE=C sort -k 1,1 -k 2,2n foo.txt
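
One caveat worth checking: if LC_ALL is set in your environment, it takes precedence over LC_COLLATE, so setting LC_COLLATE alone would have no effect. The GNU sort man page itself suggests LC_ALL=C for the traditional byte-value order. Below is a minimal sketch, assuming a POSIX shell and a sort that honors the locale variables. It recreates the sample data from the question in a temporary directory; the variable names workdir and sorted are only for illustration.

```shell
# Work in a temporary directory so no existing foo.txt is overwritten.
workdir=$(mktemp -d)

# Recreate the tab-separated sample file from the question.
printf 'chr1\t15\nchr11\t5\nchr11\t8\nchr1\t7\nchr2\t23\nchr1\t35\n' > "$workdir/foo.txt"

# Show the variables that influence collation; an empty value means "unset or empty".
printf 'LC_ALL=%s LC_COLLATE=%s LANG=%s\n' "${LC_ALL-}" "${LC_COLLATE-}" "${LANG-}"

# Force byte-wise (C locale) collation for this one command only.
# LC_ALL overrides LC_COLLATE and LANG, so this works regardless of the other settings.
sorted=$(LC_ALL=C sort -k 1,1 -k 2,2n "$workdir/foo.txt")
printf '%s\n' "$sorted"

# Clean up the temporary directory.
rm -r "$workdir"
```

If the output matches the order you expected in the question, the locale's collation was indeed the cause.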