I am sorting a file prior to joining it with another file, using
sort -k1 file1 > file1_sort
When I try to join with the second file, I get an error saying file1 is not sorted. I think this is occurring because of the following entry:
chr6_32609371_I I2 D
chr6_32609371 T C
The "chr6_32609371" line needs to be placed before the "chr6_32609371_I" line in my sorted file. Is there an argument I can add to the sort command to get this to happen?
Best Answer
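The problem is that sort -k1 does not sort on the first field alone: its key runs from the first field to the end of the line. According to man sort, a key given without an end position extends to the end of the line.
So -k1 is comparing "chr6_32609371_I I2 D" to "chr6_32609371 T C". In many non-C locales the collation rules give the underscore and the blanks little or no weight on the first pass, and since I comes before T, the lines sort as you see. To get around this, you should tell sort to take only the 1st field into account by passing both a start and an end position: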
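As a minimal sketch of that fix, the snippet below rebuilds the two lines from the question in a scratch file and sorts them with a key that starts and ends at field 1. The scratch directory and the LC_ALL=C setting are assumptions added for this illustration, not part of the original command. Pinning the locale makes the ordering reproducible, and join would need to run under the same locale to agree with it.

```shell
# Recreate the two lines from the question in a scratch directory (illustrative only).
tmpdir=$(mktemp -d)
printf '%s\n' 'chr6_32609371_I I2 D' 'chr6_32609371 T C' > "$tmpdir/file1"

# -k1,1 starts AND ends the key at field 1, so only the first column is compared.
# LC_ALL=C is an assumption made here to pin plain byte-order collation.
sorted=$(LC_ALL=C sort -k1,1 "$tmpdir/file1")
printf '%s\n' "$sorted"

# sort -c only checks the order; it exits non-zero if the input is not sorted on the key.
printf '%s\n' "$sorted" | LC_ALL=C sort -c -k1,1 && echo "sorted on field 1"

rm -rf "$tmpdir"
```

With the key limited to the first field, "chr6_32609371" is a prefix of "chr6_32609371_I" and therefore sorts ahead of it.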