I am trying to sort some simple pipe-delimited data. However, sort isn't actually sorting. It moves my header row to the bottom, but my two rows starting with 241 are being split by a row starting with 24.
cat sort_fail.csv
column_a|column_b|column_c
241|212|20810378
24|121|2810172
241|213|20810376
sort sort_fail.csv
241|212|20810378
24|121|2810172
241|213|20810376
column_a|column_b|column_c
The column headers are being moved to the bottom of the file, so sort is clearly processing it. But, the actual values aren't being sorted like I'd expect.
In this case I worked around it with
sort sort_fail.csv --field-separator='|' -k1,1
But, I feel like that shouldn't be necessary. Why is sort not sorting?
Best Answer
sort is locale aware, so depending on your LC_COLLATE setting (which falls back to LANG when unset, and is overridden by LC_ALL) you may get different results.
This can cause problems in scripts, because you may not know which locale the caller has set, and so the same script may produce different output in different environments.
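To see which collation locale applies and how it changes the result, something like the following sketch may help. It recreates the sample file from the question; the en_US.UTF-8 run is only attempted if that locale appears in `locale -a`, which is an assumption about your system.

```shell
# Show the variables that decide collation (LC_ALL wins over LC_COLLATE, which wins over LANG).
locale | grep -E '^(LANG|LC_COLLATE|LC_ALL)='

# Recreate the sample data from the question.
printf '%s\n' 'column_a|column_b|column_c' \
              '241|212|20810378' \
              '24|121|2810172' \
              '241|213|20810376' > sort_fail.csv

# Byte-order collation: '1' (0x31) sorts before '|' (0x7C), so both 241 rows come first.
c_sorted=$(LC_ALL=C sort sort_fail.csv)
printf '%s\n' "$c_sorted"
# 241|212|20810378
# 241|213|20810376
# 24|121|2810172
# column_a|column_b|column_c

# Locale-aware collation, only if en_US.UTF-8 is installed on this machine.
if locale -a | grep -Eqi '^en_US\.utf-?8$'; then
    LC_ALL=en_US.UTF-8 sort sort_fail.csv
fi
```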
It's not uncommon for scripts to force the setting they need, e.g. by setting LC_ALL=C for the sort call.
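As a sketch of forcing the locale in a script (the variable name `sorted` and the inline sample rows are just for illustration), you can export LC_ALL once near the top so every later command sees it:

```shell
#!/bin/sh
# Pin collation to plain byte order for everything this script runs,
# regardless of what the calling environment has set.
LC_ALL=C
export LC_ALL

# Sample rows from the question, fed on stdin instead of from a file.
sorted=$(printf '%s\n' '241|212|20810378' '24|121|2810172' '241|213|20810376' | sort)
printf '%s\n' "$sorted"
# 241|212|20810378
# 241|213|20810376
# 24|121|2810172
```

If you only want to affect a single command, prefixing it (`LC_ALL=C sort file`) limits the change to that one invocation.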
Now what's interesting here is that the handling of the | character looks odd. But that's because the default rule for en_US, which derives from the ISO 14651 collation table, says in effect that the | character is ignored, so the sort order is as if the character didn't exist. The rows are therefore compared as 24121220810378, 241212810172 and 24121320810376, and that matches the "unexpected" sorting you are seeing.
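One rough way to convince yourself of this, without needing the en_US locale installed, is to delete the | characters yourself and sort what is left in byte order. This only approximates what the locale does (real collation has more levels than "drop the character"), but for this data it reproduces the order from the question:

```shell
# Recreate the sample data from the question.
printf '%s\n' 'column_a|column_b|column_c' \
              '241|212|20810378' \
              '24|121|2810172' \
              '241|213|20810376' > sort_fail.csv

# Strip every '|' and sort the remaining characters in plain byte order.
stripped_order=$(tr -d '|' < sort_fail.csv | LC_ALL=C sort)
printf '%s\n' "$stripped_order"
# 24121220810378            (was 241|212|20810378)
# 241212810172              (was 24|121|2810172)
# 24121320810376            (was 241|213|20810376)
# column_acolumn_bcolumn_c  (header)
```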
The workarounds are to use -n (to force numeric sorts), to use the field separator (as you did), or to use the C locale.
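For the first two workarounds, a quick sketch against the sample data might look like this. The expected output in the comments assumes a C or en_US locale; note where the header row ends up in each case.

```shell
# Recreate the sample data from the question.
printf '%s\n' 'column_a|column_b|column_c' \
              '241|212|20810378' \
              '24|121|2810172' \
              '241|213|20810376' > sort_fail.csv

# Numeric sort: each line is compared by its leading number.
# The header has no leading number, counts as 0, and sorts first.
numeric_sorted=$(sort -n sort_fail.csv)
printf '%s\n' "$numeric_sorted"
# column_a|column_b|column_c
# 24|121|2810172
# 241|212|20810378
# 241|213|20810376

# Field-separator sort: only the first '|'-delimited field is the key
# (-t is the short form of --field-separator).
field_sorted=$(sort -t '|' -k1,1 sort_fail.csv)
printf '%s\n' "$field_sorted"
# 24|121|2810172
# 241|212|20810378
# 241|213|20810376
# column_a|column_b|column_c
```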