Sort Command – Strange Default Sort Behavior


I am having trouble understanding what is happening here:

[guido@localhost 9]$ ls -1 Star\ Wars\ Episode\ *
Star Wars Episode II Attack of the Clones.avi
Star Wars Episode III Revenge of the Sith.avi
Star Wars Episode I The Phantom Menace.avi
Star Wars Episode IV A New Hope.avi
Star Wars Episode VI Return of the Jedi.avi
Star Wars Episode V The Empire Strikes Back.avi

III\b sorts before I\b, yet II\b sorts before III\b? Whatever is causing this does not look consistent. I get the same ordering in the GUI. I checked that all the blanks really are blanks and that there are no case differences among the filenames. How can this be? Is it skipping the Roman numeral and sorting on what comes after it?

Other tests:

[guido@localhost 9]$ find -name "Star Wars *" -print
./Star Wars Episode I The Phantom Menace.avi
./Star Wars Episode II Attack of the Clones.avi
./Star Wars Episode III Revenge of the Sith.avi
./Star Wars Episode IV A New Hope.avi
./Star Wars Episode V The Empire Strikes Back.avi
./Star Wars Episode VI Return of the Jedi.avi


[guido@localhost 9]$ find -name "Star Wars *" -print | sort
./Star Wars Episode II Attack of the Clones.avi
./Star Wars Episode III Revenge of the Sith.avi
./Star Wars Episode I The Phantom Menace.avi
./Star Wars Episode IV A New Hope.avi
./Star Wars Episode VI Return of the Jedi.avi
./Star Wars Episode V The Empire Strikes Back.avi


[guido@localhost 9]$ find -name "Star Wars *" -print | sort -f
./Star Wars Episode I The Phantom Menace.avi
./Star Wars Episode II Attack of the Clones.avi
./Star Wars Episode III Revenge of the Sith.avi
./Star Wars Episode IV A New Hope.avi
./Star Wars Episode V The Empire Strikes Back.avi
./Star Wars Episode VI Return of the Jedi.avi

The documentation suggests the locale may affect this, but I don't think mine does (I set it to en_US.utf8 anyway). What am I missing?

[guido@localhost 9]$ sort --version
sort (GNU coreutils) 8.22

Best Answer

Your locale's collation rules ignore spaces (and probably case) when comparing strings. With the spaces dropped, the first three characters after the common prefix "Star Wars Episode" sort in this order:

  • IIA
  • III
  • ITH
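As a rough illustration of the ordering above, one can delete the spaces from the names and sort the result by plain byte value. This is only an approximation of what the locale does (real collation is more involved than stripping spaces), and the variable names are made up for the example, but it reproduces the order seen in the question:

```shell
# File names copied from the question.
names='Star Wars Episode I The Phantom Menace.avi
Star Wars Episode II Attack of the Clones.avi
Star Wars Episode III Revenge of the Sith.avi
Star Wars Episode IV A New Hope.avi
Star Wars Episode V The Empire Strikes Back.avi
Star Wars Episode VI Return of the Jedi.avi'

# Assumption: emulate "spaces are ignored" by deleting them,
# then compare the remaining bytes in the C locale.
keys=$(printf '%s\n' "$names" | tr -d ' ' | LC_ALL=C sort)
printf '%s\n' "$keys"
# Resulting order of the keys after the common prefix:
#   IIA..., III..., ITh..., IVA..., VIR..., VTh...
```

Episode I lands third and Episode V lands last, matching the `ls` and `sort` output above.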

find returns results in directory order, not sorted order; here that just happens to be the "expected" order.

You can get the "traditional" sort order back by following this advice from the man page:

Set LC_ALL=C to get the traditional sort order that uses native byte values.
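A minimal sketch of that advice, using three of the file names from the question. Setting LC_ALL=C only for the one command leaves the rest of the session's locale untouched. The variable names are illustrative:

```shell
# Three of the names from the question, in the order the locale-aware sort gave.
names='Star Wars Episode II Attack of the Clones.avi
Star Wars Episode III Revenge of the Sith.avi
Star Wars Episode I The Phantom Menace.avi'

# In the C locale each byte is compared by value; the space (0x20)
# sorts before every letter, so "I The" < "II Attack" < "III Revenge".
sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

The same prefix should work for the commands in the question, for example `find -name "Star Wars *" -print | LC_ALL=C sort` or `LC_ALL=C ls -1`.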
