Linux – Why does the `ls` command sort files like this?

command-line coreutils linux ls sorting

As I was trying to reverse engineer the ls command, I came upon an interesting behavior. When I create three files, foo.png, foopa.png, and fooqa.png, ls sorts them as foopa.png, foo.png, and fooqa.png. I also tried the .gif extension, and the same thing seems to happen when p and q are replaced by the first letter of the extension and the letter after it in the alphabet; for .gif that is g and h (fooga.gif, then foo.gif, then fooha.gif).

Why does it order the output this way?

Best Answer

It depends on the collation order of your locale:

>LANG=en_IE.UTF-8 ls -1 foo*
foopa.png
foo.png
fooqa.png

>LANG=C ls -1 foo* 
foo.png
foopa.png
fooqa.png

You can also use the LC_COLLATE variable instead of LANG, and use the POSIX locale instead of the C one.
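
For example, here is a minimal sketch of the LC_COLLATE variant. The scratch directory and the variable names `tmpdir` and `listing` are only for illustration. It also clears LC_ALL for the one command, because LC_ALL, when set, takes precedence over both LC_COLLATE and LANG:

```shell
# Recreate the three files from the question in a scratch directory
# (tmpdir and listing are illustrative names, not part of the original answer).
tmpdir=$(mktemp -d)
touch "$tmpdir/foo.png" "$tmpdir/foopa.png" "$tmpdir/fooqa.png"

# An empty LC_ALL is treated as unset, so LC_COLLATE=C decides the sort order.
listing=$(LC_ALL= LC_COLLATE=C ls -1 "$tmpdir")
printf '%s\n' "$listing"
# foo.png
# foopa.png
# fooqa.png

# Clean up the scratch directory.
rm -r "$tmpdir"
```

Setting only LC_COLLATE leaves the rest of the locale (messages, date formats, and so on) untouched, which is handy if you just want the sort order changed.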

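C collation order is purely by byte value (ASCII order), so the dot sorts before every letter. Other collation orders (such as English) may treat spaces and special characters such as dots as separators and either handle "words" separately or simply ignore these characters when comparing, which appears to be the case here: with the dot ignored, the names compare as foopapng, foopng, and fooqapng, which gives exactly the order you saw.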
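
To see why the C locale puts foo.png first, you can look at the byte values directly. This is just a sketch: the variable names `dot`, `p`, and `sorted` are made up for the example, and it uses `sort` rather than `ls` only because `sort` follows the same collation settings and does not need any files on disk:

```shell
# In POSIX printf, a leading quote in the argument yields the numeric
# value of the character that follows it.
dot=$(printf '%d' "'.")
p=$(printf '%d' "'p")
echo ". = $dot, p = $p"
# . = 46, p = 112

# With byte-order collation, "foo." compares lower than "foop",
# so foo.png comes before foopa.png.
sorted=$(printf '%s\n' foopa.png foo.png fooqa.png | LC_ALL=C sort)
printf '%s\n' "$sorted"
# foo.png
# foopa.png
# fooqa.png
```

Running the same pipeline under a locale such as en_IE.UTF-8 (if it is installed on your system) should reproduce the other ordering.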

Note that the non-UTF-8 variant of the same locale sorts in plain ASCII order, too:

>LANG=en_IE ls -1 foo*
foo.png
foopa.png
fooqa.png

After some more digging, it appears that ignoring punctuation is a common feature of Unicode-aware locales such as the *.UTF-8 ones.
