LS – Why Does ls Sorting Ignore Non-Alphanumeric Characters?

ls sort

When sorting file names, ls seems to ignore characters such as - and _. I expected it to take those characters into account when sorting as well.

An example:

touch a1 a2 a-1 a-2 a_1 a_2 a.1 a.2 a,1 a,2

Now display these files with ls -1:

a1
a_1
a-1
a,1
a.1
a2
a_2
a-2
a,2
a.2

What I expected was something like this:

a1
a2
a,1
a,2
a.1
a.2
a_1
a_2
a-1
a-2

i.e. I expected the non-alphanumeric characters to be taken into account when sorting.

Can anyone explain this behaviour? Is it mandated by a standard? Or is it due to the encoding being UTF-8?

Update: It seems that this is related to UTF-8 sorting:

$ LC_COLLATE=C ls -1
a,1
a,2
a-1
a-2
a.1
a.2
a1
a2
a_1
a_2

Best Answer

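This has nothing to do with the charset. Rather, it is the language of the locale that determines the collation order. The libc reads the locale from $LC_ALL, $LC_COLLATE or $LANG (checked in that order, so $LC_ALL overrides the other two), looks up that locale's collation rules (e.g. /usr/share/i18n/locales/* for glibc) and orders the text as directed. A language locale may give punctuation little or no weight when comparing names, whereas the C locale simply compares byte values, which is why LC_COLLATE=C produces the order you expected to see punctuation affect.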
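To see the effect without creating any files, the same names can be piped through sort under different locales. This is only a rough sketch: sort_names is a hypothetical helper made up for this example, and the result of the second call depends on which locale your environment selects and whether it is installed (locale -a lists the available ones).

```shell
# Hypothetical helper (not a standard command): sort the example
# names from the question using the locale given as $1.
sort_names() {
    printf '%s\n' a1 a2 a-1 a-2 a_1 a_2 a.1 a.2 a,1 a,2 | LC_ALL="$1" sort
}

# C locale: plain byte-value comparison, so punctuation is significant.
sort_names C

# Whatever locale the environment selects, resolved with the same
# precedence the libc uses: LC_ALL, then LC_COLLATE, then LANG.
# Falls back to C if none of them is set.
sort_names "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
```

The first call should reproduce the LC_COLLATE=C listing from the question. The second call shows whatever order your current locale prescribes, so it may or may not match the default ls -1 output shown above.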
