How to find the sort order for ls with the locale


I was running ls -l on a directory and was surprised to find that spaces and underscores were ignored when sorting. For example:

$ echo $LANG
en_AU.UTF-8
$ ls -l
total 0
-rw-r--r-- 1 sparhawk sparhawk 0 Nov 20 21:12 a_a
-rw-r--r-- 1 sparhawk sparhawk 0 Nov 20 21:13 a b
-rw-r--r-- 1 sparhawk sparhawk 0 Nov 20 21:13 a_c
-rw-r--r-- 1 sparhawk sparhawk 0 Nov 20 21:13 a d
$ LANG=en_AU ls -l
total 0
-rw-r--r-- 1 sparhawk sparhawk 0 Nov 20 21:13 a b
-rw-r--r-- 1 sparhawk sparhawk 0 Nov 20 21:13 a d
-rw-r--r-- 1 sparhawk sparhawk 0 Nov 20 21:12 a_a
-rw-r--r-- 1 sparhawk sparhawk 0 Nov 20 21:13 a_c

In my default locale, spaces and underscores seem to be interchangeable; without UTF-8, spaces come before underscores. I see similar results with en_US and en_US.UTF-8.
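The comparison can be reproduced without creating files by feeding the same names to sort(1). This is a minimal sketch that assumes sort collates with the same locale rules as ls (both GNU coreutils tools honour LC_COLLATE), and that en_AU.UTF-8 is installed (check with `locale -a`).

```shell
#!/bin/sh
# The four file names from the listing above, one per line.
names='a_a
a b
a_c
a d'

# Byte order: space (0x20) sorts before underscore (0x5F).
echo "LC_ALL=C:"
printf '%s\n' "$names" | LC_ALL=C sort

# Locale-aware order. Assumption: en_AU.UTF-8 is installed; if it is not,
# sort may fall back to the C locale and print the byte order again.
echo "LC_ALL=en_AU.UTF-8:"
printf '%s\n' "$names" | LC_ALL=en_AU.UTF-8 sort
```

The C-locale half is deterministic (`a b`, `a d`, `a_a`, `a_c`); the UTF-8 half depends on the installed locale data, so it may differ between systems and library versions.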

I have two questions:

  1. Am I interpreting this correctly? Are they interchangeable?
  2. Is there a list of my locale's sort order? I want to find a character that precedes underscore.
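One rough way to see how a locale orders the non-alphanumeric ASCII characters is to sort them and look at what lands before the underscore. This is a sketch under assumptions: `punct_candidates` is a hypothetical helper, and only printable ASCII (codes 32 to 126) is covered.

```shell
#!/bin/sh
# Hypothetical helper: print every printable ASCII character that is not a
# letter or digit, wrapped as "x<char>x" so it is compared mid-name,
# the way a separator inside a file name would be.
punct_candidates() {
  LC_ALL=C awk 'BEGIN { for (i = 32; i < 127; i++) printf "x%cx\n", i }' |
    LC_ALL=C grep -v '^x[A-Za-z0-9]x$'
}

# Sort under the current locale; prefix with e.g. LC_ALL=en_AU.UTF-8 to try
# another one. Lines printed before "x_x" collate ahead of the underscore.
punct_candidates | sort
```

If the locale ignores punctuation at the first comparison level, the letters and digits of two names are compared first, so this list would only show how ties between otherwise identical names are broken.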

Best Answer

This isn't a full answer, just some articles and thoughts.

The GNU coreutils FAQ has some notes on sort order: http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
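The usual workaround along those lines is to run the command in the C locale, so names are compared byte by byte. A minimal sketch; note that LC_ALL, if it is set, overrides LC_COLLATE.

```shell
#!/bin/sh
# One-off listing in byte order: "a b" and "a d" come before "a_a" and "a_c".
LC_COLLATE=C ls -l

# To make byte order the default while keeping your other locale settings,
# add this line to a shell startup file (e.g. ~/.bashrc).
# It has no effect if LC_ALL is also set, because LC_ALL takes precedence.
export LC_COLLATE=C
```

Setting only LC_COLLATE (rather than LC_ALL) keeps messages, dates and character classification in your usual locale.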

This is the POSIX standard describing how locales, including the LC_COLLATE category, are defined: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03
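To check which locale actually governs collation, and to read its source definition in the format that standard describes, something like the following may help. Assumption: a glibc-based system that ships locale sources under /usr/share/i18n/locales; the path, and whether the sources are installed at all, varies between systems.

```shell
#!/bin/sh
# Show the variables that decide collation (LC_ALL > LC_COLLATE > LANG).
locale | grep -E '^(LANG|LC_COLLATE|LC_ALL)='

# Assumption: glibc locale sources live here; adjust for your system.
src=/usr/share/i18n/locales
if [ -r "$src/en_AU" ]; then
  # Print only the LC_COLLATE section of the en_AU source file.
  sed -n '/^LC_COLLATE/,/^END LC_COLLATE/p' "$src/en_AU"
fi
```

On glibc this section typically just copies a shared table (such as iso14651_t1), so the detailed character order lives in that file rather than in en_AU itself.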

And this is the Unicode Collation Algorithm (UTS #10), which defines collation rules for sorting: http://www.unicode.org/reports/tr10/ . I don't claim that collation in UTF-8 locales is implemented exactly that way, although I strongly believe it is.
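UTS #10 describes a "shifted" option for variable characters (spaces and most punctuation), under which they are ignored at the first levels and only consulted to break ties. The sketch below is a toy model of just that idea, not glibc's implementation: it compares names with everything except ASCII letters and digits removed, then breaks ties by plain byte order (real collation uses a further level instead, and also handles case and accents). The function name `toy_collate` is hypothetical.

```shell
#!/bin/sh
# Toy model: primary key = name with non-alphanumerics stripped,
# tie-break = the full name in byte order. Reads names on stdin.
toy_collate() {
  tab=$(printf '\t')
  LC_ALL=C awk '{ k = $0; gsub(/[^A-Za-z0-9]/, "", k); print k "\t" $0 }' |
    LC_ALL=C sort -t "$tab" -k1,1 -k2 |
    cut -f2-
}

printf '%s\n' 'a_a' 'a b' 'a_c' 'a d' | toy_collate
```

For the question's names the stripped keys are aa, ab, ac and ad, so the output is a_a, a b, a_c, a d: the same order as the en_AU.UTF-8 listing, which is consistent with spaces and underscores being ignored at the first level.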
