Is there an LC_COLLATE that sorts dot before dash?

locale sort wildcards zsh

In the locales that I've checked (C, en_US.UTF-8), dot (".") sorts after dash ("-"). When I cd and complete a directory name, say "som", some-dir.git is offered before some.git. I also list themes for my project, and the file zdharma-256.theme is globbed before zdharma.theme. The natural order for me is for the shorter name to come first.

Is there an LC_COLLATE that I could use to fix this?

Maybe it's not a collation problem, but a matter of ignoring the extension in a first sorting pass? Is there Zsh code (glob flags, etc.) that I could use?

Best Answer

No, there is no such collation, at least not a standard one.

Here is how you can check it yourself:

  1. First, prepare a file (the lines Aa and aa are there just for testing purposes):

    cat >test <<\eof
    Aa
    aa
    some.git
    some-dir.git
    eof
    
  2. Run the sort command with every collation available on the system:

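    # Sort the test file under each installed locale's collation rules
    # (assumes LC_ALL is unset, since it would override LC_COLLATE)
    for loc in $(locale -a); do
        echo "____${loc}____"
        LC_COLLATE="$loc" sort test
    done > test_sorted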
    
  3. Now open test_sorted in your favorite editor and see that different locales sort Aa and aa differently, but all of them put some-dir.git before some.git. In other words,

    pcregrep -M 'some\.git\nsome-dir' test_sorted
    

  gives nothing.
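If pcregrep is not installed, the same check can be sketched in plain shell. This is only an illustration of the steps above: the helper name first_sorted is made up here, and the empty LC_ALL assignment is an assumption added so that an exported LC_ALL does not override LC_COLLATE.

```shell
# Hypothetical helper: print whichever of the two names sorts first
# under the collation rules of the locale given as $1.
first_sorted() {
    printf '%s\n' some.git some-dir.git |
        LC_ALL= LC_COLLATE="$1" sort |
        head -n 1
}

# Report every installed locale that would put the dot before the dash.
for loc in $(locale -a); do
    if [ "$(first_sorted "$loc" 2>/dev/null)" = "some.git" ]; then
        echo "$loc sorts dot before dash"
    fi
done
```

If the claim above holds on your system, the loop prints nothing.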

The reason - comes before . goes back to the ASCII and Unicode tables (see man ascii). The hyphen (technically called HYPHEN-MINUS) has decimal code 45 (U+002D), while the dot has 46 (U+002E).
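A quick way to confirm those code points from the shell is sketched below. The helper name codepoint is made up for this example; it relies on the POSIX printf rule that an argument starting with a single quote is converted to the numeric value of the character that follows.

```shell
# Hypothetical helper: print the decimal code of a single ASCII character.
codepoint() {
    printf '%d\n' "'$1"
}

codepoint -   # prints 45
codepoint .   # prints 46

# Bytewise (C locale) sorting follows those codes directly,
# so some-dir.git is listed before some.git.
printf '%s\n' some.git some-dir.git | LC_ALL=C sort
```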

If you are desperate enough, you can write your own locale that changes this behavior. The easiest way is to modify one of the existing locale definition files, which you can find in /usr/share/i18n/locales/.
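If a custom locale is more than you need, a lighter workaround is to reorder the list yourself. The sketch below is not a collation setting, and the function name sort_dot_first is made up for illustration. It swaps "." and "-", sorts bytewise, and swaps the characters back, so the dot effectively takes the dash's place in the order. Since it sorts in the C locale, upper-case letters sort before lower-case ones and non-ASCII names are ordered by byte value.

```shell
# Hypothetical filter: sort lines from stdin so that "." collates before "-".
sort_dot_first() {
    sed 'y/.-/-./' |     # swap every "." with "-" and vice versa
        LC_ALL=C sort |  # plain bytewise sort
        sed 'y/.-/-./'   # swap the characters back
}

printf '%s\n' some-dir.git some.git zdharma-256.theme zdharma.theme |
    sort_dot_first
```

With these four names, some.git comes out before some-dir.git and zdharma.theme before zdharma-256.theme.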