By default, the newer sort order treats digits in file and folder names as numeric content, not text: numerals in folder and file names are sorted according to their numeric value.
In the following example, note how these files, whose names contain numerals, are sorted.
Windows Vista, Windows XP, and Windows Server 2003
5.txt
11.txt
88.txt
In this example, 88 is a numerically higher value than 5. Therefore, 88.txt is listed after 5.txt when you sort the files by name in ascending order. Likewise, 11.txt falls between them, because 11 is greater than 5 and less than 88.
Source: The sort order for files and folders whose names contain numerals is different in Windows Vista, Windows XP, and Windows Server 2003 than it is in Windows 2000
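A rough command-line analogue of this numeric-aware ordering (not the Windows implementation itself) is GNU coreutils' version sort, `sort -V`, which compares runs of digits by their numeric value. This sketch assumes GNU `sort` and reuses the three file names from the example to contrast it with a plain byte-wise sort:

```shell
#!/bin/sh
# Assumes GNU coreutils sort; -V ("version sort") compares digit runs numerically.
export LC_ALL=C   # make the plain sort compare byte by byte

files='5.txt
11.txt
88.txt'

# Byte-wise: compares character by character, so "11.txt" < "5.txt" ('1' < '5').
plain=$(printf '%s\n' "$files" | sort)

# Version sort: the digit runs 5, 11 and 88 are compared as numbers.
natural=$(printf '%s\n' "$files" | sort -V)

echo plain: $plain
echo natural: $natural
```

The byte-wise sort puts 11.txt first because the character '1' sorts before '5'; the version sort gives the 5, 11, 88 order described above.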
The other answer and the comment address the question in general; here's what an implementation could look like:
$ cat order
Bahamas,3
Canada,2
United States,1
$ cat data
C,United States,WA,Tacoma,f,1
A,United States,MA,Boston,f,0
B,United States,NY,New York,f,5
A,Canada,QC,Montreal,f,2
A,Bahamas,Bahamas,Nassau,f,2
A,United States,NY,New York,f,1
$ sort -t, -k2 data | join -t, -11 -22 order - | sort -t, -k2n -k4,5 -k6r -k7nr | awk -F, -v OFS=, '{print $3, $1, $4, $5, $6, $7}'
A,United States,MA,Boston,f,0
B,United States,NY,New York,f,5
A,United States,NY,New York,f,1
C,United States,WA,Tacoma,f,1
A,Canada,QC,Montreal,f,2
A,Bahamas,Bahamas,Nassau,f,2
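The field numbers in the second `sort` only make sense once you see what `join` emits: the join field (the country) first, then the remaining fields of `order` (the rank), then the remaining fields of `data`. The script below recreates the two sample files in a scratch directory and prints just that intermediate step. It assumes `mktemp -d` is available and forces the C locale so that `sort` and `join` agree on collation.

```shell
#!/bin/sh
export LC_ALL=C   # sort and join must use the same collation
cd "$(mktemp -d)" || exit 1

# Same sample files as in the answer.
printf '%s\n' 'Bahamas,3' 'Canada,2' 'United States,1' > order
printf '%s\n' \
  'C,United States,WA,Tacoma,f,1' \
  'A,United States,MA,Boston,f,0' \
  'B,United States,NY,New York,f,5' \
  'A,Canada,QC,Montreal,f,2' \
  'A,Bahamas,Bahamas,Nassau,f,2' \
  'A,United States,NY,New York,f,1' > data

# join output: country, rank (rest of order), then the rest of data.
joined=$(sort -t, -k2 data | join -t, -1 1 -2 2 order -)
printf '%s\n' "$joined"
```

In the joined stream, field 2 is the rank, fields 4 and 5 are the state and city, and field 3 is the original first column, which is why it has to be moved back to the front at the end.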
I did some testing and the overall ordering seems to be as follows...
Symbols
    Latin (ordered by Unicode value, U+xxxx)
    Greek (ordered by Unicode value, U+xxxx)
    Cyrillic (ordered by Unicode value, U+xxxx)
    Hebrew (ordered by Unicode value, U+xxxx)
    Arabic (ordered by Unicode value, U+xxxx)
Numbers
    Latin (ordered by Unicode value, U+xxxx)
    Greek (ordered by Unicode value, U+xxxx)
    Cyrillic (ordered by Unicode value, U+xxxx)
    Hebrew (ordered by Unicode value, U+xxxx)
    Arabic (ordered by Unicode value, U+xxxx)
Letters
    Latin (ordered by Unicode value, U+xxxx)
    Greek (ordered by Unicode value, U+xxxx)
    Cyrillic (ordered by Unicode value, U+xxxx)
    Hebrew (ordered by Unicode value, U+xxxx)
    Arabic (ordered by Unicode value, U+xxxx)
Sorting Rule Sequence vs Observed Order
It's worth noting that there are really two ways of looking at this. Ultimately, what you have are sorting rules applied in a certain sequence, and that sequence in turn produces an observed order. Each newer rule's ordering takes precedence, and the ordering produced by older rules survives only within the groups that a newer rule treats as equal. (This assumes each pass is stable, i.e. it keeps items that compare equal in their existing relative order.) This means that the first rule applied is the last (innermost) rule observed, while the last rule applied is the first, topmost rule observed.
Sorting Rule Sequence
1.) Sort on Unicode Value (U+xxxx)
2.) Sort on culture/language
3.) Sort on Type (Symbol, Number, Letter)
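The rule sequence above can be modeled with stable sorts: apply rule 1 first and rule 3 last, and each later pass preserves the earlier passes' order among the items it considers equal. This is a sketch of the answer's model, not of Windows' actual collation code. It assumes a `sort` with the `-s` (stable) option, as in GNU coreutils, and uses a few hypothetical, hand-annotated records whose numeric ranks follow the type and language order listed under Observed Order.

```shell
#!/bin/sh
# Assumes GNU coreutils sort: -s = stable (no last-resort whole-line compare).
export LC_ALL=C   # byte order, so fixed-width uppercase hex sorts numerically

# Hypothetical, hand-annotated records: type_rank,language_rank,codepoint,name
#   type:     1=symbol 2=number 3=letter
#   language: 1=Latin 2=Greek 3=Cyrillic 4=Hebrew 5=Arabic
records='3,1,0062,LATIN SMALL LETTER B
2,1,0031,DIGIT ONE
3,4,05D0,HEBREW LETTER ALEF
1,1,0021,EXCLAMATION MARK
3,2,03B1,GREEK SMALL LETTER ALPHA
2,5,0661,ARABIC-INDIC DIGIT ONE
3,3,0430,CYRILLIC SMALL LETTER A
3,5,0627,ARABIC LETTER ALEF
3,1,0041,LATIN CAPITAL LETTER A'

# Rule 1 (Unicode value), then rule 2 (language), then rule 3 (type).
passes=$(printf '%s\n' "$records" \
  | sort -s -t, -k3,3 \
  | sort -s -t, -k2,2n \
  | sort -s -t, -k1,1n)

printf '%s\n' "$passes"
```

The three stable passes give the same result as one `sort -t, -k1,1n -k2,2n -k3,3`, i.e. a single key with the last-applied rule as the most significant part.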
Observed Order
The highest level of grouping is by type in the following order...
1.) Symbols
2.) Numbers
3.) Letters
Therefore, any symbol from any language comes before any number from any language, while any letter from any language appears after all symbols and numbers.
The second level of grouping is by culture/language. The following order seems to apply for this:
Latin
Greek
Cyrillic
Hebrew
Arabic
The lowest rule observed is Unicode order, so items within a type-language group are ordered by Unicode value (U+xxxx).
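Read from the observed-order side, this grouping amounts to one composite key: (type rank, language rank, Unicode value). The sketch below turns the group names into ranks with `awk` and then does a single multi-key `sort`. The sample records and their type/language labels are hypothetical annotations for illustration; the script models the grouping described in this answer rather than how Windows actually classifies characters.

```shell
#!/bin/sh
# Assumes POSIX awk, sort and cut; C locale so fixed-width hex compares numerically.
export LC_ALL=C

# Hypothetical sample: type,language,codepoint,name
records='letter,Hebrew,05D0,HEBREW LETTER ALEF
number,Arabic,0661,ARABIC-INDIC DIGIT ONE
letter,Latin,0062,LATIN SMALL LETTER B
symbol,Latin,0021,EXCLAMATION MARK
letter,Cyrillic,0430,CYRILLIC SMALL LETTER A
number,Latin,0031,DIGIT ONE
letter,Greek,03B1,GREEK SMALL LETTER ALPHA
letter,Arabic,0627,ARABIC LETTER ALEF'

observed=$(printf '%s\n' "$records" | awk -F, -v OFS=, '
  BEGIN {
    # Group ranks follow the observed order: type first, then language.
    n = split("symbol number letter", t, " ")
    for (i = 1; i <= n; i++) trank[t[i]] = i
    n = split("Latin Greek Cyrillic Hebrew Arabic", l, " ")
    for (i = 1; i <= n; i++) lrank[l[i]] = i
  }
  # Prefix each record with its two ranks so one sort can use them as keys.
  { print trank[$1], lrank[$2], $0 }' \
  | sort -t, -k1,1n -k2,2n -k5,5 \
  | cut -d, -f3-)

printf '%s\n' "$observed"
```

After the rank prefix, field 5 holds the codepoint, so `-k5,5` is the innermost (Unicode value) rule; `cut -f3-` then strips the helper ranks again.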