SQL Server Collation – What Collation to Use for Ukraine?

collationsql serverunicode

We have an accounting program with an SQL Server database and would like to start a Ukrainian version. I am not sure which collation to use: Ukrainian or Cyrillic_General?

With a non-technical Ukrainian, we compared the collation charts for Cyrillic_general and Ukrainian and found only minor differences towards the end of the table. That person thinks Cyrillic_general should do fine.

Also, SQL Server 2012 no longer lists the Ukrainian collation. Is this a reason for concern?

Any advice which to choose?

Best Answer

First, just to make sure that a collation is available or not, you can test in two ways:

  • Check the list of available collations (i.e. sys.fn_helpcollations()). Running the following query in SQL Server 2012 and 2014 should return 52 rows:

    SELECT * FROM sys.fn_helpcollations() WHERE [name] LIKE N'ukrain%';
    
  • Try using the collation:

    SELECT N'a' COLLATE Ukrainian_100_CI_AS;
    

    If a collation is not available, it will let you know ;-)

Next, we should see if the collations work as expected (best not to fully trust documentation, right ;-) ?).

To test these two collations, run the following:

DECLARE @CollationTest TABLE (Col1 NVARCHAR(5));
INSERT INTO @CollationTest (Col1) VALUES (N'Ы'), (N'Ю'), (N'Ь'), (N'Ъ'), (N'Я'), (N'Э');

SELECT Col1 AS [Ukrainian_100_CS_AS]
FROM @CollationTest
ORDER BY Col1 COLLATE Ukrainian_100_CS_AS;

SELECT Col1 AS [Cyrillic_General_100_CS_AS]
FROM @CollationTest
ORDER BY Col1 COLLATE Cyrillic_General_100_CS_AS;

Results:

Ukrainian_100_CS_AS
----------------------------
Ъ
Ы
Э
Ю
Я
Ь


Cyrillic_General_100_CS_AS
----------------------------
Ъ
Ы
Ь
Э
Ю
Я

Those two orderings at least match the collations linked in the question (which makes sense given that those collation charts were done in 2009 and the _100 series of collations came out in SQL Server 2008).

The only difference seems to be in the initial sort order of one character: Ь (Capital Letter Soft Sign). Is the placement of one character a determining factor? You do need to mirror, as much as possible, the expectations of your customers. But, keep in mind that sort rules and initial ordering changes over the years.

Here are some examples of the Unicode spec for the Ukrainian locale. As you can see, both placements have been correct, though the "current" placement matches the Cyrillic collation, not the Ukrainian collation. The main release history of the locale-specific data starts here: CLDR Releases/Downloads

http://www.unicode.org/repos/cldr/tags/release-1-4/posix/uk_UA.UTF-8.src  (2006)
<CYRILLIC_CAPITAL_LETTER_SOFT_SIGN>                    <X5EF0>;<X0005>;<X008F>;IGNORE -- Ь
<CYRILLIC_CAPITAL_LETTER_YA>                           <X5EFC>;<X0005>;<X008F>;IGNORE -- Я

http://www.unicode.org/repos/cldr/tags/release-1-9/posix/uk_UA.UTF-8.src  (2010)
<CYRILLIC_CAPITAL_LETTER_YA>                           <X5D45>;<X05>;<X8F>;IGNORE -- Я
<CYRILLIC_CAPITAL_LETTER_SOFT_SIGN>                    <X5D46>;<X05>;<X86>;IGNORE -- Ь

http://www.unicode.org/repos/cldr/tags/release-21/posix/uk_UA.UTF-8.src   (2011)
<CYRILLIC_CAPITAL_LETTER_YA>
                           <X5D45>;<X05>;<X8F>;<CYRILLIC_CAPITAL_LETTER_YA>        -- Я
<CYRILLIC_CAPITAL_LETTER_SOFT_SIGN>
                           <X5D46>;<X05>;<X86>;<CYRILLIC_CAPITAL_LETTER_SOFT_SIGN> -- Ь

http://www.unicode.org/repos/cldr/tags/release-24/common/uca/allkeys_CLDR.txt  (2013)
042C  ; [.1B43.0020.0008.042C] # CYRILLIC CAPITAL LETTER SOFT SIGN -- Ь
042F  ; [.1B5E.0020.0008.042F] # CYRILLIC CAPITAL LETTER YA        -- Я

http://www.unicode.org/repos/cldr/tags/release-27/common/uca/allkeys_CLDR.txt  (2015)
042C  ; [.1E56.0020.0008] # CYRILLIC CAPITAL LETTER SOFT SIGN -- Ь
042F  ; [.1E6D.0020.0008] # CYRILLIC CAPITAL LETTER YA        -- Я

And to be fair, we should keep in mind that collations handle more than sorting. They handle casing, transformations, etc.


So what should one do? It is probably best, regardless of the rest of this, to choose the locale that is most common for your customers to be using as their main Windows OS locale. This will at least give consistent behavior to how everything else should be running for them. Run the following to see on your system:

SELECT os.os_language_version AS [WindowsLocale],
       COLLATIONPROPERTY(N'Ukrainian_100_CS_AS', 'lcid') AS [SQLServerLocale-Ukrainian],
    COLLATIONPROPERTY(N'Cyrillic_General_100_CS_AS', 'lcid') AS [SQLLocale-CyrillicGeneral]
FROM   sys.dm_os_windows_info os;

Ask as many of your customers / potential customers as possible to check their Windows setup so that you can plan accordingly.