I have a table with a unique key that includes an NVARCHAR(50)
column (correct or not, but is there). So, when trying to insert Șc
or C
(doesn't matter the order of the insert) it breaks on the 2nd insert due to collation issues. Here is the error:
(1 row(s) affected) Msg 2601, Level 14, State 1, Line 16
Cannot insert duplicate key row in object 'dbo.testT' with unique index 'IX_TestT'. The duplicate key value is (C).
Select returns:
Database default collation is Latin1_General_CI_AS
. Spent some time looking on how to solve it, without changing too much the already existing structure, but can't find a way to getting working. Tried different collations and combinations, everything fails. Read (here and here) about character expansions and so on, still stuck. Here is a sample code that I'm using to replicate the issue, feel free to modify and recommend anything that could help solve this.
CREATE TABLE testT (
[Default_Collation] [NVARCHAR] (50) COLLATE DATABASE_DEFAULT,
[Latin1_General_CI_AS] [NVARCHAR] (50) COLLATE Latin1_General_CI_AS,
[Latin1_General_CI_AI] [NVARCHAR] (50) COLLATE Latin1_General_CI_AI,
[SQL_Collation] [NVARCHAR] (50) COLLATE SQL_Latin1_General_CP1_CI_AS);
CREATE UNIQUE CLUSTERED INDEX [IX_TestT] ON [dbo].[testT] ([Default_Collation])
ON [PRIMARY]
GO
INSERT INTO testT
SELECT N'Șc', --COLLATE Latin1_General_CI_AS
N'Șc', --COLLATE Latin1_General_CI_AS
N'Șc', --COLLATE Latin1_General_CI_AS
N'Șc' --COLLATE Latin1_General_CI_AS
INSERT INTO testT
SELECT N'C' --COLLATE Latin1_General_CI_AS
,N'C' --COLLATE Latin1_General_CI_AS
,N'C' --COLLATE Latin1_General_CI_AS
,N'C' --COLLATE SQL_Latin1_General_CP1_CI_AS
SELECT * FROM testT;
DROP TABLE testT;
Best Answer
The problem is that the old SQL Server Collations (i.e. those with names starting with
SQL_
) and the first two versions of the Windows Collations (the80
series which came with SQL Server 2000 and do not have a version number in the name, and the90
series which came with SQL Server 2005) are missing the sort weights for a great many characters. This was mostly corrected starting with the100
series Collations that came with SQL Server 2008.As you can see in the below examples, the
Ș
character matches an empty string when using non-binary, version 80 or 90 Collations (and SQL Server Collations) since they both have the same sort weight: 0. Nothing. Nada. This means that when comparingN'Șc'
withN'C'
(using pre-series 100 Collations), you are really comparingN'c'
withN'C'
(test #1):So, unfortunately you will need to drop the PK, alter the column to have a 100-level Collation (e.g.
Latin1_General_100_CI_AS_SC
), and then recreate the PK. Please note that the difference in that suggested Collation from the current Collation is both the100
and the_SC
at the end, which allows it to properly handle supplementary characters.This does not mean that you need to:
NVARCHAR
in the PK)table.column = N'Ș'
and@variable = N'Ș'
since variables and string literals use the Database's default Collation.For more examples of this behavior, please see the "Supplementary Characters" section of the following blog post of mine:
The Uni-Code: The Search for the True List of Valid Characters for T-SQL Identifiers, Part 3 of 2 (Delimited Identifiers)