Sql-server – SQL Server: Cannot resolve collation conflict between “Latin1_General_CI_AS” and “[garbage]”

collationfunctionssql serversql-server-2012

So I'm writing a large UDF in T-SQL to be used in reporting. The UDF contains quite a few common table expressions.

At one point I was adding another CTE:

 cteCmtCauses AS (
    SELECT ProductId = p.Id,
           Name = hz.Name,
           CMT = CONCAT(IIF(hz.Cmt_c= '1', 'C', ''), IIF(hz.hz.Cmt_m = '1', 'M', ''), IIF(hz.Cmt_t = '1', 'R', ''))
    FROM [redacted]

    UNION ALL

    SELECT ProductId = p.Id,
           Name = c.Name,
           --C = c.Cmr_HasCarcinogenicRisk,
           --M = c.Cmr_HasMutagenicRisk,
           --R = c.Cmr_HasToxicForReproductionRisk,
           CMT = CONCAT(IIF(hz.Cmt_c= '1', 'C', ''), IIF(hz.hz.Cmt_m = '1', 'M', ''), IIF(hz.Cmt_t = '1', 'R', ''))
    FROM [redacted]
 ),

 cteCmtCausesConcat AS (
    SELECT ProductId = p.Id,
           ComponentIds = (
                -- Here the issue happens
                SELECT CONCAT(cte.CMT, N'|', cte.Name, dbo.QueryConcatenationString())
                FROM cteCmtCauses cte 
                WHERE cte.ProductId = p.Id
                FOR XML PATH(N''), TYPE
           )
    FROM [redacted]
 ),

When trying to persist the UDF mutation, I got this error:

Msg 468, Level 16, State 9, Procedure QueryProduct, Line 93
Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "널㾍.鉀杫.....祉߾.䊙꛸.鈀杫..." in the concat operation.

And in fact on every attempt the message changed a little bit:

Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "࿠䚋.剀焩.....祉߾.䊙꛸.刀焩..." in the concat operation.
Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "꿠䥆.뉀洶.....祉߾.䊙꛸.눀洶..." in the concat operation.
Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "꿠洦.퉀洷.....祉߾.䊙꛸.툀洷..." in the concat operation.
Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "焐柘.牀䯏.....祉߾.䊙꛸.爀䯏..." in the concat operation.

I was able to work around it by using:

SELECT CONCAT(cte.CMT, N'|', cte.Name COLLATE Latin1_General_CI_AS, dbo.QueryConcatenationString())

But the weird thing is, everything in the database and tempdb has the same collation: database collation, being Latin1_General_CI_AS (except from one of the tables used in the UNION ALL which is Latin1_General_CS_AS).

Gist of full reproducible sample, make sure that the database collation is Latin1_General_CI_AS.

How do I properly resolve this issue, is this a known bug, and do I need to worry about SQL server silently corrupting my data once I start using this UDF?

Using

Microsoft SQL Server 2012 (SP3-GDR) (KB3194721) - 11.0.6248.0 (X64) 
    Sep 23 2016 15:49:43 
    Copyright (c) Microsoft Corporation
    Business Intelligence Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) (Hypervisor)

Best Answer

UPDATE

I finally had time to test this and was able to reproduce the problem. While not respecting collation precedence (noted in my original answer below) was a problem, it was not this problem (though both could be caused by the same underlying bug). Here is what I can now confirm:

  1. The O.P.'s test script (via Gist) does not produce this problem as it's missing a critical piece: forcing the collation of one of the NVARCHAR "Name" columns in the Component and Statement tables to be different than the other.
  2. The actual error is a collation mismatch between the two "Name" columns in the UNION ALL.
  3. The collation mismatch in the UNION ALL should have ended the query before it even got to the CONCAT function, and it would have if both columns were VARCHAR, but if at least one column is NVARCHAR, then the CONCAT function prevents the query from terminating properly.
  4. There are two ways in which the query can behave improperly when CONCAT is involved (at least in this UNION ALL scenario):
    1. If the column from the subquery (containing the UNION ALL with the collation mismatch) is not the first parameter of the CONCAT function, then it will terminate with a misleading error message, stating that the error is in the "concat operation" instead of being in the "UNION ALL operation" (oh, and let's not forget about the garbage collation name in the error message!)
    2. If the column from the subquery (containing the UNION ALL with the collation mismatch) is the first parameter of the CONCAT function, then it will actually succeed, using the database's default collation for the value returned by the CONCAT function. (see final test case below)
  5. The CONCAT built-in function was fixed as of SQL Server 2014 as neither this behavior, nor the incorrect behavior shown in the "Original Answer" (see below) is reproducible starting with that version (and I tested with 2014, 2016, 2017, and 2019).
-- DROP TABLE #Mix;
CREATE TABLE #Mix
(
  [VC1]  VARCHAR(50)  COLLATE SQL_Latin1_General_CP437_CS_AS,
  [VC2]  VARCHAR(50)  COLLATE Azeri_Cyrillic_100_CS_AS_WS,
  [NVC1] NVARCHAR(50) COLLATE Frisian_100_CS_AI_KS,
  [NVC2] NVARCHAR(50) COLLATE Sami_Sweden_Finland_100_CI_AI
);
INSERT INTO #Mix ([VC1], [VC2], [NVC1], [NVC2]) VALUES (0xB0, 0xDE, 0xDE, 0xDE);
SELECT * FROM #Mix;
/*
VC1    VC2    NVC1    NVC2

░      Ю      Þ       Þ
*/



SELECT CONCAT(N'Both VARCHAR', sub.[WhatEva])
FROM   (
    SELECT [VC1]
    FROM #Mix
    UNION ALL
    SELECT [VC2]
    FROM #Mix
) sub([WhatEva]);
/*
 Msg 457, Level 16, State 1, Line XXXXX
Implicit conversion of varchar value to varchar cannot be performed because the
   collation of the value is unresolved due to a collation conflict between
   "Azeri_Cyrillic_100_CS_AS_WS" and "SQL_Latin1_General_CP437_CS_AS" in
   UNION ALL operator.
*/


SELECT CONCAT(N'At least one NVARCHAR', sub.[WhatEva])
FROM   (
    SELECT [VC1]
    FROM #Mix
    UNION ALL
    SELECT [NVC1]
    FROM #Mix
) sub([WhatEva]);
/*
-- 2012
Msg 468, Level 16, State 9, Line XXXXX
Cannot resolve the collation conflict between "{db_default_collation}" and
   "堓.ꚤ鍛翹.堓.툀堗.툀堗.쓀姧.꺱䱷..꺱䱷......꺱䱷..툘堗.帎鍲翹.堓..錿翹..."
   in the concat operation.

-- 2014, 2016, 2017, 2019
Msg 451, Level 16, State 1, Line XXXXX
Cannot resolve collation conflict between "Frisian_100_CS_AI_KS" and
   "SQL_Latin1_General_CP437_CS_AS" in UNION ALL operator occurring in SELECT
   statement column 1.
*/


-- SUCCESS!?!?!?! Should be an error!!!
SELECT CONCAT(sub.[WhatEva], N':At least one NVARCHAR') AS [ConcatSuccessWTF?],
       SQL_VARIANT_PROPERTY(CONCAT(sub.[WhatEva], N':At least one NVARCHAR'),
                            'collation') AS [ResultingCollation]
FROM   (
    SELECT [VC1]
    FROM #Mix
    UNION ALL
    SELECT [NVC1]
    FROM #Mix
) sub([WhatEva]);
/*
-- 2012
ConcatSuccessWTF?          ResultingCollation

░:At least one NVARCHAR    {db_default_collation}
Þ:At least one NVARCHAR    {db_default_collation}


-- 2014, 2016, 2017, 2019
Msg 456, Level 16, State 1, Line XXXXX
Implicit conversion of nvarchar value to sql_variant cannot be performed because the
   resulting collation is unresolved due to collation conflict between
   "Frisian_100_CS_AI_KS" and "SQL_Latin1_General_CP437_CS_AS" in UNION ALL operator.
*/

 

is this a known bug

Not sure if it was noted in any public forum, but it must have been noticed internally (to Microsoft) since it was fixed as of the next version (i.e. SQL Server 2014), though not in any Service Pack as I tested on SQL Server 2012, SP4 GDR (11.0.7462.6).

How do I properly resolve this issue

If you, or anyone else, is still using SQL Server 2012 and running into this, it is best to resolve the collation conflict at the source, which is in the UNION ALL operation. Just pick the table with the column that has the collation that you do not want, and apply the COLLATE clause there such that the UNION ALL operation succeeds on its own, even if no CONCAT function is being used. This is better than specifying the COLLATE in the CONCAT function because that is after the fact of the UNION ALL, which in this case should be failing, but is allowed to succeed due to the bug in CONCAT.


ORIGINAL ANSWER

The Collation problem (separate from the error message resulting from the Collation problem) is due to the CONCAT built-in function not honoring Collation Precedence, and hence requiring all input parameters to be of the same Collation. Clearly you have one input parameter that is not of the same Collation as the rest. That parameter is cte.Name, which you have currently fixed via the COLLATE keyword.

We can simulate this scenario as follows. You can run it from any database. The default Collation for the Database that I am executing the following code in is: SQL_Latin1_General_CP1_CI_AS.

CREATE TABLE #TT (Col1 NVARCHAR(50) COLLATE SQL_EBCDIC278_CP1_CS_AS);
INSERT INTO #TT values ('something');


SELECT CONCAT('now this is ', tmp.Col1)
FROM #TT tmp;
/*
Msg 468, Level 16, State 9, Line 17
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and
    "SQL_EBCDIC278_CP1_CS_AS" in the concat operation.
*/

And the following two queries even show that the Collation is evaluated per each input parameter, with the first input parameter setting the Collation to be used:

SELECT CONCAT('now this is ', tmp.Col1, N' else' COLLATE Latin1_General_100_CI_AS)
FROM #TT tmp;
/*
Msg 468, Level 16, State 9, Line 23
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and
    "SQL_EBCDIC278_CP1_CS_AS" in the concat operation.
Msg 468, Level 16, State 9, Line 23
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and
    "Latin1_General_100_CI_AS" in the concat operation.
*/

SELECT CONCAT('now this is ' COLLATE Latin1_General_100_CI_AS, tmp.Col1, N' else')
FROM #TT tmp;
/*
Msg 468, Level 16, State 9, Line 30
Cannot resolve the collation conflict between "Latin1_General_100_CI_AS" and
    "SQL_EBCDIC278_CP1_CS_AS" in the concat operation.
Msg 468, Level 16, State 9, Line 30
Cannot resolve the collation conflict between "Latin1_General_100_CI_AS" and
    "SQL_Latin1_General_CP1_CI_AS" in the concat operation.
*/

This conflict can be fixed in two ways:

  1. Don't use the CONCAT built-in function. The main benefit of CONCAT is that it doesn't require each parameter to be a string type; it will handle the conversion to string internally. This is convenient if you have several non-string items to concatenate. But if you only have strings, then it doesn't provide any benefit, and probably even hurts performance to pass all of that stuff into the function. AND, by not using CONTACT, Collation Precedence will take over, and in most cases resolve the conflict automatically.

    SELECT 'now this is ' + tmp.Col1
    FROM #TT tmp;
    -- now this is something
    

    In this case, Collation Precedence will determine that the Collation of the tmp.Col1 column overrides the Collation of the string literal (which uses the "current" database's default Collation).

  2. Use the COLLATE clause (as you are already doing). There is nothing wrong with this approach as this is one of the uses of the COLLATE keyword.

    -- Force the Collation of the column in the temp table to match the "current" database:
    SELECT CONCAT('now this is ', tmp.Col1 COLLATE SQL_Latin1_General_CP1_CI_AS)
    FROM #TT tmp;
    -- now this is something
    
    
    -- Force the Collation of the string literal to match the column in the temp table:
    SELECT CONCAT('now this is ' COLLATE SQL_EBCDIC278_CP1_CS_AS, tmp.Col1)
    FROM #TT tmp;
    -- now this is something
    

    In these two cases, the Collation to use is determined by the first input parameter, and either that one needs to be explicitly set to the Collation of the second parameter (bottom example) which in this case is taken from the columns definition. Or, the second parameter needs to be explicitly set to match the Collation of the first parameter (top example) which in this case is taken from the databases's default as it is a string literal.