The database Collation for the first SELECT statement in my query is Latin1_General_CI_AS.
Well, not exactly. There are a few problems with this statement:
The Database's default Collation only matters in a query when using string literals, variables, and return values from UDFs. AND, that default Collation only matters if there is no column or COLLATE
keyword being used.
SELECT statements / queries, as a whole, do not use Collations. Collation is assigned per each string field, and it can be different for each field in a query.
The column in the first / top query, stopword
, is not using the Latin1_General_CI_AS
Collation (more on this in a moment).
I see that the system-stopwords are stored in the resource database, which can explain the difference in collation.
A column (or expression) coming from a different Database does not necessarily explain a difference in Collation. As stated above, Collation is set per each field of a query, whether that field comes from a column in a table or is an expression. Collation is usually derived naturally from Collation Precedence: Column Collation overrides literals and variables, and the COLLATE
keyword overrides both. When there is a conflict, then you need the COLLATE
keyword.
The main point here, however, is that if the stopword
column in sys.fulltext_system_stopwords
comes from a column in a Table in the Resource Database (i.e. mssqlsystemresource
), OR if it comes from an expression in a View in the Resource Database that has its Collation set via the COLLATE
keyword, then the default Collation of the Resource Database is irrelevant.
I can 'solve' my error by using COLLATE
in my query
Yes, the COLLATE
keyword is the way to go. However, to fix this conflict, you only need to specify the COLLATE
keyword in one query, though it does not matter which query.
For example, if I run the query with no COLLATE
keyword, I get:
SELECT ftsw.stopword
FROM sys.fulltext_stopwords ftsw
UNION
SELECT ftssw.stopword
FROM sys.fulltext_system_stopwords ftssw;
Msg 468, Level 16, State 9, Line 4
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_BIN" in the UNION operation.
So, assuming that the conflict is in the second (i.e. bottom) SELECT statement, I could fix it by applying the current DB's default Collation:
SELECT ftsw.stopword
FROM sys.fulltext_stopwords ftsw
UNION
SELECT ftssw.stopword COLLATE SQL_Latin1_General_CP1_CI_AS
FROM sys.fulltext_system_stopwords ftssw;
And that works. BUT, what if we try putting COLLATE
on the first / top SELECT:
SELECT ftsw.stopword COLLATE SQL_Latin1_General_CP1_CI_AS
FROM sys.fulltext_stopwords ftsw
UNION
SELECT ftssw.stopword
FROM sys.fulltext_system_stopwords ftssw;
That also works. And in fact, both of the following also work:
SELECT ftsw.stopword COLLATE Hebrew_100_CI_AS
FROM sys.fulltext_stopwords ftsw
UNION
SELECT ftssw.stopword
FROM sys.fulltext_system_stopwords ftssw;
-- and:
SELECT ftsw.stopword
FROM sys.fulltext_stopwords ftsw
UNION
SELECT ftssw.stopword COLLATE Hebrew_100_CI_AS
FROM sys.fulltext_system_stopwords ftssw;
Also, rather than use the DATABASE_DEFAULT
option, which here equates to Latin1_General_CI_AS, I would use Latin1_General_BIN (or better yet: Latin1_General_100_BIN2, which is newer and better) in this particular case because it will ensure that different strings that can be normalized into the same string via the "distinct" behavior of the UNION
(without the ALL
) and the case-insensitivity of the Latin1_General_CI_AS Collation show up as different rows.
Where does the Latin1_General_BIN collation come from? It looks like the sys.fulltext_system_stopwords table has a different collation, but why?
That Collation comes from an unlikely source. Let's look at the system catalog views in the query:
EXEC sys.sp_help N'sys.fulltext_stopwords';
The results indicate that this is a View (as expected) and that the stopword
column has a Collation of Latin1_General_BIN (not expected). But wait, if the Latin1_General_BIN Collation is coming from sys.fulltext_stopwords
, then what about sys.fulltext_system_stopwords
and where is the other Collation coming from? Let's look:
EXEC sys.sp_help N'sys.fulltext_system_stopwords';
The results indicate that this is a View (as expected) and that the stopword
column has a Collation of SQL_Latin1_General_CP1_CI_AS (not expected).
We can now dig a little deeper into the definition of each of those system catalog views:
EXEC sys.sp_helptext N'sys.fulltext_stopwords';
Returns (simplified):
SELECT
fts.stopword,
FROM sys.sysftstops fts
And then:
EXEC sys.sp_help N'sys.sysftstops';
The results indicate that this is a system table, and that the stopword
column indeed has a Collation of Latin1_General_BIN.
Next we can move on to the other system catalog view:
EXEC sys.sp_helptext N'sys.fulltext_system_stopwords';
Returns (simplified):
SELECT convert(nvarchar(64), stopword) as stopword, language_id
FROM OpenRowset(TABLE FTSYSSTPWD)
And FTSYSSTPWD
comes from the Resource Database, so there isn't much more we can do at this point.
Still, there is one last thing we can do to be clear about the Collation of the data coming from the Resource Database -- sys.fulltext_system_stopwords.stopword
:
CREATE DATABASE [FullTextCollationTest] COLLATE SQL_EBCDIC277_CP1_CS_AS;
GO
USE [FullTextCollationTest];
EXEC sys.sp_help N'sys.fulltext_stopwords';
-- Collation for [stopword] column: Latin1_General_BIN (same as before)
EXEC sys.sp_help N'sys.fulltext_system_stopwords';
-- Collation for [stopword] column: SQL_EBCDIC277_CP1_CS_AS (same as DB's default Collation)
GO
USE [master];
DROP DATABASE [FullTextCollationTest];
Best Answer
UPDATE
I finally had time to test this and was able to reproduce the problem. While not respecting collation precedence (noted in my original answer below) was a problem, it was not this problem (though both could be caused by the same underlying bug). Here is what I can now confirm:
NVARCHAR
"Name" columns in theComponent
andStatement
tables to be different than the other.UNION ALL
.UNION ALL
should have ended the query before it even got to theCONCAT
function, and it would have if both columns wereVARCHAR
, but if at least one column isNVARCHAR
, then theCONCAT
function prevents the query from terminating properly.CONCAT
is involved (at least in thisUNION ALL
scenario):UNION ALL
with the collation mismatch) is not the first parameter of theCONCAT
function, then it will terminate with a misleading error message, stating that the error is in the "concat operation" instead of being in the "UNION ALL operation" (oh, and let's not forget about the garbage collation name in the error message!)UNION ALL
with the collation mismatch) is the first parameter of theCONCAT
function, then it will actually succeed, using the database's default collation for the value returned by theCONCAT
function. (see final test case below)CONCAT
built-in function was fixed as of SQL Server 2014 as neither this behavior, nor the incorrect behavior shown in the "Original Answer" (see below) is reproducible starting with that version (and I tested with 2014, 2016, 2017, and 2019).Not sure if it was noted in any public forum, but it must have been noticed internally (to Microsoft) since it was fixed as of the next version (i.e. SQL Server 2014), though not in any Service Pack as I tested on SQL Server 2012, SP4 GDR (11.0.7462.6).
If you, or anyone else, is still using SQL Server 2012 and running into this, it is best to resolve the collation conflict at the source, which is in the
UNION ALL
operation. Just pick the table with the column that has the collation that you do not want, and apply theCOLLATE
clause there such that theUNION ALL
operation succeeds on its own, even if noCONCAT
function is being used. This is better than specifying theCOLLATE
in theCONCAT
function because that is after the fact of theUNION ALL
, which in this case should be failing, but is allowed to succeed due to the bug inCONCAT
.ORIGINAL ANSWER
The Collation problem (separate from the error message resulting from the Collation problem) is due to the
CONCAT
built-in function not honoring Collation Precedence, and hence requiring all input parameters to be of the same Collation. Clearly you have one input parameter that is not of the same Collation as the rest. That parameter iscte.Name
, which you have currently fixed via theCOLLATE
keyword.We can simulate this scenario as follows. You can run it from any database. The default Collation for the Database that I am executing the following code in is:
SQL_Latin1_General_CP1_CI_AS
.And the following two queries even show that the Collation is evaluated per each input parameter, with the first input parameter setting the Collation to be used:
This conflict can be fixed in two ways:
Don't use the
CONCAT
built-in function. The main benefit ofCONCAT
is that it doesn't require each parameter to be a string type; it will handle the conversion to string internally. This is convenient if you have several non-string items to concatenate. But if you only have strings, then it doesn't provide any benefit, and probably even hurts performance to pass all of that stuff into the function. AND, by not usingCONTACT
, Collation Precedence will take over, and in most cases resolve the conflict automatically.In this case, Collation Precedence will determine that the Collation of the
tmp.Col1
column overrides the Collation of the string literal (which uses the "current" database's default Collation).Use the
COLLATE
clause (as you are already doing). There is nothing wrong with this approach as this is one of the uses of theCOLLATE
keyword.In these two cases, the Collation to use is determined by the first input parameter, and either that one needs to be explicitly set to the Collation of the second parameter (bottom example) which in this case is taken from the columns definition. Or, the second parameter needs to be explicitly set to match the Collation of the first parameter (top example) which in this case is taken from the databases's default as it is a string literal.