The database Collation for the first SELECT statement in my query is Latin1_General_CI_AS.
Well, not exactly. There are a few problems with this statement:
The Database's default Collation only matters in a query when using string literals, variables, and return values from UDFs. AND, that default Collation only matters if there is no column or COLLATE
keyword being used.
SELECT statements / queries, as a whole, do not use Collations. Collation is assigned per each string field, and it can be different for each field in a query.
The column in the first / top query, stopword
, is not using the Latin1_General_CI_AS
Collation (more on this in a moment).
I see that the system-stopwords are stored in the resource database, which can explain the difference in collation.
A column (or expression) coming from a different Database does not necessarily explain a difference in Collation. As stated above, Collation is set per each field of a query, whether that field comes from a column in a table or is an expression. Collation is usually derived naturally from Collation Precedence: Column Collation overrides literals and variables, and the COLLATE
keyword overrides both. When there is a conflict, then you need the COLLATE
keyword.
The main point here, however, is that if the stopword
column in sys.fulltext_system_stopwords
comes from a column in a Table in the Resource Database (i.e. mssqlsystemresource
), OR if it comes from an expression in a View in the Resource Database that has its Collation set via the COLLATE
keyword, then the default Collation of the Resource Database is irrelevant.
I can 'solve' my error by using COLLATE
in my query
Yes, the COLLATE
keyword is the way to go. However, to fix this conflict, you only need to specify the COLLATE
keyword in one query, though it does not matter which query.
For example, if I run the query with no COLLATE
keyword, I get:
SELECT ftsw.stopword
FROM sys.fulltext_stopwords ftsw
UNION
SELECT ftssw.stopword
FROM sys.fulltext_system_stopwords ftssw;
Msg 468, Level 16, State 9, Line 4
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_BIN" in the UNION operation.
So, assuming that the conflict is in the second (i.e. bottom) SELECT statement, I could fix it by applying the current DB's default Collation:
SELECT ftsw.stopword
FROM sys.fulltext_stopwords ftsw
UNION
SELECT ftssw.stopword COLLATE SQL_Latin1_General_CP1_CI_AS
FROM sys.fulltext_system_stopwords ftssw;
And that works. BUT, what if we try putting COLLATE
on the first / top SELECT:
SELECT ftsw.stopword COLLATE SQL_Latin1_General_CP1_CI_AS
FROM sys.fulltext_stopwords ftsw
UNION
SELECT ftssw.stopword
FROM sys.fulltext_system_stopwords ftssw;
That also works. And in fact, both of the following also work:
SELECT ftsw.stopword COLLATE Hebrew_100_CI_AS
FROM sys.fulltext_stopwords ftsw
UNION
SELECT ftssw.stopword
FROM sys.fulltext_system_stopwords ftssw;
-- and:
SELECT ftsw.stopword
FROM sys.fulltext_stopwords ftsw
UNION
SELECT ftssw.stopword COLLATE Hebrew_100_CI_AS
FROM sys.fulltext_system_stopwords ftssw;
Also, rather than use the DATABASE_DEFAULT
option, which here equates to Latin1_General_CI_AS, I would use Latin1_General_BIN (or better yet: Latin1_General_100_BIN2, which is newer and better) in this particular case because it will ensure that different strings that can be normalized into the same string via the "distinct" behavior of the UNION
(without the ALL
) and the case-insensitivity of the Latin1_General_CI_AS Collation show up as different rows.
Where does the Latin1_General_BIN collation come from? It looks like the sys.fulltext_system_stopwords table has a different collation, but why?
That Collation comes from an unlikely source. Let's look at the system catalog views in the query:
EXEC sys.sp_help N'sys.fulltext_stopwords';
The results indicate that this is a View (as expected) and that the stopword
column has a Collation of Latin1_General_BIN (not expected). But wait, if the Latin1_General_BIN Collation is coming from sys.fulltext_stopwords
, then what about sys.fulltext_system_stopwords
and where is the other Collation coming from? Let's look:
EXEC sys.sp_help N'sys.fulltext_system_stopwords';
The results indicate that this is a View (as expected) and that the stopword
column has a Collation of SQL_Latin1_General_CP1_CI_AS (not expected).
We can now dig a little deeper into the definition of each of those system catalog views:
EXEC sys.sp_helptext N'sys.fulltext_stopwords';
Returns (simplified):
SELECT
fts.stopword,
FROM sys.sysftstops fts
And then:
EXEC sys.sp_help N'sys.sysftstops';
The results indicate that this is a system table, and that the stopword
column indeed has a Collation of Latin1_General_BIN.
Next we can move on to the other system catalog view:
EXEC sys.sp_helptext N'sys.fulltext_system_stopwords';
Returns (simplified):
SELECT convert(nvarchar(64), stopword) as stopword, language_id
FROM OpenRowset(TABLE FTSYSSTPWD)
And FTSYSSTPWD
comes from the Resource Database, so there isn't much more we can do at this point.
Still, there is one last thing we can do to be clear about the Collation of the data coming from the Resource Database -- sys.fulltext_system_stopwords.stopword
:
CREATE DATABASE [FullTextCollationTest] COLLATE SQL_EBCDIC277_CP1_CS_AS;
GO
USE [FullTextCollationTest];
EXEC sys.sp_help N'sys.fulltext_stopwords';
-- Collation for [stopword] column: Latin1_General_BIN (same as before)
EXEC sys.sp_help N'sys.fulltext_system_stopwords';
-- Collation for [stopword] column: SQL_EBCDIC277_CP1_CS_AS (same as DB's default Collation)
GO
USE [master];
DROP DATABASE [FullTextCollationTest];
[0-9]
is not some type of regular expression defined to just match digits.
Any range in a LIKE
pattern matches characters between the start and end character according to collation sort order.
SELECT CodePoint,
Symbol,
RANK() OVER (ORDER BY Symbol COLLATE Latin1_General_CI_AS) AS Rnk
FROM #CodePage
WHERE Symbol LIKE '[0-9]' COLLATE Latin1_General_CI_AS
ORDER BY Symbol COLLATE Latin1_General_CI_AS
Returns
CodePoint Symbol Rnk
-------------------- ------ --------------------
48 0 1
188 ¼ 2
189 ½ 3
190 ¾ 4
185 ¹ 5
49 1 5
50 2 7
178 ² 7
179 ³ 9
51 3 9
52 4 11
53 5 12
54 6 13
55 7 14
56 8 15
57 9 16
So you get these results because under your default collation these characters sort after 0
but before 9
.
It looks as though the collation is defined to actually sort them in mathematical order with the fractions in the correct order between 0
and 1
.
You could also use a set rather than a range. To avoid 2
matching ²
you would need a CS
collation
SELECT CodePoint, Symbol
FROM #CodePage
WHERE Symbol LIKE '[0123456789]' COLLATE Latin1_General_CS_AS
Best Answer
The problem should be with the string literal (i.e. the
''SERVERA''
) as that takes on the Collation of the database where that statement is executing.So try the following for the top part of your query:
And, the bottom part of the
UNION ALL
(the part for[SERVERB]
) should probably select "SERVERB" instead of "SERVERA" as the string literal :).And to be technical, it is taking on the Collation of the DB, not the server, though if the default connection is to
master
(or any system DB), then that would naturally be the same as the Server-level Collation since the Server-level Collation is used to create the system DBs.UPDATE:
The new error you are getting after adding the
COLLATE
clause to the 2nd query in theUNION ALL
is due to adding theCOLLATE
clause to too many of the fields. It needs to go only to the literal string, and not on theresourcephone.[PhoneExt]
field. Doing that one extra step put the fields that were already in agreement into conflict. Since both of those table fields are inLatin1_General_BIN
, there is no need to override the Collation on theresourcephone.[PhoneExt]
field from either Server. It is only the string literal "field" that needs to be overriden, and only from one of the two servers: you just need to make one match the other.Adding the
COLLATE
clause to all fields, especially ones that are already working as expected (i.e. theresourcephone.[PhoneExt]
field) is unnecessary and is over-complicating the query.