Sql-server – Using collate: When to use and when not to use

collationsql-server-2008-r2t-sql

In one of my applications we had tsql like shown below

SELECT DISTINCT [Dblist].[DbName] AS [DbName]
FROM [Common].dbo.[Dblist]
WHERE dbname not in (
    SELECT [name]
    FROM master.dbo.sysdatabases )

For one of my customers it gave an error related to collation as the database Common (part of my application) has default collation sql_latin1_general_cp1_ci_as whereas master db has collation latin1_general_ci_as. I found a solution using collate and it worked. I want the solution to work on all collations.

SELECT DISTINCT [Dblist].[DbName] AS [DbName]
FROM [Common].dbo.[Dblist]
WHERE dbname not in (
    SELECT [name] COLLATE DATABASE_DEFAULT 
    FROM master.dbo.sysdatabases )

Now I am confused. Should I use COLLATE in all queries which involve system databases?
When to use collate and when not to use?

Also, is the above way of using collate proper (like COLLATE DATABASE_DEFAULT)? Here DATABASE_DEFAULT is latin1_general_ci_as itself as it is the collation of master database. Then how did the solution work as they again don't match with Common database? I want solution which will work on all collations.

Best Answer

If you are going to use custom collations for specific databases then yes, you'll need to make the collations match whenever you are joining or unioning data from the two databases.

In fact you will need to do this with many metadata queries anyway. Just look at catalog views like sys.tables:

SELECT c.name, c.collation_name
FROM sys.all_columns AS c
INNER JOIN sys.all_views AS v
ON c.[object_id] = v.[object_id]
INNER JOIN sys.schemas AS s
ON v.[schema_id] = s.[schema_id]
WHERE s.name = N'sys' AND v.name = N'tables'
AND c.collation_name IS NOT NULL;

Results:

name                    SQL_Latin1_General_CP1_CI_AS
type                    Latin1_General_CI_AS_KS_WS
type_desc               Latin1_General_CI_AS_KS_WS
lock_escalation_desc    Latin1_General_CI_AS_KS_WS
durability_desc         Latin1_General_CI_AS_KS_WS

So here we have columns with two different collations inside a single system object. No database can have a DATABASE_DEFAULT that matches both...

You don't have any control over the collation of your customer's columns or databases, or the server collation. So really the only ways to resolve the conflict are to:

  • use a method that doesn't hard-code a specific collation (like DATABASE_DEFAULT)
  • hard-code some specific, compatible collation on both sides

Since the latter is more work, makes for more complex queries, and introduces more opportunities to turn seeks into scans, I think the former is really your best option.