Performance Impact of Changing Database Default Collation to Latin1_General_BIN

collationperformanceperformance-tuningsql serversql-server-2008-r2unicode

I have set the database collation to Latin1_General_BIN, to make string comparisons case-sensitive. Will this have an impact on performance? Will it have any impact on DML or DDL operations in the database? The database already exists with tables in it.

Best Answer

Collations in SQL Server determine the rules for matching and sorting character data. Normally, you would choose a collation first based on the comparison semantics and sorting order the consumers of the data require.

Humans generally do not find that binary collations produce the sorting and comparison behaviours they expect. So, although these offer the best performance (especially the pure code-point BIN2 versions) most implementations do not use them.

Next in raw performance terms (but only for non-Unicode strings) are the backward-compatibility SQL collations. When working with Unicode data, these collations use a Windows collation instead, with the same performance characteristics. There are subtle traps here, so you need to have good reasons to choose a SQL collation these days (unless working on a US system, where it is still the default).

Windows collations are the slowest, in general, because of the complex Unicode comparison and sorting rules. Nevertheless, these offer complete compatibility with Windows in SQL Server, and are regularly maintained to keep up with changes in the Unicode standard. For modern use that includes Unicode data, a Windows collation is generally recommended.

TL;DR

If all you want is case-sensitive comparison and sorting semantics, you should choose the _CS_ (for Case Sensitive) variation of whichever base collation provides the expected behaviour for your users' language and culture. For example, both these are case-sensitive collations:

-- Latin1-General, case-sensitive, accent-sensitive
Latin1_General_CS_AS 

-- Latin1-General, case-sensitive, accent-sensitive for Unicode Data, 
-- SQL Server Sort Order 51 on Code Page 1252 for non-Unicode Data
SQL_Latin1_General_CP1_CS_AS

You can see these definitions using sys.fn_helpcollations

Examples

Four tables that are exactly the same except for the collation; one binary, one case-sensitive, one case-insensitive, and one SQL case-sensitive:

CREATE TABLE #Example_BIN
(
    string nvarchar(50) 
        COLLATE Latin1_General_BIN
        NOT NULL
);

CREATE TABLE #Example_CS
(
    string nvarchar(50) 
        COLLATE Latin1_General_CS_AI
        NOT NULL
);

CREATE TABLE #Example_CI
(
    string nvarchar(50) 
        COLLATE Latin1_General_CI_AI
        NOT NULL
);

CREATE TABLE #Example_SQL
(
    string varchar(50) -- Note varchar
        COLLATE SQL_Latin1_General_CP1_CS_AS
        NOT NULL
);

Same sample data for each table:

INSERT #Example_BIN
    (string)
VALUES
    (N'A'),
    (N'a'),
    (N'B'),
    (N'b'),
    (N'C'),
    (N'c');

INSERT #Example_CS
SELECT EB.string 
FROM #Example_BIN AS EB;

INSERT #Example_CI
SELECT EB.string 
FROM #Example_BIN AS EB;

INSERT #Example_SQL
SELECT EB.string 
FROM #Example_BIN AS EB;

Now we want to find strings greater than 'a':

SELECT EB.string AS BIN
FROM #Example_BIN AS EB
WHERE EB.string > N'a'
ORDER BY EB.string;

SELECT EC.string AS CS
FROM #Example_CS AS EC
WHERE EC.string > N'a'
ORDER BY EC.string;

SELECT EC2.string AS CI
FROM #Example_CI AS EC2
WHERE EC2.string > N'a'
ORDER BY EC2.string;

SELECT ES.string AS SQL
FROM #Example_SQL AS ES
WHERE ES.string > 'a' -- not Unicode
ORDER BY ES.string;

Results:

╔═════╗
║ BIN ║
╠═════╣
║ b   ║
║ c   ║
╚═════╝

╔════╗
║ CS ║
╠════╣
║ A  ║
║ b  ║
║ B  ║
║ c  ║
║ C  ║
╚════╝

╔════╗
║ CI ║
╠════╣
║ B  ║
║ b  ║
║ C  ║
║ c  ║
╚════╝

╔═════╗
║ SQL ║
╠═════╣
║ B   ║
║ b   ║
║ C   ║
║ c   ║
╚═════╝

Finally...

Note though, if we use a Unicode literal with the SQL collation, the implicit conversion rules result in a Windows collation comparison:

SELECT ES.string AS SQL
FROM #Example_SQL AS ES
WHERE ES.string > N'a'
ORDER BY ES.string;

...and the SQL collation results change:

╔═════╗
║ SQL ║
╠═════╣
║ A   ║
║ B   ║
║ b   ║
║ C   ║
║ c   ║
╚═════╝

Related Solutions

SQL Server 2014 – Can Database Objects Be Case Insensitive While Keeping Strings Case Sensitive?

Mostly "yes", depending on how "automagic" you want everything to work.

First: Database identifier Collation (i.e. object names, column names, index names, etc) is taken from the Database's default Collation. That will determine the Collation of columns such as sys.objects.name, etc. So to do this, you can create or alter the Database to have a case-Insensitive Collation, such as Latin1_General_100_CI_AS_SC. This will allow you to have a table TEST that will be valid in queries such as select * from test;.

Server-scoped identifier Collation (e.g. Database names) is handled by the Instance's default Collation.

Second: If a Database's default Collation is case-Insensitive, then expressions that are entirely string literals and/or variables and/or output-parameters and/or UDF return values will be treated as case-Insensitive. For these expressions you would need to override that default behavior, per expression, using the COLLATE clause.

For example:

/* take database default collation behavior */
IF ('test' = 'TEST')
BEGIN
  PRINT 'Case-Insensitive';
END;
ELSE
BEGIN
  PRINT 'Case-Sensitive';
END;
-- ? result depends on the default collation of the "current" database ?


/* force case-sensitive */
IF ('test' = 'TEST' COLLATE Latin1_General_100_CS_AS_SC)
BEGIN
  PRINT 'Case-Insensitive';
END;
ELSE
BEGIN
  PRINT 'Case-Sensitive';
END;
-- Case-Sensitive


/* force case-insensitive */
IF ('test' = 'TEST' COLLATE Latin1_General_100_CI_AS_SC)
BEGIN
  PRINT 'Case-Insensitive';
END;
ELSE
BEGIN
  PRINT 'Case-Sensitive';
END;
-- Case-Insensitive

Expressions between string columns and either string literals or variables will use the Collation of the string column, and if the Collation for the column is case-sensitive, then the expression will be evaluated as case-sensitive without needing to use the COLLATE clause.

By default, newly created columns that do not specify the COLLATE clause will inherit the Database's default Collation. In this case, you will need to specify something like COLLATE Latin1_General_100_CS_AS_KS_WS_SC for all VARCHAR / NVARCHAR columns that you want to be sensitive across all 4 categories (plus support Supplementary Characters).

PLEASE NOTE: Be very cautious about using a binary Collation (i.e. Latin1_General_100_BIN2) to get an accent/case/kana/width sensitive Collation; binary Collations are not truly case-sensitive.

Sql-server – Setting and Changing the Server Collation after installation

Based on Sean's comment on the question, it sounds like there is a requirement that the Server / Instance -level Collations be the same. But, even if you could technically set up the Availability Group with differing Server default Collations, you wouldn't want to do that as you could easily experience unpredictable / inconsistent behavior when there is a fail over, such as sorting and/or comparison differences.

You could also have code breaking. For example, if the primary system is case insensitive and you have casing differences in variable names, that code will break if moving to an instance having either a case sensitive or binary Collation. This is due to Server-level Collation affecting variable names (including cursor variable names).