Performance Impact of Changing Database Default Collation to Latin1_General_BIN

collationperformanceperformance-tuningsql serversql-server-2008-r2unicode

I have set the database collation to Latin1_General_BIN, to make string comparisons case-sensitive. Will this have an impact on performance? Will it have any impact on DML or DDL operations in the database? The database already exists with tables in it.

Best Answer

Collations in SQL Server determine the rules for matching and sorting character data. Normally, you would choose a collation first based on the comparison semantics and sorting order the consumers of the data require.

Humans generally do not find that binary collations produce the sorting and comparison behaviours they expect. So, although these offer the best performance (especially the pure code-point BIN2 versions) most implementations do not use them.

Next in raw performance terms (but only for non-Unicode strings) are the backward-compatibility SQL collations. When working with Unicode data, these collations use a Windows collation instead, with the same performance characteristics. There are subtle traps here, so you need to have good reasons to choose a SQL collation these days (unless working on a US system, where it is still the default).

Windows collations are the slowest, in general, because of the complex Unicode comparison and sorting rules. Nevertheless, these offer complete compatibility with Windows in SQL Server, and are regularly maintained to keep up with changes in the Unicode standard. For modern use that includes Unicode data, a Windows collation is generally recommended.

TL;DR

If all you want is case-sensitive comparison and sorting semantics, you should choose the _CS_ (for Case Sensitive) variation of whichever base collation provides the expected behaviour for your users' language and culture. For example, both these are case-sensitive collations:

-- Latin1-General, case-sensitive, accent-sensitive
Latin1_General_CS_AS 

-- Latin1-General, case-sensitive, accent-sensitive for Unicode Data, 
-- SQL Server Sort Order 51 on Code Page 1252 for non-Unicode Data
SQL_Latin1_General_CP1_CS_AS

You can see these definitions using sys.fn_helpcollations

Examples

Four tables that are exactly the same except for the collation; one binary, one case-sensitive, one case-insensitive, and one SQL case-sensitive:

CREATE TABLE #Example_BIN
(
    string nvarchar(50) 
        COLLATE Latin1_General_BIN
        NOT NULL
);

CREATE TABLE #Example_CS
(
    string nvarchar(50) 
        COLLATE Latin1_General_CS_AI
        NOT NULL
);

CREATE TABLE #Example_CI
(
    string nvarchar(50) 
        COLLATE Latin1_General_CI_AI
        NOT NULL
);

CREATE TABLE #Example_SQL
(
    string varchar(50) -- Note varchar
        COLLATE SQL_Latin1_General_CP1_CS_AS
        NOT NULL
);

Same sample data for each table:

INSERT #Example_BIN
    (string)
VALUES
    (N'A'),
    (N'a'),
    (N'B'),
    (N'b'),
    (N'C'),
    (N'c');

INSERT #Example_CS
SELECT EB.string 
FROM #Example_BIN AS EB;

INSERT #Example_CI
SELECT EB.string 
FROM #Example_BIN AS EB;

INSERT #Example_SQL
SELECT EB.string 
FROM #Example_BIN AS EB;

Now we want to find strings greater than 'a':

SELECT EB.string AS BIN
FROM #Example_BIN AS EB
WHERE EB.string > N'a'
ORDER BY EB.string;

SELECT EC.string AS CS
FROM #Example_CS AS EC
WHERE EC.string > N'a'
ORDER BY EC.string;

SELECT EC2.string AS CI
FROM #Example_CI AS EC2
WHERE EC2.string > N'a'
ORDER BY EC2.string;

SELECT ES.string AS SQL
FROM #Example_SQL AS ES
WHERE ES.string > 'a' -- not Unicode
ORDER BY ES.string;

Results:

╔═════╗
║ BIN ║
╠═════╣
║ b   ║
║ c   ║
╚═════╝

╔════╗
║ CS ║
╠════╣
║ A  ║
║ b  ║
║ B  ║
║ c  ║
║ C  ║
╚════╝

╔════╗
║ CI ║
╠════╣
║ B  ║
║ b  ║
║ C  ║
║ c  ║
╚════╝

╔═════╗
║ SQL ║
╠═════╣
║ B   ║
║ b   ║
║ C   ║
║ c   ║
╚═════╝

Finally...

Note though, if we use a Unicode literal with the SQL collation, the implicit conversion rules result in a Windows collation comparison:

SELECT ES.string AS SQL
FROM #Example_SQL AS ES
WHERE ES.string > N'a'
ORDER BY ES.string;

...and the SQL collation results change:

╔═════╗
║ SQL ║
╠═════╣
║ A   ║
║ B   ║
║ b   ║
║ C   ║
║ c   ║
╚═════╝