SQL Server Best Practices – Why Mixing Column Collations is Bad

best practicescollationsql server

There are two reasons that prompts me to ask this question:

tSQLt
The T-SQL testing framework tSQLt considers it an issue of "High Severity" when there exists columns with a non-default collation. The author of the test states the following:

I am NOT suggesting that every string column should have a collation that matches the default collation for the database. Instead, I am suggesting that when it is different, there should be a good reason for it.

Yet, the severity of the failed test is, as mentioned, considered high.

Octopus Deploy
While configuring the Octopus Deploy Server, the setup fails with a FATAL error during the initialization of the OctopusServer-instance. The article related to the error-message does not explain why this is a requirement, but simply states that it will be a requirement for future deployments, from and including Octopus version 3.8.

As a side-note, RedGate's CI-tool package, the DLM Automation Suite, supports deployments with varying collations without complaints.

The recommendation of keeping all column collations to the database default seems more like guidelines or best practices to me. Why is it considered such a serious error by some?

Best Answer

The recommendation of keeping all column collations to the database default seems more like guidelines or best practices to me.

You are entirely correct here.

Why is it considered such a serious error by some?

For the same reason that you will often hear / read that "you should never use:"

CURSORs
GOTO statements
SQLCLR
WITH (NOLOCK)
etc, etc, etc

Some features / options / technologies are more complicated than others and generally require more knowledge by the user because the chances of getting into trouble when using it are much greater than the chances of not having any problems. So, it is easier to have generalized rules against such things for the general population. In fact, when writing up "Coding Standards" at work, I will always have a rule to never use CURSORs, yet I use them myself because I know both "when" to use them and "how" to use them effectively. But folks who only occasionally write queries shouldn't be expected to know that. This is also similar to "don't edit the Registry unless you absolutely know what you are doing", or rules that we make as parents for our (very young) children where we need to tell them not to do something simply because they are not capable of traversing the complexities of when it is ok to do a particular thing or how to go about doing it.

In the case of Collations, this is a very complex and confusing topic, and you can run into both hard-errors (these are an issue but less of an issue since they are obvious and hence easy enough to fix) and into "odd" behavior where it is hard to explain why things are acting the way that they are (why some items are filtered, or not filtered, outside of expectations, OR why sorting is acting outside of expectations). And sadly, there seems to be a rather large amount of misinformation floating around which furthers the mass confusion. I am actually working on a project to greatly increase the general knowledge of Collations and encodings, etc and hopefully counteract the misinformation and myths, but not yet ready to release it (when done I will update this with a link to it).

For Collation, you need to use what makes the most sense for the business case. The notion of not mixing Collations in a table or database is a default approach, but if you look at the Collations used for the various columns of the system catalog views, you will notice a variety of Collations being used. So I agree with the main quote in the question that IF the Collations are going to be different, it should be intentional, but there is nothing inherently wrong with it.

Regarding this from the question (emphasis added):

While configuring the Octopus Deploy Server, the setup fails with a FATAL error during the initialization of the OctopusServer-instance. The article related to the error-message does not explain why this is a requirement

I checked the linked documentation page and it does indeed explain why it is a requirement. I have copied the pertinent info from that documentation below:

You must ensure you also change the collation of all objects in the Octopus Database, otherwise errors can occur when modifying the database during Octopus version upgrades. New objects created will use the updated collation, and when attempting to (for example) perform SQL joins between these and existing objects using the original collation, collation mis-match errors may occur.

They are saying that their code, in the Octopus database, has JOINs between string columns and could likely have new code introduced in a future upgrade that has additional JOINs on new string columns. New columns, either via CREATE TABLE or ALTER TABLE ... ADD, will be assigned the default Collation of the database if the COLLATE keyword was not specified for the new string column(s). And JOINs between string columns that do not have the same Collation will generate a Collation mismatch error. They also seem to be allowing the user to choose their own Collation (possibly to accommodate different locales) since they say at the top that the only requirement is that the Collation be case-insensitive. And since the Collation of the database that their code lives in isn't guaranteed to always be the same, they can't use the COLLATE keyword to force the same Collation across all new string columns (well, they technically can, but that requires Dynamic SQL so not easy to deal with when generating update scripts). If they were able to use the COLLATE keyword, then they could get away with having the Database's default Collation be different than the string columns. That would avoid the hard "Collation mismatch" errors, but would still leave open the possibility of comparison operations involving one of those string columns and a string literal or variable resulting in "odd" behavior as it would use the column's Collation and not the Database's Collation. Of course, that could very well be expected behavior. But since this is a 3rd party app, the behavior should be what they intended rather than a 50 / 50 chance between a) what the user wanted (or didn't object to) and b) what the user considers a bug (and then wastes the vendor's support time on a wild goose chase and/or blogs about how their software is buggy).

Best Answer

Related Solutions

Sql-server – Multiple charsets and Collations for a Multinational Database

Mariadb – Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and (utf8_turkish_ci,IMPLICIT) for operation ‘=’ and why

Related Question