Sql-server – SQL Server collation mismatch reducing the amount of data visible to an application

collationsql server

One of our servers died recently, it was rebuilt with a wrong collation and so it looks as follows:

  • SQL Server – SQL_Latin1_General_CP1_CI_AS
  • master db – SQL_Latin1_General_CP1_CI_AS
  • model db – SQL_Latin1_General_CP1_CI_AI
  • msdb db – SQL_Latin1_General_CP1_CI_AI
  • temp db – SQL_Latin1_General_CP1_CI_AS
  • every user db – SQL_Latin1_General_CP1_CI_AI

The databases are accessible, my question is: will the data visible to users via an app be limited because of the mismatching collation? Some users have said that they think they have lost a lot of data even though the database was restored from a very recent backup.

It is SQL Server 2008 R2 with SP3 on Windows Server 2012 R2 (compatible only due to service packs)

Thank you

Best Answer

will the data visible to users via an app be limited because of the mismatching collation?

Assuming you are speaking of scenarios that do not error (because an error does not cause people to wonder if something is missing), but instead just return data that might not be ALL of the expected data, then it depends on how the queries are being done. If ALL of the following conditions are true, then yes, some data can be filtered out now when it wasn't before:

  1. a query uses a temporary table (local or global)
  2. the Collation for string columns in the CREATE TABLE #TempTable statement did not explicitly specify the Collation via the COLLATE keyword
  3. the query is filtering at least one such string column in the temporary table on a string literal or local variable
  4. there is an accent mismatch between the data in that column and the string used for filtering (in the literal or variable)

Meaning, previously someone could have executed:

SELECT * FROM #TempTable WHERE Name = 'a';

and gotten back rows containing the following: A a À Á Â Ã Ä Å à á â ã ä å. That same query, with the new tempdb Collation, will now return only rows containing: A a.

Also, outside of the unexpected filtering that is happening in your user-facing code, there are other potential problems waiting for you. If master is a different Collation, then names of system-level entities (e.g. Databases, Logins, etc) will behave outside of expectations and differently than the rest of the system and could cause odd behavior similar to what your users are reporting. Meaning, if you have accents in Database and/or Login and/or variable/parameter/cursor names and have code that works because it is expecting accent-insensitive comparisons, then you / your application might experience what your users are experiencing (though it could take a while to notice, and even longer to debug). AND, it is assumed that all four system DBs are the same Collation, and some stored procedures and views in msdb JOIN to master on string columns and do break in this scenario. Hence this does really need to be fixed.

SO, since it is just the system DBs that are "incorrect" (it is best to have master and msdb match as there are some stored procedures that JOIN between them and error when their Collations don't match), you should change only those back to the expected Collation of SQL_Latin1_General_CP1_CI_AI. You can do this easily enough by using SETUP.EXE /ACTION=Rebuilddatabase. It allows you to change just the system DBs and does not touch user Databases.

Also see the MSDN documentation for: Rebuild System Databases and Set or Change the Server Collation.

At the very least the command-line should look like (but all as one line, or you need to use ^ at the end of each line in order to continue the command on the following line):

Setup /QUIET /ACTION=REBUILDDATABASE /INSTANCENAME={InstanceName}
      /SQLSYSADMINACCOUNTS={accountsToAddToSysadminRole} 
      /SQLCOLLATION=SQL_Latin1_General_CP1_CI_AI [/otherOptions]

For this situation you do not need to use the other recommendation of the undocumented -q switch for sqlservr.exe. That option rebuilds ALL databases, and there is no need to make modifications to databases that are already correct. The less you touch (especially with an undocumented feature) the better off you are.