I had to do this recently and it was a big pain. I ended up doing a write up on changing collations: Collation: Correction -> Expression, Column, Database
My personal preference for method in your situation would probably be to
- Create a new database (with the correct collation if needed)
- Script out the old database
- Do a search and replace on the script file to change the collation from one to another
- Run the script on the new database
- Move the data across
- Do a rename swap to put the new database in place of the old
Your other options are to either create a script that generates alters for each column or to use the GUI. The GUI is going to be slower but more certain. The script method generates the risk of making changes you didn't intend if your script isn't letter perfect.
The script method also has the problem that you have to drop any constraints, indexes etc that touch the column before you can alter it to change the collation. You might be able to generate a script to do that for you but it wouldn't be easy.
If you decide to go the script method I can probably help you generate one to create the alters but for dropping and re-creating indexes/constraints you would be on your own. Like I said I would go with the re-create the database method. It's easy and relatively fast unless you have a huge amount of data. The only real drawback is that it requires there to be double or even triple (space for logs) the size of your database free.
will the data visible to users via an app be limited because of the mismatching collation?
Assuming you are speaking of scenarios that do not error (because an error does not cause people to wonder if something is missing), but instead just return data that might not be ALL of the expected data, then it depends on how the queries are being done. If ALL of the following conditions are true, then yes, some data can be filtered out now when it wasn't before:
- a query uses a temporary table (local or global)
- the Collation for string columns in the
CREATE TABLE #TempTable
statement did not explicitly specify the Collation via the COLLATE
keyword
- the query is filtering at least one such string column in the temporary table on a string literal or local variable
- there is an accent mismatch between the data in that column and the string used for filtering (in the literal or variable)
Meaning, previously someone could have executed:
SELECT * FROM #TempTable WHERE Name = 'a';
and gotten back rows containing the following: A a À Á Â Ã Ä Å à á â ã ä å
. That same query, with the new tempdb
Collation, will now return only rows containing: A a
.
Also, outside of the unexpected filtering that is happening in your user-facing code, there are other potential problems waiting for you. If master
is a different Collation, then names of system-level entities (e.g. Databases, Logins, etc) will behave outside of expectations and differently than the rest of the system and could cause odd behavior similar to what your users are reporting. Meaning, if you have accents in Database and/or Login and/or variable/parameter/cursor names and have code that works because it is expecting accent-insensitive comparisons, then you / your application might experience what your users are experiencing (though it could take a while to notice, and even longer to debug). AND, it is assumed that all four system DBs are the same Collation, and some stored procedures and views in msdb
JOIN to master
on string columns and do break in this scenario. Hence this does really need to be fixed.
SO, since it is just the system DBs that are "incorrect" (it is best to have master
and msdb
match as there are some stored procedures that JOIN between them and error when their Collations don't match), you should change only those back to the expected Collation of SQL_Latin1_General_CP1_CI_AI
. You can do this easily enough by using SETUP.EXE /ACTION=Rebuilddatabase. It allows you to change just the system DBs and does not touch user Databases.
Also see the MSDN documentation for: Rebuild System Databases and Set or Change the Server Collation.
At the very least the command-line should look like (but all as one line, or you need to use ^
at the end of each line in order to continue the command on the following line):
Setup /QUIET /ACTION=REBUILDDATABASE /INSTANCENAME={InstanceName}
/SQLSYSADMINACCOUNTS={accountsToAddToSysadminRole}
/SQLCOLLATION=SQL_Latin1_General_CP1_CI_AI [/otherOptions]
For this situation you do not need to use the other recommendation of the undocumented -q
switch for sqlservr.exe
. That option rebuilds ALL databases, and there is no need to make modifications to databases that are already correct. The less you touch (especially with an undocumented feature) the better off you are.
Best Answer
I posted a detailed analysis of the implications of making any collation changes to an instance and/or database:
Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?
There are quite a few areas that could be affected, and it depends greatly on if you are talking just about one particular database, or the instance and one or more of the databases on that instance.
HOWEVER, perhaps in addition to asking what might the affect of such a change be, you should probably also be asking the vendor:
COLLATE
clause in theCREATE TABLE
statements to override the default from the database; they can use theCOLLATE
clause in any expression to override the database's default collation.Latin1_General_100_CS_AS_SC
would be the preferred equivalent ofSQL_Latin1_General_CP1_CS_AS
.