We're moving from SQL 2005 [Instance and DB have collation of SQL_Latin1_General_CP1_CI_AS
] to SQL 2008 [which defaults to Latin1_General_CI_AS
].
I completed a SQL 2008 R2 installation, and used default Latin1_General_CI_AS
collation, with the restoration of the database still on SQL_Latin1_General_CP1_CI_AS
. The excepted problems occurred – the #temp tables where in Latin1_General_CI_AS
whilst the db was in SQL_Latin1_General_CP1_CI_AS
and this is where I am now – I need advice on the pitfalls now please.
On installation of SQL 2008 R2, I have the option on installation to use 'SQL Collation, used for backwards compatibility'
where I have the option to select the same collation as the 2005 database : SQL_Latin1_General_CP1_CI_AS
.
-
This will allow me to not have problems with #temp tables, but are there pitfalls?
-
Would I lose any functionality or features of any kind by not using a "current" collation of SQL 2008?
- What about when we move (e.g. in 2 years ) from 2008 to SQL 2012? Will I have problems then?
-
Would I at some point be forced to go to
Latin1_General_CI_AS
? -
I read that some DBA's script complete the rows of complete databases, and then run the insert script into the database with the new collation – I'm very scared and wary of this – would you recommend doing this?
Best Answer
First of all, apologies for such a long answer, as I feel that still there is a lot of confusion when people talk about terms like collation, sort order, code page, etc.
From BOL :
This means that Collation is very important as it specifies rules on how character strings of the data are sorted and compared.
Note: More info on COLLATIONPROPERTY
Now Lets first understand the differences ......
Running below T-SQL :
The results would be :
Looking at above results, the only difference is the Sort Order between the 2 collations.But that is not true, which you can see why as below :
Test 1 :
Results of Test 1:
From above results we can see that the we cannot directly compare values on columns with different collations, you have to use
COLLATE
to compare the column values.TEST 2 :
The major difference is performance, as Erland Sommarskog points out at this discussion on msdn.
--- Create Indexes on both tables
--- Run the queries
--- This will have IMPLICIT Conversion
--- Run the queries
--- This will NOT have IMPLICIT Conversion
The reason for Implicit conversion is because, I have my database & Server collation both as
SQL_Latin1_General_CP1_CI_AS
and the table Table_Latin1_General_CI_AS has column Comments defined asVARCHAR(50)
with COLLATE Latin1_General_CI_AS, so during the lookup SQL Server has to do an IMPLICIT conversion.Test 3:
With the same set up, now we will compare the varchar columns with nvarchar values to see the changes in the execution plans.
-- run the query
-- run the query
Note that the first query is able to do Index seek but has to do Implicit conversion while the second one does an Index scan which prove to be inefficient in terms of performance when it will scan large tables.
Conclusion :
SQL_Latin1_General_CP1_CI_AS
is a SQL collation with the rules that allow you to sort data for unicode and non-unicode are different.Latin1_General_CI_AS
is a Windows collation with the rules that allow you to sort data for unicode and non-unicode are same.See my answer above.
It all depends on what functionality/features you are referring to. Collation is storing and sorting of data.
Cant vouch ! As things might change in and its always good to be inline with Microsoft's suggestion + you need to understand your data and the pitfalls that I mentioned above. Also refer to this and this connect items.
When you want to change collation, then such scripts are useful. I have found myself changing collation of databases to match server collation many times and I have some scripts that does it pretty neat. Let me know if you need it.
References :