Sql-server – Multiple charsets and Collations for a Multinational Database

collationsql-server-2008-r2

After decades of business speaking with companies that adhere to the general Latin1 Collation, my company is facing the issue of storing information in a different charset and collation: Greek.
So, it's time to start thinking a sort of redesign of our dbs.

Given that my installation is a MS SQLServer 2008 R2, what are the best methods, or general accepted guide lines, to do such a thing?
Multiple tables? Multiple Dbs with different settings?

I'm not a DBA, I'm only asking to have a starting point to mumble upon.

Thank you very much for reading and to all who will care to reply.

Best Answer

Quote from this MS tech-page:

If the users of your instance of SQL Server speak multiple languages, you should pick a collation that best supports the requirements of the various languages. For example, if the users generally speak western European languages, choose the Latin1_General collation. When you support users who speak multiple languages, it is most important to use the Unicode data types, nchar, nvarchar, and ntext, for all character data. Unicode was designed to eliminate the code page conversion difficulties of the non-Unicode char, varchar, and text data types. Collation still makes a difference when you implement all columns using Unicode data types because it defines the sort order for comparisons and sorts of Unicode characters. Even when you store your character data using Unicode data types you should pick a collation that supports most of the users in case a column or variable is implemented using the non-Unicode data types.

So, just like @Gonsalu said in a comment to @TechiGurl: build your database for Unicode to support multiple languages.

In practice this means using nchar/nvarchar/ntext datatypes, and not char/varchar/text.

Choosing a collation is a matter of query-optimization. Any collation will order your text in a definitive manner. Collations specify the rules for how strings of character data are sorted and compared, based on the norms of particular languages and locales. Thus you should choose a collation that best serves the language-requirements of most of the queries being run on your database.

In the case given by the OP, if Greek-Text now makes up 10% of all data, then I would stay with the Latin1 Collation. If there is more Greek-Language data or most of the queries being run on the database retrieve Greek-Data, then I would go with Greek_ Collation.

Here is a list of default MS Collations.