Sql-server – Which collation should I choose for a multi-language website

collation, performance, sql server

Does a collation have any influence on query speed? Does the size of a table change depending on the collation?

If I want to build a website that must support all possible languages (take Google, for example), which collation would be recommended?

I will need to store characters such as 日本語. Searches on the website will have to return results for input such as sóméthíng, and they must be case insensitive as well.

How do I know which is the best choice to make? Which collation better suits this case?

Best Answer

Generally speaking, one of the Unicode encodings is probably the best choice for broad language support. UTF-8 uses less memory for mostly-ASCII text (one byte per character, versus two in UTF-16), so it has a slight advantage in any time/space tradeoffs you find yourself needing to make. For scripts such as Japanese the advantage reverses: most CJK characters take three bytes in UTF-8 and two in UTF-16. Both encodings can represent every Unicode code point, so neither one rules out any language or script.
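The space tradeoff can be checked with a few lines of code. The following is a minimal sketch in Python, not SQL Server itself: it only measures encoded byte lengths and ignores any storage overhead the database adds. The function name `encoded_sizes` and the sample list are made up for illustration.

```python
# Sample strings taken from the question, plus a plain ASCII baseline.
SAMPLES = ["something", "sóméthíng", "日本語"]


def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and in UTF-16.

    "utf-16-le" is used so that no byte-order mark is counted.
    """
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, encoded_sizes(sample))
```

For the three-character string 日本語 this gives 9 bytes in UTF-8 and 6 bytes in UTF-16, while plain ASCII text is half the size in UTF-8. Which encoding is smaller overall therefore depends on the mix of languages you expect to store.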

This Wikipedia article might be enlightening on the dis/advantages of each.
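As for the requirement that a search for sóméthíng should also find "something" regardless of case: in SQL Server this is what a case-insensitive, accent-insensitive collation (names ending in `_CI_AI`) is meant to provide. The sketch below only approximates that behaviour in Python to show the idea. It is not how SQL Server implements collations, and the helper names `fold_key` and `matches` are hypothetical.

```python
import unicodedata


def fold_key(text: str) -> str:
    """Build a comparison key that ignores case and combining accents."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more thorough lowercase intended for caseless matching.
    return stripped.casefold()


def matches(a: str, b: str) -> bool:
    """Return True if two strings are equal once case and accents are ignored."""
    return fold_key(a) == fold_key(b)


if __name__ == "__main__":
    print(matches("Sóméthíng", "something"))
    print(matches("日本語", "日本語"))
```

Real collations apply language-specific rules that this simple folding does not capture, so treat it as an illustration of the concept rather than a replacement for choosing the right collation.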