This may be too late for the original poster, but for completeness: the way to achieve case-insensitive behaviour in PostgreSQL is to use a nondeterministic collation. Note that this is available only in PostgreSQL 12 and later.
A collation is either deterministic or nondeterministic. A
deterministic collation uses deterministic comparisons, which means
that it considers strings to be equal only if they consist of the same
byte sequence. Nondeterministic comparison may determine strings to be
equal even if they consist of different bytes. Typical situations
include case-insensitive comparison, accent-insensitive comparison, as
well as comparison of strings in different Unicode normal forms. It is
up to the collation provider to actually implement such insensitive
comparisons; the deterministic flag only determines whether ties are
to be broken using bytewise comparison. See also Unicode Technical
Standard 10 for more information on the terminology.
To create a nondeterministic collation, specify the property
deterministic = false to CREATE COLLATION, for example:
CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);
This example would use the standard Unicode collation in a
nondeterministic way. In particular, this would allow strings in
different normal forms to be compared correctly. More interesting
examples make use of the ICU customization facilities explained above.
For example:
CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false);
All standard and predefined collations are deterministic, and all
user-defined collations are deterministic by default. While
nondeterministic collations give a more “correct” behavior, especially
when considering the full power of Unicode and its many special cases,
they also have some drawbacks. Foremost, their use leads to a
performance penalty. Also, certain operations are not possible with
nondeterministic collations, such as pattern matching operations.
Therefore, they should be used only in cases where they are
specifically wanted.
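Putting the above together, here is a minimal sketch of a nondeterministic collation in action (assuming PostgreSQL 12+ built with ICU support; the table and column names are illustrative):

```sql
-- Requires PostgreSQL 12+ with ICU support.
CREATE COLLATION case_insensitive (
    provider = icu, locale = 'und-u-ks-level2', deterministic = false
);

CREATE TABLE users (
    name text COLLATE case_insensitive
);

INSERT INTO users (name) VALUES ('Alice'), ('BOB');

-- Equality now ignores case, so this matches the 'Alice' row:
SELECT * FROM users WHERE name = 'ALICE';

-- Pattern matching is not supported with nondeterministic collations,
-- so this raises an error:
-- SELECT * FROM users WHERE name LIKE 'A%';
```

Note that a unique index on such a column would likewise treat 'Alice' and 'ALICE' as duplicates, which is usually exactly what you want from case-insensitive behaviour.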
Best Answer
This is actually not a SQL Server specific behavior, it's just how these things work in general.
So, the data is the data. If you are speaking about an index specifically, the data needs to be stored as-is, or else it would require a look-up in the main table each time to get the actual value, and there would be no possibility of a covering index (at least not for string types).
The data, either in the table/clustered index or non-clustered index, does not contain any collation/sorting info. It is simply data. The collation (locale/culture rules and sensitivities) is just metadata attached to the column and used when a sort operation is called (unless overridden by a COLLATE clause), which would include the creation/rebuild of an index. The rules defined by a non-binary collation are used to generate sort keys, which are binary representations of the string (sort keys are unnecessary in binary collations). These binary representations incorporate all of the locale/culture rules and selected sensitivities. The sort keys are used to place the records in their proper order, but are not themselves stored in the index or table. They aren't stored (at least I haven't seen these values in the index, and was told that they aren't stored).

There are two types of collations: SQL Server and Windows.
SQL Server
SQL Server collations (those with names starting with SQL_) are the older, pre-SQL Server 2000 way of sorting/comparing (even though SQL_Latin1_General_CP1_CI_AS is still the installation default on US English OSes, quite sadly). In this older, simplistic, non-Unicode model, each combination of locale, code page, and the various sensitivities is given a static mapping of each of the characters in that code page. Each character is assigned a value (i.e. a sort weight) to denote how it equates with the others. Comparisons in this model appear to be a two-pass operation.

The only sensitivities that can be adjusted in these collations are "case" and "accent" ("width", "kana type", and "variation selector" are not available). Also, none of these collations support supplementary characters (which makes sense, as those are Unicode-specific and these collations apply only to non-Unicode data).
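As an illustration of how the case-sensitivity flag in a SQL collation's name changes comparison results (a sketch; the literals are illustrative):

```sql
-- CI = case-insensitive, CS = case-sensitive, AS = accent-sensitive
SELECT CASE WHEN 'abc' = 'ABC' COLLATE SQL_Latin1_General_CP1_CI_AS
            THEN 'equal' ELSE 'not equal' END AS case_test;   -- equal

SELECT CASE WHEN 'abc' = 'ABC' COLLATE SQL_Latin1_General_CP1_CS_AS
            THEN 'equal' ELSE 'not equal' END AS case_test;   -- not equal

-- Accent-sensitive (_AS): 'e' and 'é' are considered different
SELECT CASE WHEN 'resume' = 'résumé' COLLATE SQL_Latin1_General_CP1_CI_AS
            THEN 'equal' ELSE 'not equal' END AS accent_test; -- not equal
```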
This approach applies only to non-Unicode VARCHAR data. Each unique combination of locale, code page, case-sensitivity, and accent-sensitivity has a specific "sort ID", which you can see in the following example. The only difference between the first two collations is the case-sensitivity. The third collation is a Windows collation and so does not have a static mapping table.
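The original example was not preserved in this copy, but a sketch of what it likely showed, using COLLATIONPROPERTY (the 'SortId' property returns a nonzero sort ID for SQL Server collations and 0 for Windows collations; the specific collation names here are assumptions):

```sql
SELECT COLLATIONPROPERTY(N'SQL_Latin1_General_CP1_CI_AS', 'SortId') AS [CI_AS],
       COLLATIONPROPERTY(N'SQL_Latin1_General_CP1_CS_AS', 'SortId') AS [CS_AS],
       COLLATIONPROPERTY(N'Latin1_General_100_CI_AS',     'SortId') AS [Windows];
-- The two SQL Server collations return distinct nonzero sort IDs;
-- the Windows collation returns 0 (no static mapping table).
```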
Also, these collations should sort and compare faster than the Windows collations, due to being simple character-to-sort-weight lookups. However, they are also far less functional and should generally be avoided if at all possible.
Windows
Windows collations (those with names not starting with SQL_) are the newer (starting in SQL Server 2000) way of sorting/comparing. In this newer, complex, Unicode model, each combination of locale and the various sensitivities is not given a static mapping. For one thing, there are no code pages in this model. This model assigns a default sort value to each character, and then each locale/culture can re-assign sort values to any number of characters. This allows multiple cultures to use the same characters in different ways. It also has the effect of allowing multiple languages to be sorted naturally using the same collation, if they do not use the same characters (and if one of them does not need to re-assign any values and can simply use the defaults).

The sort values in this model are not single values. They are an array of values that assign relative weights to the base letter, any diacritics (i.e. accents), casing, etc. If the collation is case-sensitive, then the "case" portion of that array is used; otherwise it is ignored (hence, insensitive). If the collation is accent-sensitive, then the "diacritic" portion of the array is used; otherwise it is ignored (hence, insensitive).
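A sketch of these additional sensitivities with Windows collations (assuming the collation names Latin1_General_100_CI_AI, Latin1_General_100_CI_AS, and Latin1_General_100_CI_AS_WS exist on the instance, which they should on modern versions):

```sql
-- CI_AI: both case- and accent-insensitive
SELECT CASE WHEN N'résumé' = N'resume' COLLATE Latin1_General_100_CI_AI
            THEN 'equal' ELSE 'not equal' END AS accent_test; -- equal

-- Width sensitivity: full-width vs. half-width Latin letters
SELECT CASE WHEN N'ABC' = N'ＡＢＣ' COLLATE Latin1_General_100_CI_AS
            THEN 'equal' ELSE 'not equal' END AS width_test;  -- equal (width-insensitive)

SELECT CASE WHEN N'ABC' = N'ＡＢＣ' COLLATE Latin1_General_100_CI_AS_WS
            THEN 'equal' ELSE 'not equal' END AS width_test;  -- not equal (width-sensitive)
```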
Comparisons in this model are a multi-pass operation.
For more details on this sorting, I will eventually publish a post that shows the sort key values, how they are calculated, the differences between SQL Server and Windows collations, etc. But for now, please see my answer to: Accent Sensitive Sort (please note that the other answer to that question is a good explanation of the official Unicode algorithm, but SQL Server instead uses a custom, though similar, algorithm, and even a custom weight table).
All sensitivities can be adjusted in these collations: "case", "accent", "width", "kana type", and "variation selector" (the last starting in SQL Server 2017, and only for the Japanese collations). Also, some of these collations (when used with Unicode data) support supplementary characters (starting in SQL Server 2012). This approach applies to both NVARCHAR and VARCHAR data (even non-Unicode data). It applies to non-Unicode VARCHAR
data by first converting the value to Unicode internally and then applying the sort/comparison rules.

Please note: the instance-level collation is chosen during installation based on the OS language (it defaults to SQL_Latin1_General_CP1_CI_AS for US English systems, so please vote for this suggestion). This can be changed during installation. The instance-level collation then sets the collation for the [model] DB, which is the template used when creating new DBs, but the collation can be changed when executing CREATE DATABASE by specifying the COLLATE clause. This database-level collation is used for variables and string literals, as well as the default for new (and altered!) columns when the COLLATE clause is not specified (which is the case for the example code in the question).
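The precedence just described can be sketched as follows (database, table, and column names are illustrative):

```sql
-- Database-level collation becomes the default for new columns:
CREATE DATABASE CollationDemo COLLATE Latin1_General_100_CS_AS;
GO
USE CollationDemo;
GO
CREATE TABLE dbo.Demo
(
    DefaultCol  varchar(50),  -- inherits the database collation (CS_AS here)
    OverrideCol varchar(50) COLLATE Latin1_General_100_CI_AS  -- explicit override
);

-- Inspect the resulting column collations:
SELECT name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID(N'dbo.Demo');
```

This is why the example code in the question, which specified no COLLATE clause, silently picked up the database's collation.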