Running this results in "test", as expected:
SELECT
REPLACE(NCHAR(1234), NCHAR(1234), N'test');
However, running this results in "aӒa", which doesn't have "test" in it:
SELECT
REPLACE(N'a' + NCHAR(1234) + N'a', NCHAR(1234), N'test');
I thought it might have to do with the string concatenation happening for the haystack but not the needle, but when I tried this, it still didn't "work":
SELECT
REPLACE(N'a' + NCHAR(1234) + N'a', N'' + NCHAR(1234) + N'', N'test');
Result: "aӒa"
I suspected it might have to do with how it's interpreting the characters, so I tried specifying a binary collation… and that "fixed" the issue:
SELECT
REPLACE(N'a' + NCHAR(1234) + N'a' COLLATE Latin1_General_100_BIN2, NCHAR(1234), N'test');
Result: "atesta".
Why?
This behavior seems to exist for some characters, but not others.
SELECT
REPLACE(N'a' + NCHAR(23423) + N'a', NCHAR(23423), N'test');
Result: "atesta" ("works")
SELECT
REPLACE(N'a' + NCHAR(5342) + N'a', NCHAR(5342), N'test');
Result: "aᓞa" (doesn't "work")
Why?
Best Answer
This behavior is simply due to the fact that you are executing the query in a Database that has a pre-"100" series default Collation, in which case a great many characters do not have any sort weight. No sort weight means that they equate to empty string. They have a value of 0. So they always equal each other when nothing else having a sort weight is involved. They also equate to empty string. The version 100 Collations (starting in SQL Server 2008 which is version 10.0 or 100 without the decimal / minor version) added sort weights for most characters. So, just force the collation by adding
COLLATE Latin1_General_100_CI_AS_SC
. A binary Collation of any version (one ending in_BIN2
, don't use Collations ending in just_BIN
) also works as there are no such things as sort weights in binary Collations.Even multiple characters with no weight still equate to empty string (or any number of characters with no sort weight):
In this next test,
NCHAR(1234)
equates to itself only when using a version 100 (or newer) Collation. Pre-version 100 Collations assign no sort weight toNCHAR(1234)
such that it doesn't equate to anything, hence it was previously not found in the first expression.Continuing example # 5 above, we can prove that 3 characters having no weight equate to 2 different characters that also have no weight:
But switching to a version 100 Collation changes that:
Please see the "Supplementary Characters" characters section of the following post of mine. In that section I show similar behavior as it relates to supplementary characters, and at the end of the section I show how many BMP characters (i.e. non-supplementary characters) match one of these "missing weights" characters per Collation version.
The Uni-Code: The Search for the True List of Valid Characters for T-SQL Identifiers, Part 3 of 2 (Delimited Identifiers)
For more info on working with Collations, please visit: Collations.info