SQL Server – Handling Problematic Slashes and Question Marks in Unique Index Values

sql-server, sql-server-2016, unicode

What is it about the Phone values inserted below that makes SQL Server treat them as identical in a unique index?

CREATE TABLE Phone
(
  Id int identity(1, 1) primary key,
  Phone nvarchar(448) not null
)
go

create unique index IX_Phone on Phone(Phone)
with (data_compression = page);
go

insert into Phone Values ('?281/?263-?8400');
insert into Phone Values ('‎281/‎263-‎8400');

select * from Phone;

drop table Phone;

I receive an error message:

Msg 2601, Level 14, State 1, Line 13
Cannot insert duplicate key row in object 'dbo.Phone' with unique index 'IX_Phone'. The duplicate key value is (?281/?263-?8400).

Best Answer

Your problem is that you are passing Unicode strings as non-Unicode (varchar) literals.

Your second value is a string of 15 characters, not 12: it contains three non-printable characters with code 8206 (U+200E, LEFT-TO-RIGHT MARK).

select len('‎281/‎263-‎8400'); -- 15 !!!

Try this code, which passes the Unicode values as Unicode (N'...') literals, and you will see that both rows are inserted without any problem:

CREATE TABLE dbo.Phone
(
  Id int identity(1, 1) primary key,
  Phone nvarchar(448) not null
)
go

create unique index IX_Phone on Phone(Phone)
with (data_compression = page);
go

insert into Phone(phone) Values (N'?281/?263-?8400');
insert into Phone(phone) Values (N'‎281/‎263-‎8400');

select * from Phone;

And this is what your second string really contains (dbo.nums is my table of natural numbers):

declare @t table(col1 nvarchar(100), col2 nvarchar(100));
insert into @t values (N'?281/?263-?8400',  N'‎281/‎263-‎8400'); 

select n, unicode(substring(col1, n, 1)), unicode(substring(col2, n, 1))
from @t cross join dbo.nums
where n <= 15;
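If you do not have a numbers table like dbo.nums, a minimal sketch of the same check can generate positions 1 to 15 with a recursive CTE instead. It builds the second value from NCHAR(8206) so the invisible marks are explicit in the script; the CTE name nums and the output column names are illustrative, not part of the original answer.

```sql
DECLARE @t TABLE (col1 nvarchar(100), col2 nvarchar(100));

-- col1 holds literal question marks; col2 holds U+200E (8206) LEFT-TO-RIGHT MARKs
INSERT INTO @t (col1, col2)
VALUES (N'?281/?263-?8400',
        NCHAR(8206) + N'281/' + NCHAR(8206) + N'263-' + NCHAR(8206) + N'8400');

-- Recursive CTE standing in for the dbo.nums numbers table (positions 1..15)
WITH nums AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < 15
)
SELECT n,
       UNICODE(SUBSTRING(t.col1, n, 1)) AS col1_code,
       UNICODE(SUBSTRING(t.col2, n, 1)) AS col2_code
FROM @t AS t
CROSS JOIN nums
ORDER BY n;
```

Only positions 1, 6 and 11 differ: col1 has 63 (the '?' character) there, while col2 has 8206.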

Now, what happens when you pass your Unicode string as non-Unicode?

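Your non-printable character 8206 is converted to ?. That is how non-Unicode strings work: every character that cannot be found in the corresponding code page (for a string literal, the code page of the database's default collation) is replaced with a question mark. So your second value becomes '?281/?263-?8400', exactly the same as your first value, and the unique index rejects it.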
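A minimal sketch that reproduces the conversion without any invisible characters in the script, assuming the database's default collation is a non-UTF-8 collation on code page 1252 (for example SQL_Latin1_General_CP1_CI_AS), which matches what the question's error message shows. The variable names are illustrative.

```sql
-- Build the question's second value explicitly: three U+200E marks plus 12 visible characters
DECLARE @withLrm nvarchar(20) =
    NCHAR(8206) + N'281/' + NCHAR(8206) + N'263-' + NCHAR(8206) + N'8400';

-- Converting to varchar replaces each U+200E with '?' (assumes code page 1252)
DECLARE @asVarchar varchar(20) = CAST(@withLrm AS varchar(20));

SELECT
    LEN(@withLrm) AS nvarchar_length,   -- 15
    @asVarchar    AS varchar_value,     -- ?281/?263-?8400
    CASE WHEN @asVarchar = '?281/?263-?8400'
         THEN 'same key' ELSE 'different key' END AS after_varchar_conversion,
    CASE WHEN @withLrm = N'?281/?263-?8400'
         THEN 'same key' ELSE 'different key' END AS kept_as_nvarchar;
```

Under that assumption the varchar comparison reports the same key, which is exactly the collision the unique index detects, while the nvarchar comparison keeps the two values apart.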

For example, if you use a Latin collation and compare Hebrew and Cyrillic strings of the same length as varchar, they will always be equal, because every character is converted to a question mark; compared as nvarchar, they are different.
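A sketch of the Hebrew and Cyrillic comparison, under the same assumption of a Latin, code page 1252, non-UTF-8 default collation. The two five-letter words are arbitrary choices for illustration.

```sql
-- Assumes a default collation such as SQL_Latin1_General_CP1_CI_AS (code page 1252, not _UTF8)
DECLARE @hebrew   nvarchar(10) = N'מילים';  -- five Hebrew letters
DECLARE @cyrillic nvarchar(10) = N'Слово';  -- five Cyrillic letters

SELECT
    CAST(@hebrew AS varchar(10))   AS hebrew_as_varchar,    -- ?????
    CAST(@cyrillic AS varchar(10)) AS cyrillic_as_varchar,  -- ?????
    CASE WHEN CAST(@hebrew AS varchar(10)) = CAST(@cyrillic AS varchar(10))
         THEN 'equal' ELSE 'different' END AS varchar_comparison,
    CASE WHEN @hebrew = @cyrillic
         THEN 'equal' ELSE 'different' END AS nvarchar_comparison;
```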

So if you now try to insert these two values (Hebrew and Cyrillic) into your Phone table as non-Unicode literals (without the N prefix), only one of them will be inserted and the other will be rejected by the unique index. And if you select from Phone, '?????' will be returned.
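A sketch of that scenario using a temporary table shaped like the question's dbo.Phone, with the same assumption about the database's default collation. The table name #Phone and the TRY/CATCH wrapper are additions for illustration; error 2601 is the same duplicate-key error shown in the question.

```sql
-- Temporary copy of the question's table (assumes a code page 1252, non-UTF-8 default collation)
CREATE TABLE #Phone
(
    Id    int IDENTITY(1, 1) PRIMARY KEY,
    Phone nvarchar(448) NOT NULL
);
CREATE UNIQUE INDEX IX_Phone ON #Phone (Phone);

-- No N prefix: each literal is converted to the code page, becoming '?????'
INSERT INTO #Phone (Phone) VALUES ('מילים');

BEGIN TRY
    INSERT INTO #Phone (Phone) VALUES ('Слово');
END TRY
BEGIN CATCH
    -- Expected: 2601, cannot insert duplicate key row with unique index
    SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
END CATCH;

SELECT Id, Phone FROM #Phone;  -- a single row whose Phone is '?????'

DROP TABLE #Phone;
```

Prefixing both literals with N (N'מילים', N'Слово') lets both rows in.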

Related Solutions

Sql-server – Unique index corrupted SQL. Select query returns single row but create unique index fails

Consider this error:

Msg 1505, Level 16, State 1, Line 2
The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name 'dbo.MSmerge_contents' and the index name 'uc1SycContents'. The duplicate key value is (7696031, 08703987-557d-e111-9888-e61f13c44f03).

I am running the query below:

select * from msmerge_contents where rowguid='08703987-557d-e111-9888-e61f13c44f03';

and it returns only one row.

When you have index corruption problems (i.e. keys present in a nonclustered (NC) index but not in the base table, or vice versa), you must be very careful about the SQL you use to validate the data. At this moment your data is inconsistent, but the query optimizer does not know that and completely trusts your schema, including these incorrect indexes. As such, it may satisfy your query from one of the NC indexes that is missing a key, and the result will also miss that key, falsely showing no duplicates. To get out of this catch-22 you need to force the optimizer's hand by explicitly requesting one index or another, and make sure the projected column list can be satisfied by the index you forced (i.e. no *). Assuming uc1SycContents is not the clustered index, try the following:

select rowguid
from msmerge_contents with (INDEX (1))
where rowguid='08703987-557d-e111-9888-e61f13c44f03';

select rowguid
from msmerge_contents with (INDEX ([uc1SycContents]))
where rowguid='08703987-557d-e111-9888-e61f13c44f03';

These queries force a check of whether that rowguid is duplicated in the base table's clustered index (index id 1) versus the index uc1SycContents. I would expect the first query to return two (or more) rows and the second to return one.

Sql-server – What Happens When Identity Range Is Exceeded

If you attempt to bulk insert more rows than there are available identity values between syncs, you will get an error:

The identity range managed by replication is full and must be updated by a replication agent.

The key here is to allocate range sizes large enough to accommodate the number of inserts that may occur between syncs. You can specify the range sizes in the article properties when adding articles to the publication.
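For merge replication, these range sizes are arguments of sp_addmergearticle. A sketch, assuming merge replication, with hypothetical publication and table names (SalesPub, dbo.Orders) and illustrative sizes; choose the values from your own insert volume between syncs and check the procedure's documentation for the exact semantics of each parameter.

```sql
-- Run at the publisher, in the publication database.
-- SalesPub and dbo.Orders are hypothetical; the range sizes are illustrative only.
EXEC sp_addmergearticle
    @publication                   = N'SalesPub',
    @article                       = N'Orders',
    @source_owner                  = N'dbo',
    @source_object                 = N'Orders',
    @identityrangemanagementoption = N'auto',  -- replication manages identity ranges
    @identity_range                = 100000,   -- range size allocated to the publisher and each subscriber
    @pub_identity_range            = 100000,   -- range size for subscribers with server subscriptions
    @threshold                     = 80;       -- percent of a range used before a new range is assigned
```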
