Sql-server – Problem with pattern matching on unicode characters

pattern matchingsql serversql server 2014unicode

I'm writing a piece of code that uses pattern matching on unicode characters but I'm running into an odd problem. Some characters work and some don't.

DECLARE @Pattern nvarchar(50) = N'%[^a-z]%' -- Simple pattern
SELECT PatIndex(@Pattern, nchar(46797)), nchar(46797) -- Works
SELECT PatIndex(@Pattern, nchar(14843)), nchar(14843) -- Doesn't Work

The pattern should be pulling for any character that isn't a-z but for some reason it isn't catching some characters. Does anyone know why some unicode characters would match and others wouldn't?

Best Answer

See if doing a binary collate fits what you need. Here is a quick test.

USE Tempdb  
GO

IF OBJECT_ID('PattMatch') IS NOT NULL  
BEGIN  
  DROP TABLE PattMatch  
END  
GO  

CREATE TABLE PattMatch (COL1 NVARCHAR(50))  
GO  

INSERT INTO PattMatch  
VALUES (nchar(46797)),(nchar(14843)),('ddddddd*'),('lettersand9999')  
GO  

DECLARE @Pattern nvarchar(50) = N'%[^a-z]%'   
SELECT PatIndex(@Pattern, COL1 COLLATE Latin1_General_BIN2), COL1 FROM PattMatch  
GO  

DROP TABLE PattMatch   
GO  

--your test  
DECLARE @Pattern nvarchar(50) = N'%[^a-z]%'  
SELECT PatIndex(@Pattern, nchar(46797) COLLATE Latin1_General_BIN2)  
SELECT PatIndex(@Pattern, nchar(14843) COLLATE Latin1_General_BIN2)