Sql-server – Treating certain Arabic characters as identical

collationsql server

In Arabic we have characters like ا (alef) and أ (alef with hamza).

Users write them interchangeably and we want to search them interchangeably. SQL Server treats them as separate characters. How can I make SQL treat them as the same character?

I thought to replace any أ (alef with hamza) with ا (alef) at insertion but we have a lot of alternatives in Arabic language not just ا (alef) and أ (alef with hamza).

I tried Arabic_CI_AS and Arabic_CI_AI but that doesn't solve the problem.

Here is a script to regenerate the issue:

CREATE TABLE [dbo].[TestTable] (
    [ArabicChars] [nvarchar](50) NOT NULL,

    CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED 
    (
       [ArabicChars] ASC
    )
) ON [PRIMARY];


INSERT INTO TestTable values (N'احمد');
INSERT INTO TestTable values (N'أحمد');

SELECT * 
FROM TestTable 
WHERE ArabicChars like N'ا%';

The result is:

ArabicChars 

احمد

(1 row(s) affected)

The desired result would be both of the rows we inserted.

Best Answer

i did few tests and i guess it is a work around but can get your job done, since SQL it self isn't helping much.

if you notice that the unicodes of these characters are close to each other

select unicode(N'أ')
  = 1571

select unicode(N'ا')
  = 1575

select unicode(N'إ')
  = 1573

so between أ and ا , its from 1571 to 1575 or if you want to make sure you get every thing in between

make sure you include from 1569 to 1575

which are

Select NCHAR(1569) = ء
Select NCHAR(1570) = آ
Select NCHAR(1571) = أ
Select NCHAR(1572) = ؤ
Select NCHAR(1573) = إ
Select NCHAR(1574) = ئ 
Select NCHAR(1575) = ا

So to make sure that you include every thing similar in your search you can use regular expressions

SELECT * 
FROM TestTable 
WHERE ArabicChars like '%[ء-ا]%'

so in this case you get all characters between ء and ا which include all those between 1569 to 1575

so in this case if your table has

 CREATE TABLE [dbo].[TestTable]  (
    [ArabicChars] [nvarchar](50) COLLATE Arabic_CI_AI NOT NULL,
) 
INSERT INTO TestTable values (N'احمد');
INSERT INTO TestTable values (N'أحمد');
INSERT INTO TestTable values (N'إحمد');

the query above will get them all.

but you will notice something funny

if you have your column as a primary key

CREATE TABLE [dbo].[TestTable]  (
    [ArabicChars] [nvarchar](50) COLLATE Arabic_CI_AI NOT NULL,

    CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED 
    (
       [ArabicChars] ASC
    )
) ON [PRIMARY];

you wont be able to insert these 2 records

INSERT INTO TestTable values (N'أحمد');
INSERT INTO TestTable values (N'إحمد');
INSERT INTO TestTable values (N'ءحمد');

because the ء,أ,إ are all to SQL are part of hamza which is ء

So if you run the query

SELECT * 
FROM TestTable 
WHERE ArabicChars like 'ء%'

it will show you

أحمد
إحمد

so to get the long story short

to SQL أ is not = to ا because its 2 different letters hamza and alefp

but ء = آ = أ = ؤ = إ = ئ

they are all Hamza ء