SQL Server – Fixing StartsWith Query Using LIKE on Non-ASCII Column

collationlikesql server

I want to query a table for entries that begin with a specific fragment of text.

I am under the impression that the LIKE operator is a way to do this, so if I wanted to find all rows that have a Column2 that starts with "a" then I would do that with a query such as the following

SELECT Column1, Column2, Column3
FROM Table1
WHERE Column2 LIKE 'a%';

I have found that using a query like this returns unexpected results.

Here is an example:

CREATE DATABASE TestDB;
GO

USE TestDB;
GO

CREATE TABLE TestTable
(
    Id INT NOT NULL,
    Romanian NVARCHAR(MAX) COLLATE SQL_Romanian_CP1250_CI_AS NOT NULL,
    English  NVARCHAR(MAX) NOT NULL
);
GO

SET NOCOUNT ON;

INSERT TestTable SELECT 1, N'da', 'yes';
INSERT TestTable SELECT 2, N'ani', 'years';
INSERT TestTable SELECT 3, N'a avea', 'to have';
INSERT TestTable SELECT 4, N'a fi', 'to be';
INSERT TestTable SELECT 5, N'șase', 'six';
INSERT TestTable SELECT 6, N'șapte', 'seven';
INSERT TestTable SELECT 7, N'opt', 'eight';  
INSERT TestTable SELECT 8, N'zece', 'ten';
INSERT TestTable SELECT 9, N'nu', 'no';
GO

SELECT *
FROM TestTable
WHERE Romanian LIKE 'a%';

USE master;
GO

DROP DATABASE TestDb;
GO

I was expecting this to return the following

+----+-------------+----------+
| Id | Romanian    | English  |
+----+-------------+----------+
| 2  | ani         | years    |
+----+-------------+----------+
| 3  | a avea      | to have  |
+----+-------------+----------+
| 4  | a fi        | to be    |
+----+-------------+----------+

but it returned this instead

+----+-------------+----------+
| Id | Romanian    | English  |
+----+-------------+----------+
| 2  | ani         | years    |
+----+-------------+----------+
| 3  | a avea      | to have  |
+----+-------------+----------+
| 4  | a fi        | to be    |
+----+-------------+----------+
| 5  | șase        | six      |
+----+-------------+----------+
| 6  | șapte       | seven    |
+----+-------------+----------+

The version number I have is 13.0.1601.5

Best Answer

SQL Collations are provided only for backward compatibility.

This has been the case for over 10 years.

Use a more modern collation such as Romanian_100_CI_AS.

This does not produce that behaviour.