Sql-server – How to remove diacritics in computed persisted columns? COLLATE is non-deterministic and cannot be used

azure-sql-database, sql-server-2012, t-sql

I have a function that includes:

SELECT @pString = CAST(@pString AS VARCHAR(255)) COLLATE SQL_Latin1_General_Cp1251_CS_AS

This is useful for removing accents from French text, for example:

UPPER(CAST('Éléctricité' AS VARCHAR(255)) COLLATE SQL_Latin1_General_Cp1251_CS_AS)

gives ELECTRICITE.

But using COLLATE makes the function non-deterministic and therefore I cannot use it as a computed persisted value in a column.

Q1. Is there another (quick and easy) way to remove accents like this, with a deterministic function?

Q2. (Bonus question) I use this computed persisted column (1) to search and (2) to detect duplicates. For example, the user may enter the customer's last name as 'Gagne', 'Gagné', 'GAGNE', or 'GAGNÉ', and the app will find it using the persisted computed column (and the update/insert will balk at the duplicate). Is there a better way to do this?

EDIT: Using SQL Server 2012 and SQL-Azure.

Best Answer

Why not just create the table column with a case-insensitive, accent-insensitive collation? This prevents duplicates according to the collation rules, and allows the sort of searches you seem to require:

CREATE TABLE dbo.Test
(
    col1 varchar(30) COLLATE SQL_Latin1_General_CP1_CI_AI PRIMARY KEY
);

-- Success
INSERT dbo.Test VALUES ('Gagné');

-- Failed, duplicate key
INSERT dbo.Test VALUES ('Gagne');

-- Success
SELECT * 
FROM dbo.Test 
WHERE col1 = 'gaGNE';

As an important side note, I should mention that T-SQL scalar and multi-statement functions have very poor performance characteristics. Using them in a computed column definition is even worse. Avoid.
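
If the table already exists, the same idea can be applied by changing the column's collation in place. The following is only a minimal sketch and not part of the original answer: the table dbo.Customer, the column LastName, and the constraint name are hypothetical. It also assumes that no index or constraint depends on the column yet, since ALTER COLUMN fails in that case until those objects are dropped and recreated.

```sql
-- Hypothetical table; it starts with an accent-sensitive collation
-- so that the effect of the change is visible.
CREATE TABLE dbo.Customer
(
    CustomerId int IDENTITY(1,1) PRIMARY KEY,
    LastName   varchar(30) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL
);

INSERT dbo.Customer (LastName) VALUES ('Gagné');

-- Switch the column to a case-insensitive, accent-insensitive collation.
ALTER TABLE dbo.Customer
    ALTER COLUMN LastName varchar(30) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL;

-- Enforce uniqueness under the new collation rules.
ALTER TABLE dbo.Customer
    ADD CONSTRAINT UQ_Customer_LastName UNIQUE (LastName);

-- Expected to be rejected as a duplicate of 'Gagné'.
INSERT dbo.Customer (LastName) VALUES ('GAGNE');
```

For a one-off search where the column cannot be altered, the collation can instead be applied in the predicate, as in WHERE LastName COLLATE SQL_Latin1_General_CP1_CI_AI = 'gagne'. That form typically prevents an index seek on the column, so it suits ad hoc queries better than hot paths.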

Related Solutions

SQL Server – How to Add a Unique Constraint Ignoring Existing Violations

The answer is "yes". You can do this with a filtered index (see the SQL Server documentation on filtered indexes).

For instance, you can do:

create unique index t_col on t(col) where id > 1000;

This creates a unique index that covers only the new rows (here, those with id > 1000) and not the old ones. With this formulation, a new row may still duplicate a value held by an old row, because the old rows are outside the index.
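
To make the behaviour concrete, here is a small sketch under assumed names: the column types, the sample rows, and the cutoff of 1000 are illustrative only. It assumes the session uses the default SET options that filtered indexes require (for example QUOTED_IDENTIFIER ON).

```sql
-- Hypothetical table with pre-existing duplicate values in col.
CREATE TABLE dbo.t
(
    id  int         NOT NULL PRIMARY KEY,
    col varchar(30) NOT NULL
);

INSERT dbo.t (id, col) VALUES (1, 'a'), (2, 'a');   -- old rows, already duplicated

-- Uniqueness is enforced only among rows with id > 1000.
CREATE UNIQUE INDEX t_col ON dbo.t (col) WHERE id > 1000;

-- Succeeds: 'a' does not yet exist among the filtered rows,
-- even though old rows already hold the same value.
INSERT dbo.t (id, col) VALUES (1001, 'a');

-- Expected to fail: a second 'a' within id > 1000 violates the index.
INSERT dbo.t (id, col) VALUES (1002, 'a');
```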

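If you have just a handful of duplicates, you could exclude only those rows. A filtered index predicate does not accept NOT IN, so write the exclusions as <> comparisons joined by AND:

create unique index t_col on t(col) where id <> <id of first duplicate row> and id <> <id of second duplicate row>;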

Sql-server – Reuse a complex non-deterministic expression in SELECT and WHERE clauses

This can't help with the determinism issue, but CROSS APPLY is a good tool for big expressions that need to be reused:

SELECT TOP 10 team_name, CxA.Score_Points
FROM teams t
CROSS APPLY (SELECT Score_Points = <big Expression, refer to t>) CxA
WHERE CxA.Score_Points >= 100
ORDER BY CxA.Score_Points DESC

CROSS APPLY expressions get calculated for every row and can be referred to in any part of the query.
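
As a runnable illustration of the pattern, the sketch below fills in the placeholder with a made-up expression. The table variable, the wins and draws columns, and the scoring formula are assumptions chosen only to show the mechanics; the expression here happens to be deterministic, but the same shape applies to a non-deterministic one.

```sql
-- Hypothetical data, for illustration only.
DECLARE @teams TABLE (team_name varchar(30), wins int, draws int);

INSERT @teams (team_name, wins, draws)
VALUES ('Reds', 40, 5), ('Blues', 30, 12), ('Greens', 10, 3);

-- The expression is written once inside CROSS APPLY and then reused
-- in the SELECT list, the WHERE clause, and the ORDER BY clause.
SELECT TOP (10) t.team_name, CxA.Score_Points
FROM @teams AS t
CROSS APPLY (SELECT Score_Points = t.wins * 3 + t.draws) AS CxA
WHERE CxA.Score_Points >= 100
ORDER BY CxA.Score_Points DESC;
```

With this sample data the query returns Reds (125) and Blues (102); Greens (33) is filtered out.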
