SQL Server 2012 – How to Remove Duplicates from Different Columns

I'm trying to remove duplicates from a result set, but the duplicated values are swapped between two columns. For example, with this table:

ColA, ColB, ColC, ColD
----------
1,  1, 'ABC', 'DEF'
1,  1, 'DEF', 'ABC'
1,  1, 'GHJ', 'LKJ'
1,  1, 'LKJ', 'GHJ'

What I need to end up with is:

ColA, ColB, ColC, ColD
----------
1,  1, 'ABC', 'DEF'
1,  1, 'GHJ', 'LKJ'

I hope that makes sense. Does anyone have any ideas?

This is SQL Server 2012.

Best Answer

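This probably isn't the definitive answer, but it works for the data you gave us. The second query returns every row where ColC < ColD with those two columns swapped, and EXCEPT removes those swapped rows from the full set, so only the row with ColC < ColD survives from each mirrored pair. Note that EXCEPT also collapses exact duplicate rows.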

;WITH TestData (ColA, ColB, ColC, ColD)
AS (
    SELECT 1, 1, 'ABC', 'DEF'
    UNION ALL
    SELECT 1, 1, 'DEF', 'ABC'
    UNION ALL
    SELECT 1, 1, 'GHJ', 'LKJ'
    UNION ALL
    SELECT 1, 1, 'LKJ', 'GHJ'
    UNION ALL
    SELECT 1, 1, 'ABC', 'HJK'
    UNION ALL
    SELECT 1, 1, 'HJK', 'ABC'
)
SELECT ColA, ColB, ColC, ColD
FROM TestData
EXCEPT
SELECT ColA, ColB, ColD, ColC
FROM TestData
WHERE ColC < ColD
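
The query above only filters a result set. If the mirrored rows need to be removed from the table itself, the same idea can be sketched as a DELETE. This is a minimal sketch, not part of the original answer: the temp table #TestData is a hypothetical stand-in for the real table, and it assumes ColC and ColD are never NULL.

```sql
-- Hypothetical temp table standing in for the real table.
CREATE TABLE #TestData
(
    ColA int,
    ColB int,
    ColC varchar(10),
    ColD varchar(10)
);

INSERT INTO #TestData (ColA, ColB, ColC, ColD)
VALUES (1, 1, 'ABC', 'DEF'),
       (1, 1, 'DEF', 'ABC'),
       (1, 1, 'GHJ', 'LKJ'),
       (1, 1, 'LKJ', 'GHJ');

-- Delete the "reversed" half of each mirrored pair: a row with ColC > ColD
-- is removed only if its swapped counterpart also exists.
DELETE t
FROM #TestData AS t
WHERE t.ColC > t.ColD
  AND EXISTS (
        SELECT 1
        FROM #TestData AS s
        WHERE s.ColA = t.ColA
          AND s.ColB = t.ColB
          AND s.ColC = t.ColD
          AND s.ColD = t.ColC
      );

-- Leaves ('ABC', 'DEF') and ('GHJ', 'LKJ').
SELECT ColA, ColB, ColC, ColD
FROM #TestData
ORDER BY ColC;

DROP TABLE #TestData;
```

A row with ColC > ColD that has no swapped counterpart is kept, which matches the behaviour of the EXCEPT query. Unlike EXCEPT, this DELETE does not remove exact duplicate rows.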

Related Solutions

How to Remove Specific Duplicates in SQL Server (All But Latest)

You could use row_number() to delete everything but the most recent row. The query partitions the data by EMPLOYEE_ID and ATTENDANCE_DATE, numbers the rows within each partition by AUTOID in descending order, and then deletes every row whose row number is greater than 1:

;with cte as
(
  select [EMPLOYEE_ID], [ATTENDANCE_DATE], [AUTOID],
    row_number() over(partition by [EMPLOYEE_ID], [ATTENDANCE_DATE] 
                      order by  [AUTOID] desc) rn
  from dbo.ATTENDANCE
)
delete 
from cte 
where rn > 1;
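
Before running a delete like this, it can help to preview which rows would be removed by selecting from the same CTE. The following is a sketch under assumptions: the temp table #ATTENDANCE and its sample rows are hypothetical and only mirror the column names used above.

```sql
-- Hypothetical sample data mirroring the columns of dbo.ATTENDANCE.
CREATE TABLE #ATTENDANCE
(
    AUTOID          int PRIMARY KEY,
    EMPLOYEE_ID     int,
    ATTENDANCE_DATE date
);

INSERT INTO #ATTENDANCE (AUTOID, EMPLOYEE_ID, ATTENDANCE_DATE)
VALUES (1, 1, '2013-01-01'),
       (2, 1, '2013-01-01'),   -- latest row for employee 1 on 2013-01-01
       (3, 1, '2013-01-02'),
       (4, 2, '2013-01-01'),
       (5, 2, '2013-01-01');   -- latest row for employee 2 on 2013-01-01

-- Preview: rows with rn > 1 are the ones the DELETE would remove.
WITH cte AS
(
    SELECT EMPLOYEE_ID, ATTENDANCE_DATE, AUTOID,
           ROW_NUMBER() OVER (PARTITION BY EMPLOYEE_ID, ATTENDANCE_DATE
                              ORDER BY AUTOID DESC) AS rn
    FROM #ATTENDANCE
)
SELECT AUTOID, EMPLOYEE_ID, ATTENDANCE_DATE, rn
FROM cte
WHERE rn > 1;   -- returns AUTOID 1 and AUTOID 4

DROP TABLE #ATTENDANCE;
```

Once the preview looks right, replacing the final SELECT with the delete shown above removes exactly those rows.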

Sql-server – How to remove diacritics in computed persisted columns? COLLATE is non-deterministic and cannot be used

Why not just create the table column with a case-insensitive, accent-insensitive collation? This prevents duplicates according to the collation rules, and allows the sort of searches you seem to require:

CREATE TABLE dbo.Test
(
    col1 varchar(30) COLLATE SQL_Latin1_General_CP1_CI_AI PRIMARY KEY
);

-- Success
INSERT dbo.Test VALUES ('Gagné');

-- Failed, duplicate key
INSERT dbo.Test VALUES ('Gagne');

-- Success
SELECT * 
FROM dbo.Test 
WHERE col1 = 'gaGNE';
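
If changing the column's collation is not an option, a COLLATE clause can also be applied in the comparison itself. This is a sketch, not part of the original answer: the temp table #Names is hypothetical, and it assumes an existing column with a case-sensitive, accent-sensitive collation.

```sql
-- Hypothetical table whose column is case- and accent-sensitive.
CREATE TABLE #Names
(
    col1 nvarchar(30) COLLATE Latin1_General_CS_AS
);

INSERT INTO #Names (col1) VALUES (N'Gagné');

-- No rows: the column's own collation distinguishes case and accents.
SELECT col1
FROM #Names
WHERE col1 = N'gagne';

-- One row: the comparison is forced to be case- and accent-insensitive.
SELECT col1
FROM #Names
WHERE col1 COLLATE Latin1_General_CI_AI = N'gagne';

DROP TABLE #Names;
```

Unlike the column-level collation above, this does not prevent 'Gagné' and 'Gagne' from both being inserted, and applying COLLATE to the column in the predicate generally prevents an index seek on that column.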

As an important side note, I should mention that T-SQL scalar and multi-statement functions have very poor performance characteristics. Using them in a computed column definition is even worse. Avoid.
