SQL Server – How to Remove Duplicate Records in a Table

duplicationsql server

I have a table table1 with 1 millions rows of data.

I want to remove all the duplicate records in the table1

I was looking at this link:

How do I remove duplicate records in a join table in PostgreSQL?

Can you tell me how to do this below query in SQL server?

 DELETE FROM questions_tags q
WHERE EXISTS (
   SELECT 1
   FROM   questions_tags q1
   WHERE  q1.ctid < q.ctid
   AND    q.question_id = q1.question_id
   AND    q.tag_id = q1.tag_id
    );

Best Answer

The syntax is just slightly different:

DELETE q
FROM dbo.questions_tags AS q
WHERE EXISTS 
(
   SELECT 1
   FROM   dbo.questions_tags AS q1
   WHERE  q1.ctid < q.ctid
   AND    q.question_id = q1.question_id
   AND    q.tag_id = q1.tag_id
);

Personally, I prefer to use a CTE. Then I can easily swap in a SELECT to validate what I am about to delete, and easily change the where clause to validate what I am going to keep:

;WITH q AS 
(
  SELECT question_id, tag_id, ctid,
    rn = ROW_NUMBER() OVER (PARTITION BY question_id, tag_id ORDER BY ctid)
  FROM dbo.question_tags
)
--DELETE q
SELECT * FROM q 
WHERE rn > 1; -- to show keepers, change to = 1 

I believe these semantics match yours, but please test.

Then, of course, add a proper key constraint before you let anybody insert any new nonsense into this table.