SQL Server – How to Remove Duplicate Records in a Table

duplication, sql-server

I have a table, table1, with 1 million rows of data.

I want to remove all the duplicate records in table1.

I was looking at this link:

How do I remove duplicate records in a join table in PostgreSQL?

Can you tell me how to write the query below in SQL Server?

DELETE FROM questions_tags q
WHERE EXISTS (
   SELECT 1
   FROM   questions_tags q1
   WHERE  q1.ctid < q.ctid
   AND    q.question_id = q1.question_id
   AND    q.tag_id = q1.tag_id
);

Best Answer

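The syntax is just slightly different. Note that ctid is a PostgreSQL system column with no built-in SQL Server equivalent, so this assumes your table has an actual column by that name (or some other column that orders the duplicates):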

DELETE q
FROM dbo.questions_tags AS q
WHERE EXISTS 
(
   SELECT 1
   FROM   dbo.questions_tags AS q1
   WHERE  q1.ctid < q.ctid
   AND    q.question_id = q1.question_id
   AND    q.tag_id = q1.tag_id
);

Personally, I prefer to use a CTE. Then I can easily swap in a SELECT to validate what I am about to delete, and easily change the WHERE clause to validate what I am going to keep:

;WITH q AS 
(
  SELECT question_id, tag_id, ctid,
    rn = ROW_NUMBER() OVER (PARTITION BY question_id, tag_id ORDER BY ctid)
  FROM dbo.questions_tags
)
--DELETE q
SELECT * FROM q 
WHERE rn > 1; -- to show keepers, change to = 1

I believe these semantics match yours, but please test.
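One way to run that test, sketched under the assumption that a duplicate means two or more rows sharing the same (question_id, tag_id) pair: list the pairs that occur more than once. Before the delete this shows what is about to be collapsed; after the delete it should return no rows. The copies alias is just an illustrative name.

```sql
-- Lists every (question_id, tag_id) pair that appears more than once.
-- Assumption: duplicates are defined by these two columns only.
SELECT question_id, tag_id, COUNT(*) AS copies
FROM dbo.questions_tags
GROUP BY question_id, tag_id
HAVING COUNT(*) > 1;
```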

Then, of course, add a proper key constraint before you let anybody insert any new nonsense into this table.
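A minimal sketch of such a constraint, assuming the duplicates are already gone and that both columns are declared NOT NULL (a primary key requires that). The constraint name PK_questions_tags is hypothetical; pick whatever fits your naming convention.

```sql
-- Assumes no duplicate (question_id, tag_id) pairs remain and that
-- both columns are NOT NULL. PK_questions_tags is a hypothetical name.
ALTER TABLE dbo.questions_tags
  ADD CONSTRAINT PK_questions_tags
  PRIMARY KEY (question_id, tag_id);
```

If the table already has a different primary key, a UNIQUE constraint on the same two columns gives the same protection against new duplicates.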

Related Solutions

MySQL – Remove duplicate rows in a MySQL table that does not contain a primary key

In the spirit of @yercube's answer, I have an answer that has an added twist.

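-- Staging table with an auto-increment id to number the rows
CREATE TABLE stage
(
    id int not null auto_increment,
    name varchar(20),
    primary key (id)
);
-- stage2 will hold one row per distinct name
CREATE TABLE stage2 LIKE stage;
-- Copy every name, duplicates included, into stage
INSERT INTO stage (name) SELECT name FROM item;
-- Keep only the lowest id for each name
INSERT INTO stage2 (id) SELECT min_id FROM
(SELECT MIN(id) min_id,name FROM stage GROUP BY name) A;
-- Fill in the name for each surviving id
UPDATE stage2 A INNER JOIN stage B USING (id) SET A.name=B.name;
-- Empty item and reload it with the unique names
TRUNCATE TABLE item;
INSERT INTO item (name) SELECT name FROM stage2;
-- Clean up the staging tables
DROP TABLE stage;
DROP TABLE stage2;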

This will load stage2 with the first occurrence of each name from item, zap the item table, and load the unique occurrences back.

If you look back at @yercube's answer and compare it to mine, his is much simpler because:

@yercube uses one temp table, while I use two
I had to create a column for iteration control, @yercube did not need to
@yercube has fewer steps
both answers achieve the same thing

I do not expect my answer to be accepted. The sole purpose of my answer was to demonstrate that other answers lose the concise clarity needed to solve your problem. Again, hats off to @yercube.

DB2 – Remove All Records with Duplicates

Assuming your desired uniqueness constraint is (one,two), here is one way to do it:

DELETE FROM session.test
WHERE (one, two) IN (
   SELECT one, two
   FROM   session.test
   GROUP BY one, two
   HAVING COUNT(*) > 1
)
