How to remove these duplicate records in an Oracle Database 12c

I have these two tables:

messages(id primary key, message_date, created_at, ...)
user_messages(user_id, message_id references messages(id))

I have some duplicate rows in messages:

select user_id, message_date, count(*)
from messages inner join user_messages
on messages.id = user_messages.message_id
group by user_id, message_date
having count(*) > 1;

user_id | message_date | count(*)
1       | 2019-01-01   | 2
1       | 2019-02-01   | 3
1       | 2019-03-01   | 2

How can I remove such duplicates, retaining only one of each, for example the one whose created_at (not message_date) is the earliest?

Best Answer

A lesser-known feature of Oracle databases: every row has a hidden pseudocolumn called ROWID.

These opaque character values identify each row, so they can be used to isolate duplicates like these and get rid of them.
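SQLite, which ships with Python as the sqlite3 module, has a comparable implicit rowid, so the idea can be tried without an Oracle instance. A minimal sketch, assuming a hypothetical table t whose two rows are identical in every declared column; note that SQLite's rowid is an integer rather than Oracle's encoded character value.

```python
import sqlite3


def rowids_of_identical_rows():
    """Return (rowid, val) for two rows that are identical in every declared column."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table with no primary key; both rows carry the same value.
    con.execute("CREATE TABLE t (val TEXT)")
    con.executemany("INSERT INTO t (val) VALUES (?)", [("dup",), ("dup",)])
    # rowid is not a declared column, yet it still tells the two rows apart.
    rows = con.execute("SELECT rowid, val FROM t ORDER BY rowid").fetchall()
    con.close()
    return rows


print(rowids_of_identical_rows())  # [(1, 'dup'), (2, 'dup')]
```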

This query should get you the candidate rows to be deleted:

select user_id, message_date, max(um.rowid)
from messages m 
inner join user_messages um 
      on   m.id = um.message_id
group by user_id, message_date 
having count( * )  > 1 
order by 1, 2 ; 

user_id | message_date | rowid 
      1 | 2019-01-01   | AB12CD34EF...56
      1 | 2019-02-01   | AB12CD34EF...78
      1 | 2019-03-01   | AB12CD34EF...89

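You can then delete those rows directly by their ROWID. Be aware that MAX(ROWID) simply picks the copy with the highest ROWID, which says nothing about created_at, and that one pass removes only one copy per group, so a group with three copies (such as 2019-02-01) needs the query and delete run a second time.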

delete from user_messages
where rowid = 'AB12CD34EF...56' ;
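The candidate query and the delete by ROWID can be rehearsed the same way in SQLite. A minimal sketch, assuming the question's schema cut down to the columns that matter and hypothetical data with the same counts as above (two, three and two copies for user 1). Because each pass removes only the MAX(rowid) row of each group, the three-copy group needs a second pass, and the rows that survive are simply those with the lowest rowid.

```python
import sqlite3

# Same shape as the candidate query above; SQLite's rowid stands in for Oracle's ROWID.
CANDIDATES = """
    SELECT um.user_id, m.message_date, MAX(um.rowid)
    FROM messages m
    INNER JOIN user_messages um ON m.id = um.message_id
    GROUP BY um.user_id, m.message_date
    HAVING COUNT(*) > 1
    ORDER BY 1, 2
"""


def delete_in_passes():
    """Repeat 'find candidates, delete by rowid' until no duplicates remain."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, message_date TEXT)")
    con.execute("CREATE TABLE user_messages (user_id INTEGER, message_id INTEGER)")
    # Hypothetical data: message ids 1..7 for user 1, dates repeated 2, 3 and 2 times.
    dates = ["2019-01-01"] * 2 + ["2019-02-01"] * 3 + ["2019-03-01"] * 2
    con.executemany("INSERT INTO messages (id, message_date) VALUES (?, ?)",
                    list(enumerate(dates, start=1)))
    con.executemany("INSERT INTO user_messages (user_id, message_id) VALUES (1, ?)",
                    [(i,) for i in range(1, 8)])
    passes = 0
    while True:
        candidates = con.execute(CANDIDATES).fetchall()
        if not candidates:
            break
        passes += 1
        # One delete per duplicate group: the row with the highest rowid.
        con.executemany("DELETE FROM user_messages WHERE rowid = ?",
                        [(rid,) for _, _, rid in candidates])
    kept = [r[0] for r in con.execute(
        "SELECT message_id FROM user_messages ORDER BY message_id")]
    con.close()
    return passes, kept


print(delete_in_passes())  # (2, [1, 3, 6])
```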

All that said, do not be tempted to use ROWIDs for anything else!
They are potentially volatile and can change over time, for example after ALTER TABLE ... MOVE or an export and re-import.
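Since the question asks to keep the copy with the earliest created_at, which MAX(ROWID) does not guarantee, a set-based variant can rank each group with ROW_NUMBER() and delete everything ranked after the first. This swaps in an analytic function that the answer above does not use. Oracle 12c supports ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...), so essentially the same statement should work there; the sketch below runs it in SQLite, which needs version 3.25 or later for window functions. It assumes hypothetical created_at values, breaks created_at ties on the lower id, and treats a message left with no user_messages link as safe to delete.

```python
import sqlite3

# Rank the links within each (user_id, message_date) group by created_at; rank 1 survives.
DELETE_LATER_COPIES = """
    DELETE FROM user_messages
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT um.rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY um.user_id, m.message_date
                                      ORDER BY m.created_at, m.id) AS rn
            FROM messages m
            INNER JOIN user_messages um ON m.id = um.message_id
        )
        WHERE rn > 1
    )
"""


def keep_earliest():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE messages "
                "(id INTEGER PRIMARY KEY, message_date TEXT, created_at TEXT)")
    con.execute("CREATE TABLE user_messages "
                "(user_id INTEGER, message_id INTEGER REFERENCES messages(id))")
    # Hypothetical rows: ids 2, 4 and 7 have the earliest created_at in their groups.
    con.executemany("INSERT INTO messages VALUES (?, ?, ?)", [
        (1, "2019-01-01", "2019-01-01 09:00"), (2, "2019-01-01", "2019-01-01 08:00"),
        (3, "2019-02-01", "2019-02-01 10:00"), (4, "2019-02-01", "2019-02-01 07:00"),
        (5, "2019-02-01", "2019-02-01 12:00"), (6, "2019-03-01", "2019-03-01 11:00"),
        (7, "2019-03-01", "2019-03-01 06:00"),
    ])
    con.executemany("INSERT INTO user_messages VALUES (1, ?)", [(i,) for i in range(1, 8)])
    con.execute(DELETE_LATER_COPIES)
    # Assumption: a message no longer linked to any user can be removed as well.
    con.execute("DELETE FROM messages WHERE id NOT IN (SELECT message_id FROM user_messages)")
    kept = [r[0] for r in con.execute("SELECT id FROM messages ORDER BY id")]
    con.close()
    return kept


print(keep_earliest())  # [2, 4, 7]
```

Deleting the user_messages links first also avoids tripping the foreign key from user_messages to messages.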

Related Solutions

MySQL – Need to find duplicate entries

Suppose your table is called ingredients. Try the following:

Step 01) Create an empty delete keys table called ingredients_delete_keys

CREATE TABLE ingredients_delete_keys
SELECT fk,recipe,pkey FROM ingredients WHERE 1=2;

Step 02) Create PRIMARY KEY on ingredients_delete_keys

ALTER TABLE ingredients_delete_keys ADD PRIMARY KEY (fk,recipe,pkey);

Step 03) Index the ingredients table with fk,recipe,pkey

ALTER TABLE ingredients ADD INDEX fk_recipe_pkey_ndx (fk,recipe,pkey);

Step 04) Populate the ingredients_delete_keys table

INSERT INTO ingredients_delete_keys
SELECT fk,recipe,MIN(pkey)
FROM ingredients GROUP BY fk,recipe;

Step 05) Perform a DELETE JOIN that removes every ingredients row whose (fk,recipe,pkey) has no match in ingredients_delete_keys

DELETE B.*
FROM ingredients B
LEFT JOIN ingredients_delete_keys A
USING (fk,recipe,pkey)
WHERE A.pkey IS NULL;

Step 06) Drop the delete keys table

DROP TABLE ingredients_delete_keys;

Step 07) Get rid of the fk_recipe_pkey_ndx index

ALTER TABLE ingredients DROP INDEX fk_recipe_pkey_ndx;

OK, here are all the lines in one block:

CREATE TABLE ingredients_delete_keys
SELECT fk,recipe,pkey FROM ingredients WHERE 1=2;
ALTER TABLE ingredients_delete_keys ADD PRIMARY KEY (fk,recipe,pkey);
ALTER TABLE ingredients ADD INDEX fk_recipe_pkey_ndx (fk,recipe,pkey);
INSERT INTO ingredients_delete_keys
SELECT fk,recipe,MIN(pkey)
FROM ingredients GROUP BY fk,recipe;
DELETE B.*
FROM ingredients B
LEFT JOIN ingredients_delete_keys A
USING (fk,recipe,pkey)
WHERE A.pkey IS NULL;
DROP TABLE ingredients_delete_keys;
ALTER TABLE ingredients DROP INDEX fk_recipe_pkey_ndx;

Give it a try!
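The delete-keys steps can also be rehearsed in SQLite. A minimal sketch with hypothetical rows; SQLite has no DELETE ... JOIN and no ADD INDEX, so Step 05 is written as an equivalent NOT EXISTS anti-join and Steps 03 and 07 (the helper index) are left out. Only the MIN(pkey) of each (fk, recipe) pair survives.

```python
import sqlite3


def dedupe_ingredients(rows):
    """Keep only MIN(pkey) for each (fk, recipe) pair, mirroring the delete-keys steps."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ingredients (pkey INTEGER PRIMARY KEY, fk INTEGER, recipe TEXT)")
    con.executemany("INSERT INTO ingredients (pkey, fk, recipe) VALUES (?, ?, ?)", rows)
    # Steps 01-02: keys table with a composite primary key.
    con.execute("CREATE TABLE ingredients_delete_keys "
                "(fk INTEGER, recipe TEXT, pkey INTEGER, PRIMARY KEY (fk, recipe, pkey))")
    # Step 04: the pkey to keep for each (fk, recipe).
    con.execute("INSERT INTO ingredients_delete_keys "
                "SELECT fk, recipe, MIN(pkey) FROM ingredients GROUP BY fk, recipe")
    # Step 05: delete rows with no match in the keys table (anti-join via NOT EXISTS).
    con.execute("""
        DELETE FROM ingredients
        WHERE NOT EXISTS (
            SELECT 1 FROM ingredients_delete_keys k
            WHERE k.fk = ingredients.fk
              AND k.recipe = ingredients.recipe
              AND k.pkey = ingredients.pkey
        )
    """)
    # Step 06: drop the keys table.
    con.execute("DROP TABLE ingredients_delete_keys")
    kept = con.execute("SELECT pkey, fk, recipe FROM ingredients ORDER BY pkey").fetchall()
    con.close()
    return kept


sample = [(1, 10, "soup"), (2, 10, "soup"), (3, 20, "cake"), (4, 10, "stew"), (5, 20, "cake")]
print(dedupe_ingredients(sample))  # [(1, 10, 'soup'), (3, 20, 'cake'), (4, 10, 'stew')]
```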

CAVEAT

Notice that using the MIN function keeps the lowest pkey for each (fk,recipe) pair, which is the first one entered when pkey is an auto_increment column. If you switch it to the MAX function instead, the highest (last entered) pkey is kept.
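The caveat can be checked by looking at which pkey each aggregate would put into the keys table. A minimal sketch, assuming an auto-incremented pkey so that a lower pkey means an earlier insert; the agg argument is a hypothetical switch between the two functions.

```python
import sqlite3


def kept_pkeys(agg):
    """Return (fk, recipe, pkey kept) per group when the keys table uses agg."""
    if agg not in ("MIN", "MAX"):  # only these two are interpolated into the SQL
        raise ValueError("agg must be 'MIN' or 'MAX'")
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ingredients "
                "(pkey INTEGER PRIMARY KEY AUTOINCREMENT, fk INTEGER, recipe TEXT)")
    # Inserted in this order, so pkeys 1-2 are soup and 3-5 are cake.
    con.executemany("INSERT INTO ingredients (fk, recipe) VALUES (?, ?)",
                    [(10, "soup"), (10, "soup"), (20, "cake"), (20, "cake"), (20, "cake")])
    rows = con.execute(
        f"SELECT fk, recipe, {agg}(pkey) FROM ingredients GROUP BY fk, recipe ORDER BY fk"
    ).fetchall()
    con.close()
    return rows


print(kept_pkeys("MIN"))  # [(10, 'soup', 1), (20, 'cake', 3)]
print(kept_pkeys("MAX"))  # [(10, 'soup', 2), (20, 'cake', 5)]
```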

MySQL – remove duplicate rows in a MySQL table that does not contain a primary key

In the spirit of @yercube's answer, I have an answer that has an added twist.

CREATE TABLE stage
(
    id int not null auto_increment,
    name varchar(20),
    primary key (id)
);
CREATE TABLE stage2 LIKE stage;
INSERT INTO stage (name) SELECT name FROM item;
INSERT INTO stage2 (id) SELECT min_id FROM
(SELECT MIN(id) min_id,name FROM stage GROUP BY name) A;
UPDATE stage2 A INNER JOIN stage B USING (id) SET A.name=B.name;
TRUNCATE TABLE item;
INSERT INTO item (name) SELECT name FROM stage2;
DROP TABLE stage;
DROP TABLE stage2;

This will load stage2 with the first occurrence of each name from item, zap the item table, and load the unique occurrences back.
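The same stage-and-reload flow can be sketched in SQLite. A minimal sketch with hypothetical names; SQLite lacks CREATE TABLE ... LIKE and TRUNCATE, so stage2 is declared explicitly and item is emptied with DELETE, and the two steps that fill stage2 with ids and then copy the names across in an UPDATE join are collapsed into a single INSERT ... SELECT MIN(id), name.

```python
import sqlite3


def dedupe_without_pk(names):
    """Reload item so each name appears once, keeping its first occurrence."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE item (name TEXT)")  # no primary key
    con.executemany("INSERT INTO item (name) VALUES (?)", [(n,) for n in names])
    # stage: item plus an auto-increment id recording the order rows were copied in.
    con.execute("CREATE TABLE stage (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
    con.execute("INSERT INTO stage (name) SELECT name FROM item ORDER BY rowid")
    # stage2: the lowest id per name, i.e. its first occurrence.
    con.execute("CREATE TABLE stage2 (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("INSERT INTO stage2 (id, name) SELECT MIN(id), name FROM stage GROUP BY name")
    # Empty item (SQLite has no TRUNCATE) and load the unique names back.
    con.execute("DELETE FROM item")
    con.execute("INSERT INTO item (name) SELECT name FROM stage2 ORDER BY id")
    con.execute("DROP TABLE stage")
    con.execute("DROP TABLE stage2")
    result = [r[0] for r in con.execute("SELECT name FROM item ORDER BY rowid")]
    con.close()
    return result


print(dedupe_without_pk(["apple", "pear", "apple", "fig", "pear"]))  # ['apple', 'pear', 'fig']
```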

If you look back at @yercube's answer and compare it to mine, his is much simpler because:

@yercube uses one temp table, while I use two
I had to create a column for iteration control, @yercube did not need to
@yercube has fewer steps
both answers achieve the same thing

I do not expect my answer to be accepted. The sole purpose of my answer was to demonstrate that other answers lose the concise clarity needed to solve your problem. Again, hats off to @yercube.
