I have a database that contains some erroneous results that are essentially but not technically duplicates. The structure is like this:
id_page (PK), id_site, label, create_date
There should only be one entry for each site for each label but it turns out that I have some duplicates for site
and label
, where id_page
and create_date
are different. I would like to remove the rows with the lowest create_date
.
I think I have come up with a solution for this but I'd appreciate feedback.
WITH duplicates as (
SELECT id_page, id_site, count(id_site) over (partition by id_site) as ct,
min("create_date") over (partition by id_site) as dt
from pages
where label = '2018-12-15'
)
DELETE from pages
where id_page in (
select p.id_page
from duplicates as d
join pages as p on (p.id_page = d.id_page
and p.create_date = d.dt)
WHERE ct = 2
);
Best Answer