Postgresql – Delete duplicate records and update the mapping table

deletepostgresqlupdate

I have store_table_map and item_price_map. I want to delete duplicate records in store_table_map and retain the latest row based on updated_date because i want to put constraints in store_id and item_id. But I also want first to update the item_price_map as seen below before deleting old duplicate records.

Current Table

enter image description here

Expected Result

enter image description here

Here's the script

CREATE TABLE store_table_map (
    map_id bigserial NOT NULL,
    store_id int8 NOT NULL,
    item_id int8 NOT NULL,
    updated_date date NOT NULL DEFAULT now()
);

INSERT INTO store_table_map (store_id,item_id,updated_date) VALUES 
(1,123,'2020-07-09')
,(1,123,'2020-07-10')
,(2,456,'2020-07-08')
,(2,456,'2020-07-10')
;

CREATE TABLE item_price_table_map (
    map_id bigserial NOT NULL,
    store_table_map_id int8 NOT NULL,
    price float8 NOT NULL,
    updated_date date NOT NULL DEFAULT now()
);

INSERT INTO item_price_table_map (store_table_map_id,price,updated_date) VALUES 
(1,99.0,'2020-07-09')
,(2,199.0,'2020-07-10')
,(3,299.0,'2020-07-08')
,(4,399.0,'2020-07-10')
;

postgres version: PostgreSQL 9.2.24

Best Answer

You first need to create the mapping from the old map_id to the new one, this can be done using distinct on() and a self join:

select m1.map_id old_map_id, m2.new_map_id
from store_table_map m1
  join (
    select distinct on (store_id, item_id) store_id, item_id, map_id as new_map_id
    from store_table_map 
    order by store_id, item_id, updated_date desc
  ) m2 on (m1.store_id, m1.item_id) = (m2.store_id, m2.item_id)

The above returns:

old_map_id | new_map_id
-----------+-----------
         1 |          2
         2 |          2
         3 |          4
         4 |          4

This can be used in an UPDATE statement:

update item_price_table_map ip
  set store_table_map_id = t.new_map_id
from (  
  select m1.map_id old_map_id, m2.new_map_id
  from store_table_map m1
    join (
      select distinct on (store_id, item_id) store_id, item_id, map_id as new_map_id
      from store_table_map 
      order by store_id, item_id, updated_date desc
    ) m2 on (m1.store_id, m1.item_id) = (m2.store_id, m2.item_id)
) t 
where t.old_map_id = ip.store_table_map_id
  and ip.store_table_map_id <> t.new_map_id;

After that you can remove the rows with the now unused map_ids

delete from store_table_map stm
where not exists (select *
                  from item_price_table_map ip
                  where ip.store_table_map_id = stm.map_id);

(I could only test this on Postgres 12, but I think it should work on 9.2 too)