SQL Server – Automating Foreign Keys Update

duplicationsql serversql server 2014

I'm on SQL Server 2014 and I need to clean a table from duplicates. Problem is: if a duplicate has references in other tables, that references should be updated to reference the chosen non-duplicate entity.

E.g., say a have a Person table:

+------+-------+-------+------------------------+
|  id  | fname | lname | mail                   | 
+------+-------+-------+------------------------+
| 1111 | John  | Smith | john.smith@example.com |
| 2222 | J.    | Smith | john.smith@example.com |
| 3333 | john  | smith | john.smith@example.com |
| 4444 | Smith | John  | john.smith@example.com |
+------+-------+-------+------------------------+

I need to delete 2222, 3333, 4444 and replace references around the database to these with a reference to 1111. After this operation, a unique index on mail will be created.

My approach would be gathering all the distinct mail and building a id => [ ids ] map (e.g. 1111 => [ 2222, 3333, 4444]) and with a scripting language like Perl or PHP update all the tables which may have references to the duplicates and setting the correct id.

Since there are thousands of users and hundreds of tables with relations to them, I wonder if operations like this could be done directly in SQL Server function, with something like:

DELETE FROM [Person] WHERE [id] IN (2222, 3333, 4444) UPDATE REFERENCES WITH 1111

Best Answer

I think this will do it
I changed this answer as I may have read the question incorrectly
Look at edit history if you are looking for something different

as for the fk

with CTE as 
(  select id, mail 
        , row_number() over (partition by mail order by id) as rn
   from table 
) 
update fk1  
   set fk1.fkID = cte1.ID 
  from fk as fk1 
  join cte as CTE2 
    on CTE2.ID = fk1.fkID
   and CTE2.rn > 1 
  join cte as CTE1 
    on CTE1.mail = CTE2.mail 
   and CTE1.rn = 1;

run above for addition fk

delete *  
from  
(  select id, mail 
        , row_number() over (partition by mail order by id) as rn
   from table 
) as t2 
where rn > 1;