T-sql – Сopy rows from one table to another and replace duplicates in the process

azure-sql-databasestored-procedurest-sql

I have a SSIS job that loads daily data from CSV files into a raw staging table in a SQL Server database (technically is an Azure SQL database). Then after that, a stored procedure is executed to copy all rows from the raw table into a production table, whilst performing the necessary castings, handling of dates, etc. Here is a simplified example of the stored procedure code:

USE [example_database]
GO

/****** Object:  StoredProcedure [dbo].[Append_Data]    Script Date: 19/07/2016 11:21:34 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

create procedure [dbo].[Append_Data] as
--Code to delete updated records 

--Code to perform the append of incremental records 
insert into dbo.[Admissions]
SELECT [refno]
      ,[name]
      ,[value]
      ,cast([modified_date] as datetime)
  FROM [RAW].[Admissions]

GO

I need some help to amend this code so it can handle duplicates as follows: Based on the column that uniquely identifies each row (named refno — a candidate key), I need to replace any existing records with the same refno in the production key, as long as the modified date column (named date_modified — a datetime column) is bigger (newer) than what is already in the production table.

Here is an example:

PRODUCTION TABLE

refno, name, value, date_modified
1, Alex, 2112.34, 08/Jul/2016 15:37:00 
2, Geddy, 674.91, 08/Jul/2016 15:37:00
3, Neil, 914.73, 08/Jul/2016 15:37:00

RAW TABLE

refno, name, value, date_modified
1, Alex, 1001.93, 10/Jul/2016 12:27:00 
2, Geddy, 674.91, 08/Jul/2016 15:37:00
3, Neil, 425.85, 01/Jul/2016 05:13:00
4, Hugh, 9807, 01/Jan/2016 13:42:00
  • The first row should get replaced in the production table because a row with the same refno exists in the raw table, and it has a newer modified date.
  • The second row has not been modified, so we leave it as is. Even though the matching row does exist in the target production table, we can see that the date_modified is the same so we ignore it.
  • The third row has been modified, but the modified date is earlier than what we have in the production table, so we ignore it.
  • The fourth row in the staging table is new (i.e. a row with that refno does not exist in the production table yet) so we copy it over.

Does that make sense? We don't tend to get many rows per day (about 1000 tops) and it is a straightforward, denormalised table (hence the stored procedure rather than going nuts in SSIS 🙂 ).

Any help is greatly appreciated.

Best Answer

merge production as target  
using raw as source  
on target.refno = source.refno
when matched and souce.date > target.date 
     then update set target.date = souce.date, target.value= souce.value 
when not matched  
     then insert ...

merge