Sql-server – Duplicate rows with a few changes in a table with a composite primary key, but only if target does not exist

azure-sql-databaseif-not-existsinsertsql server

Consider the following table:

CREATE TABLE [xxx].[MyTable](
    [period] [date] NOT NULL,
    [genusId] [int] NOT NULL,
    [subjectId] [int] NOT NULL,
    [waitingStageId] [tinyint] NOT NULL,
    [value] [tinyint] NULL,
    [savedOn] [datetime2](2) NULL,
    [savedBy] [nvarchar](50) NULL,
 CONSTRAINT [PK_ExpectedActivity] PRIMARY KEY CLUSTERED 
(
    [period] ASC,
    [genusId] ASC,
    [subjectId] ASC,
    [waitingStageId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-05-01' AS Date), 71, 92, 1, 1, CAST(N'2017-05-29T13:45:20.6300000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-05-01' AS Date), 71, 96, 1, 1, CAST(N'2017-05-29T13:45:21.2400000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-05-01' AS Date), 71, 128, 0, 2, CAST(N'2017-05-29T13:45:21.6500000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-05-01' AS Date), 71, 128, 1, 3, CAST(N'2017-05-29T13:45:21.4400000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-06-01' AS Date), 73, 35, 0, 1, CAST(N'2017-05-26T20:05:40.2800000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-06-01' AS Date), 73, 94, 0, 3, CAST(N'2017-05-26T20:05:40.6700000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-06-01' AS Date), 73, 94, 1, 1, CAST(N'2017-05-26T20:05:40.4800000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-06-01' AS Date), 74, 23, 0, 6, CAST(N'2017-05-26T20:03:26.5400000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-06-01' AS Date), 74, 47, 0, 6, CAST(N'2017-05-26T20:03:26.7700000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-06-01' AS Date), 74, 67, 0, 3, CAST(N'2017-05-26T20:03:27.1800000' AS DateTime2), NULL)
GO

I am trying to create a stored procedure that will allow us to duplicate values from the previous month into the current month — BUT ONLY if there isn't a value for the current month already.

DROP TABLE IF EXISTS #PreviousMonth
DECLARE @schemaName AS varchar(5) = 'xxx' -- To be used later as a parameter in stored procedure
DECLARE @lastMonth AS DATE = CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE())-1, 0) AS DATE) --First day of previous month
DECLARE @currentMonth AS DATE = CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0) AS DATE) --First day of current month

-- Add rows for existing month into a temporary table
SELECT @currentMonth AS period, genusId, subjectId, waitingStageId, value, GETDATE() AS savedOn, '<Automated by Stored Procedure>' AS savedBy INTO #PreviousMonth FROM [xxx].[ExpectedActivity]
  WHERE period in(@lastMonth)

SELECT * INTO xxx.MyTable FROM #PreviousMonth
-- Here is where I got stuck. I tried WHERE caluses and NOT EXISTS but my skills are limited.

This is as far as I managed to get unfortunately. My skill is limited where it gets to do an INSERT with a condition with the NOT EXIST condition.

How can I make sure that I only insert rows in the xxx.MyTable where rows with the same primary key (period AND genusId AND subjectId AND waitingStageId) does not exist already?

Best Answer

Taking what your query says literally, you don't have to worry about what exists in xxx.myTable - because your statement will create it, therefore it won't have anything in it (or, if it already exists, your statement will fail). Presumably, that's not really what you were going for.

If we assume that xxx.myTable does exist, and may already have some rows whose values match your defaults, then this should work.

INSERT INTO xxx.myTable (period, genusID, subjectID, waitingStageID, value, savedOn, savedBy)
SELECT period, genusID, subjectID, waitingStageID, value, savedOn, savedBy
  FROM #PreviousMonth pm
 WHERE NOT EXISTS (SELECT 1
                     FROM xxx.myTable
                    WHERE period = pm.period
                      AND genusID = pm.genusID
                      AND subjectID = pm.subjectID
                      AND waitingStageID = pm.waitingStageID
                  )
;

In the NOT EXISTS clause, you're checking if the current row from the temp table matches the primary key values for any rows already in xxx.myTable. If you find a match for the current row, then you won't SELECT it (and thus, won't INSERT it).

You can also write this as what's called an "anti-join":

INSERT INTO xxx.myTable (period, genusID, subjectID, waitingStageID, value, savedOn, savedBy)
SELECT period, genusID, subjectID, waitingStageID, value, savedOn, savedBy
  FROM #PreviousMonth pm
         LEFT OUTER JOIN xxx.myTable mt ON (    pm.period = mt.period
                                            AND pm.genusID = mt.genusID
                                            AND pm.subjectID = mt.subjectID
                                            AND pm.waitingStageID = mt.waitingStageID
                                           )
 WHERE mt.period IS NULL
;

Here, we're matching all the rows in the temp table with any matching records in xxx.myTable. via a LEFT JOIN. Because of the left join, if there's no matching row in xxx.myTable, we'll still return columns from that table, but all the values will be NULL. So, if we find a row where period is NULL in xxx.myTable, that should mean that there isn't a matching xxx.myTable row for our temp table row. So, only those rows (temp table rows with no matching xxx.myTable row) will be SELECTed, and thus will be INSERTed.

In most cases, I wouldn't expect dramatically different performance between the two versions; test in your specific case, to see if one seems better than the other.

UPDATE: If you want to use the previous month's data anywhere the key fields don't exists, or where they do exist but the value = 0, you've got a couple of options available.

You could delete all rows in xxx.myTable where the value is 0, then do the insert as above.

I would be inclined to UPDATE the existing rows, and then perform the INSERT as above. The UPDATE statement would be:

UPDATE mt
   SET value = pm.value
  FROM xxx.myTable mt
         INNER JOIN #PreviousMonth pm ON (    mt.period = pm.period
                                          AND mt.genusID = pm.genusID
                                          AND mt.subjectID = pm.subjectID
                                          AND mt.waitingStageID = pm.waitingStageID
                                         )
 WHERE mt.value = 0
   AND pm.value <> 0
;