Sql-server – BULK INSERT continue on PRIMARY KEY error

bulk-inserterror handlingsql servert-sql

I get a CSV file with a several million records every few days. I need to insert them into our database. What I do is quite straightforward:

BULK INSERT dbo.activity
FROM 'C:\tmp\activity.csv'
WITH (
    FIRSTROW = 2,
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '0x0a',
    BATCHSIZE = 1000,
    MAXERRORS = 2
)
GO

This works well. However, it may be the case that I get some duplicated records. When I run the import I, as expected, get the error:

Violation of PRIMARY KEY constraint 'PK__id__790FF7A****'. Cannot insert duplicate key in object 'dbo.activity'. The duplicate key value is (1234)

This gets quite cumbersome when instead of one duplicated record I have hundreds. Is there any way to tell BULK INSERT to ignore (or log) those errors and carry on with the good records? I am thinking about something like:

BEGIN TRY:
  BULK INSERT dbo.activity
  FROM 'C:\tmp\activity.csv'
  WITH (options....)
END TRY
BEGIN CATCH:
  SELECT
        ERROR_NUMBER() AS ErrorNumber  
       ,ERROR_MESSAGE() AS ErrorMessage;
   -- I know this line wrong.  It is just what I would like to do
  CONTINUE;
END CATCH

Thanks!

Best Answer

I think a good approch would be to have a staging table (without PK constraint).

You will load your data from your bulk command into this table and then, you should be able to script down the logic you want to implement into a store procedure that you can run after

Ex: Check for duplicate, log them to another table or send a mail, delete the duplicate base on some logic and then insert the remaining rows into the target table.