Sql-server – Performance Inserting and Updating Millions of rows into a table

performancesql serversql-server-2008

We have a CSV the user imports into the Application.

After inspecting the CSV, validating and applying some business logic, we got Inserts and Updates operations we should perform into a table(Subscribers)

We got around 1 millions of inserts and 1 million of updates operations.

Which will be the most effective way to execute them?

1) For the inserts we planned to put them in a CSV file and execute them as a BULK INSERT.

2) What should we do for the updates? Is there something similar to BULK INSERT?

3) Is there any other optimization we should look?

3a) Like for example disabling the indexes while doing this? We should consider the DB should be kept available as normal for the other users of the application…

3b) Suggestions ????

Best Answer

If you are bringing data from a CSV with a large import process, I'd recommend you utilize a staging table. This will allow you to dump all of your imported data into the staging table that won't be queried by end-users/applications. You can use this staging table as a source for your INSERT and UPDATE operations.

Once the data is in the staging table you can kick off a stored procedure (or multiple stored procedures, depending on how vast your business logic is) to do the necessary INSERT and UPDATE DML statements into the actual queried table.

As for performance, you need to decide what is more important for your implementation. If you're looking to get the data in the table as quickly as possible, a very feasible route may be to disable the indexes. But realize that OLAP performance will be impacted by that. If you want to ensure the minimal amount of end-user impact then you would have to approach this a different way. When you say "the DB should be kept availabel as normal" then I would assume the latter.

Performance is double-edged in this situation. Performance for the data import will most likely take away from performance of end-user interaction with the data, and vice versa.