Sql-server – What’s better for large changes to a table: DELETE and INSERT every time or UPDATE existing

application-designfragmentationperformancesql serversql-server-2005

I am making a project where I need to change around 36K records in one table daily. I'm wondering what will perform better:

  1. delete rows and insert new ones, or
  2. update already existing rows

For me it is easier to just delete all the rows and insert new ones, but if this is going to fragment the table and indexes and impact performance then I would prefer to make updates where possible and delete/insert only when necessary.

This is going to be a nightly service and I am not looking to improve the speed of the process itself. I am more concerned about the performance of queries against this table in general where I already have 89 million records and how this nightly process will affect it.

Should I delete/insert records or should I update existing ones (where possible) for this nightly process?

Best Answer

It really depends on how much of the data is changing. Lets say this table has 20 columns. And you also have 5 indexes - each on a diff. column.

Now if the values in all 20 columns are changing OR even if data in 5 columns are changing and these 5 columns are all indexed, then you may be better off "deleting and inserting". But if only 2 columns are changing and lets say these are not part of any non-clustered indexes, then you may be better off "Updating" the records because in this case only the clustered index will be updated (and indexes will not have to be updated).


On further research, I did find that the above comment by me is sort of redundant as SQL Server internally has 2 separate mechanism for performing an UPDATE. - An "in-place update" (ie by changing a columns value to a new in the original row) or as a "not-in-place UPDATE" (DELETE followed by an INSERT).

In place updates are the rule and are performed if possible. Here the rows stay exactly at the same location on the same page in the same extent. Only the bytes affected are chnaged. The tlog only has one record (provided there are no update triggers). Updates happen in place if a heap is being updated (and there is enough space on the page). Updates also happen in place if the clustering key changes but the row does not need to move at all.

For eg: if you have a clustered index on last name and you have the names: Able, Baker, Charlie Now you want to update Baker to Becker. No rows have to be moved. So this can take in-place. Whereas, if you have to update Able to Kumar, the rows will have to be shifted (even though they will be on the same page). In this case, SQL Server will do a DELETE followed by an INSERT.

Considering the above, I would suggest that you do a normal UPDATE and let SQL Server figure out the best way to how to do it internally.

For more details on "UPDATE" internals or for that matter any SQL Server related internals, check out Kalen Delaney, Paul Randal's, et al.'s book - SQL Server 2008 Internals.