A co-worker is trying to sync table B with the contents of table A, both tables have the same structure, we wrote 3 statements, one for update, one for delete and one for inserts.
I have a question when it comes to updating rows:
tables structure:
[id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[value1] [int] NULL,
[value2] [int] NULL,
We could have this:
update tb_B
set tb_B.value1 = tb_A.value1,tb_B.value2 = tb_A.value2
from tb_B inner join b_A on tb_B.id = tb_A.id
but each time we would be writing all the values again and again regardless if the values changed, so I thought about something like this:
update tb_B
set
tb_B.value1 = tb_A.value1,
tb_B.value2 = tb_A.value2
from tb_B
inner join tb_A on tb_B.id = tb_A.id
where
tb_B.value1 <> tb_A.value1 or
tb_B.value2 <> tb_A.value2
The last update checks if the value of any column has changed, if true, then it proceeds to update all of the values for that row.
Which do you think it's faster ? writing all the values again and again VS comparing the values and writing ONLY if at least one value is different ?
I think that the farther away the updated column is, the more SQL will take to compare because of the way the OR operator works, this is how I think it works:
think of the numbers as columns:
if ( 1 <> 1 OR 2 <> 2 OR 3 <> 3 OR 4 <> 5)
then UPDATE
the program will make 4 comparisons in order to realize the row needs to be updated, whereas in this case:
if ( 1 <> 2 OR 2 <> 2 OR 3 <> 3 OR 4 <> 5)
then UPDATE
the program will make 1 comparison and proceed with the update. The comparison in my opinion will vary tremendously because the number of comparisons vary for each row !
I hope I am on the right track here,
I appreciate your comments,
thanks
Best Answer
For a question like this, you really need to time the queries to check performance.
My guess is that the logging overhead on the update is going to be more expensive than any reasonable list of "or" statements. If you are going to use the comparisons, though, you should have a correct statement that takes NULLs into account:
You can make the query more efficient by defining an index on each table with (id, value1, value2). This will allow all the logic to use the index. A page would only be necessary in the page cache when a mismatch is found.