Sql-server – What’s faster than Mirroring tables, compare or write

sql-server-2005

A co-worker is trying to sync table B with the contents of table A, both tables have the same structure, we wrote 3 statements, one for update, one for delete and one for inserts.

I have a question when it comes to updating rows:

tables structure:

        [id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
        [value1] [int] NULL,
        [value2] [int] NULL,

We could have this:

update tb_B 
set tb_B.value1 = tb_A.value1,tb_B.value2 = tb_A.value2 
from tb_B inner join  b_A on tb_B.id = tb_A.id

but each time we would be writing all the values again and again regardless if the values changed, so I thought about something like this:

update tb_B
set 
    tb_B.value1 = tb_A.value1,
    tb_B.value2 = tb_A.value2
from tb_B
inner join tb_A on tb_B.id = tb_A.id
where 
    tb_B.value1 <> tb_A.value1 or
    tb_B.value2 <> tb_A.value2

The last update checks if the value of any column has changed, if true, then it proceeds to update all of the values for that row.

Which do you think it's faster ? writing all the values again and again VS comparing the values and writing ONLY if at least one value is different ?

I think that the farther away the updated column is, the more SQL will take to compare because of the way the OR operator works, this is how I think it works:

think of the numbers as columns:
if ( 1 <> 1 OR 2 <> 2 OR 3 <> 3 OR 4 <> 5)
then UPDATE

the program will make 4 comparisons in order to realize the row needs to be updated, whereas in this case:

if ( 1 <> 2 OR 2 <> 2 OR 3 <> 3 OR 4 <> 5)
then UPDATE

the program will make 1 comparison and proceed with the update. The comparison in my opinion will vary tremendously because the number of comparisons vary for each row !

I hope I am on the right track here,

I appreciate your comments,
thanks

Best Answer

For a question like this, you really need to time the queries to check performance.

My guess is that the logging overhead on the update is going to be more expensive than any reasonable list of "or" statements. If you are going to use the comparisons, though, you should have a correct statement that takes NULLs into account:

where (tb_B.value1 <> tb_A.value1) or
      (tb_b.value1 is null and tb_a.value1 is not null) or (tb_b.value1 is not null and tb_a.value1 is null) or
      tb_B.value2 <> tb_A.value2 or
      (tb_b.value2 is null and tb_a.value2 is not null) or (tb_b.value2 is not null and tb_a.value2 is null)

You can make the query more efficient by defining an index on each table with (id, value1, value2). This will allow all the logic to use the index. A page would only be necessary in the page cache when a mismatch is found.