I have written an update query of the following form:
update table_a
set some_column = (
    select very_complicated_expression_involving_multiple_columns_in_table_b
    from table_b
    where table_a.id = table_b.id
)
If table_a is large, will the database engine run the select subquery once for each row in table_a? Should I rewrite this as an update ... from statement?
Best Answer
Assuming SQL Server as your DBMS:
The following two statements are NOT equivalent:
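Presumably these are the correlated-subquery form from the question and its UPDATE ... FROM rewrite; a sketch of both (the long expression name is the question's placeholder):

```sql
-- 1) Correlated subquery: every row of table_a is updated;
--    rows with no match in table_b get some_column = NULL.
update table_a
set some_column = (
    select very_complicated_expression_involving_multiple_columns_in_table_b
    from table_b
    where table_a.id = table_b.id
);

-- 2) UPDATE ... FROM (T-SQL extension): only rows of table_a
--    that join to table_b are touched.
update a
set some_column = very_complicated_expression_involving_multiple_columns_in_table_b
from table_a a
join table_b b on a.id = b.id;
```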
The first UPDATE will set some_column to NULL for any table_a row that has no match in table_b on id. The second UPDATE won't touch table_a rows that have no match in table_b on id.

Having said that, and back to your original question about performance: I would not expect the optimizer to choose any kind of LOOP JOIN when so many rows are involved in the update (assuming your statistics are up to date). Evaluating the estimated execution plans for the different flavors of the update will help you decide which is the most efficient.

There are other things that also need to be considered when updating a large table:
- If the database is in FULL recovery mode, make sure you are taking log backups in a timely manner. If you break the update into batches, you would also need a method of identifying rows that have already been updated in previous runs. Break large delete operations into chunks is a post about breaking up large deletes, but I think the idea can be applied to updates.
- If some_column is referenced in any indexes on table_a, you should consider dropping/disabling them. Naturally, they will have to be rebuilt afterwards.
- Watch out for any UPDATE triggers on table_a that might impact performance.
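The chunking idea can be sketched for an update as well. This is a common T-SQL batching pattern, not code from the post; in particular, the `some_column is null` filter is only an assumption standing in for whatever lets you identify rows not yet updated in previous runs:

```sql
-- Batch the update so each transaction (and its log usage) stays small.
-- ASSUMPTION: "some_column is null" marks rows not yet updated;
-- replace with a marker that fits your data.
while 1 = 1
begin
    update top (10000) a
    set some_column = very_complicated_expression_involving_multiple_columns_in_table_b
    from table_a a
    join table_b b on a.id = b.id
    where a.some_column is null;

    if @@rowcount = 0 break;
end;
```

Each iteration commits its own small transaction, which keeps log growth bounded and lets log backups truncate the log between batches.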