Is updating with a value from a subquery bad for performance?

performance

I have written an update query of the following form:

update table_a
set some_column = (
    select very_complicated_expression_involving_multiple_columns_in_table_b
    from table_b
    where table_a.id = table_b.id
)

If table_a is large, will the database engine execute the subquery once for each row in table_a? Should I rewrite this as an UPDATE ... FROM statement?

Best Answer

Assuming SQL Server as your DBMS:

The following two statements are NOT equivalent:

update table_a --will update some_column to null when no match to table_b on id
set some_column = (
    select very_complicated_expression_involving_multiple_columns_in_table_b
    from table_b
    where table_a.id = table_b.id
)

UPDATE a --will not update table_a rows that don't match table_b on id
SET a.some_column = very_complicated_expression_involving_multiple_columns_in_table_b
FROM table_a a
JOIN table_b b ON b.id = a.id

The first UPDATE will update some_column to NULL if there is no match between table_a and table_b on id.

The second UPDATE won't update any table_a rows where there is no match between table_a and table_b on id.
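
If you want to keep the subquery form but get the join-like behaviour (leave unmatched rows untouched), one common pattern is to add a matching EXISTS predicate, roughly like this:

update table_a
set some_column = (
    select very_complicated_expression_involving_multiple_columns_in_table_b
    from table_b
    where table_a.id = table_b.id
)
where exists (
    select 1
    from table_b
    where table_a.id = table_b.id
);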

Having said that, and back to your original question about performance: I would NOT expect the optimizer to choose any kind of LOOP JOIN when that many rows are involved in the update (assuming your statistics are up to date). Evaluating the estimated execution plans for the different flavors of the update will help you decide which is the most efficient.
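
One way to compare the candidates without actually running them (besides pressing Ctrl+L in SSMS) is SHOWPLAN; with the setting below, the UPDATE is only compiled, and the estimated plan is returned as XML:

SET SHOWPLAN_XML ON;
GO

-- Compiled but not executed while SHOWPLAN_XML is ON
UPDATE a
SET a.some_column = very_complicated_expression_involving_multiple_columns_in_table_b
FROM table_a a
JOIN table_b b ON b.id = a.id;
GO

SET SHOWPLAN_XML OFF;
GO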

There are other things that also need to be considered when updating a large table.

  • An update against a large table will generate a lot of transaction log entries. You should consider breaking the update into smaller chunks to reduce the possibility of running out of log space; if you're running in the FULL recovery model, make sure you are taking log backups in a timely manner. You will need a way to identify rows that have already been updated in previous runs (a rough sketch follows this list). Break large delete operations into chunks is a post about breaking up large deletes, but the same idea can be applied to updates.
  • If some_column is referenced in any indexes on table_a, you should consider dropping or disabling them (see the ALTER INDEX sketch after this list). Naturally, they will have to be rebuilt afterwards.
  • Check for any UPDATE triggers on table_a that might impact performance.
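
As a rough illustration of the chunking idea from the first bullet: the sketch below assumes that rows still to be processed can be identified by some_column being NULL; substitute whatever marker actually applies to your data.

DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (10000) a   -- batch size: tune for your log/IO capacity
    SET a.some_column = very_complicated_expression_involving_multiple_columns_in_table_b
    FROM table_a a
    JOIN table_b b ON b.id = a.id
    WHERE a.some_column IS NULL;   -- assumption: NULL marks not-yet-updated rows

    SET @rows = @@ROWCOUNT;
END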
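
And for the index bullet, a disable/rebuild pair looks like this (ix_table_a_some_column is a made-up index name; substitute your own, and don't disable the clustered index or the table becomes inaccessible):

ALTER INDEX ix_table_a_some_column ON table_a DISABLE;

-- ... run the (chunked) update here ...

ALTER INDEX ix_table_a_some_column ON table_a REBUILD;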