Is updating with a value from a subquery bad for performance?

performance

I have written an update query of the following form:

update table_a
set some_column = (
    select very_complicated_expression_involving_multiple_columns_in_table_b
    from table_b
    where table_a.id = table_b.id
)

If table_a is large, will the database engine execute the subquery once for each row in table_a? Should I rewrite this as an UPDATE ... FROM statement?

Best Answer

Assuming SQL Server as your DBMS:

The following two statements are NOT equivalent:

update table_a --will update some_column to null when no match to table_b on id
set some_column = (
    select very_complicated_expression_involving_multiple_columns_in_table_b
    from table_b
    where table_a.id = table_b.id
)

UPDATE a --will not update table_a rows that don't match table_b on id
SET a.some_column = very_complicated_expression_involving_multiple_columns_in_table_b
FROM table_a a
JOIN table_b b ON b.id = a.id

The first UPDATE will update some_column to NULL if there is no match between table_a and table_b on id.

The second UPDATE won't update any table_a rows where there is no match between table_a and table_b on id.
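
If you want to keep the subquery form but get the join-like behaviour (leave unmatched rows untouched), one common pattern is to add a matching EXISTS predicate, roughly like this:

update table_a
set some_column = (
    select very_complicated_expression_involving_multiple_columns_in_table_b
    from table_b
    where table_a.id = table_b.id
)
where exists (
    select 1
    from table_b
    where table_a.id = table_b.id
);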

Having said that, and back to your original question about performance: I would NOT expect the optimizer to choose any kind of LOOP JOIN when that many rows are involved in the update (assuming your statistics are up to date). Evaluating the estimated execution plans for the different flavors of the update will help you decide which is the most efficient.
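
One way to compare the candidates without actually running them (besides pressing Ctrl+L in SSMS) is SHOWPLAN; with the setting below, the UPDATE is only compiled, and the estimated plan is returned as XML:

SET SHOWPLAN_XML ON;
GO

-- Compiled but not executed while SHOWPLAN_XML is ON
UPDATE a
SET a.some_column = very_complicated_expression_involving_multiple_columns_in_table_b
FROM table_a a
JOIN table_b b ON b.id = a.id;
GO

SET SHOWPLAN_XML OFF;
GO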

There are other things that also need to be considered when updating a large table.

  • An update against a large table will generate a lot of transaction log entries. You should consider breaking the update into smaller chunks to reduce the possibility of running out of log space; if you're running in the FULL recovery model, make sure you are taking log backups in a timely manner. You will need a way to identify rows that have already been updated in previous runs (a rough sketch follows this list). Break large delete operations into chunks is a post about breaking up large deletes, but the same idea can be applied to updates.
  • If some_column is referenced in any indexes on table_a, you should consider dropping or disabling them (see the ALTER INDEX sketch after this list). Naturally, they will have to be rebuilt afterwards.
  • Check for any UPDATE triggers on table_a that might impact performance.
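
As a rough illustration of the chunking idea from the first bullet: the sketch below assumes that rows still to be processed can be identified by some_column being NULL; substitute whatever marker actually applies to your data.

DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (10000) a   -- batch size: tune for your log/IO capacity
    SET a.some_column = very_complicated_expression_involving_multiple_columns_in_table_b
    FROM table_a a
    JOIN table_b b ON b.id = a.id
    WHERE a.some_column IS NULL;   -- assumption: NULL marks not-yet-updated rows

    SET @rows = @@ROWCOUNT;
END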
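
And for the index bullet, a disable/rebuild pair looks like this (ix_table_a_some_column is a made-up index name; substitute your own, and don't disable the clustered index or the table becomes inaccessible):

ALTER INDEX ix_table_a_some_column ON table_a DISABLE;

-- ... run the (chunked) update here ...

ALTER INDEX ix_table_a_some_column ON table_a REBUILD;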