One scenario in which this would fail depends on how busy (or unlucky) you are in timing.
- Session A does the select on the value, sees that it should INSERT
- Session B does a select on the value before session A does the required insert or update of the value constraint
- Session A inserts row, updates value
- Session B inserts row, updates value
It's all a matter of timing (as suggested by your 98% works). You could acquire a WRITE LOCK
for Session A on the table you are inserting AND on the table you are updating the value, which will block Session B until Session A releases the lock.
Alternatively, you could go ahead and let the DB handle the locking automatically by creating a UNIQUE INDEX and issuing a INSERT .. ON DUPLICATE UPDATE (recommended)
The answer is in the parentheses:
SET price=(SELECT price FROM new WHERE new.id=original.id)
This is a scalar subquery. It only ever returns one value (or no value, in which case it effectively returns NULL
). As @ypercube mentioned in comments, if there were ever more than one matching id
in new
then this expression would throw an error, since a scalar subquery can't and won't deal with more than one possible value.
This is also (by at least some definitions) a correlated subquery, containing a reference to a table (original
) that is not mentioned in the FROM
clause of the subquery.
The subquery is, essentially, executed once for each row in original
in order to find the needed value in new
, and that's how the rows don't get mixed up -- this expression is evaluated for each row in the original table.
At least, conceptually, that's how it happens. The optimizer is free to decide -- within the limits of its design -- if there's a better way to perform your query than the way you've written it, as long as the chosen approach will still provide the exact same result (with the exception of the ordering of rows, which is by definition, undefined, unless you explicitly ORDER BY
in a SELECT
statement). MySQL 5.6 brought some changes in subquery optimization that were largely improvements.
Although sometimes they are absolutely essential, subqueries can be a red flag that a query's logic could be improved, to make things easier on the optimizer and get the work done faster. This is one of those cases.
A better, arguably clearer, and perhaps significantly better-performing way to write the example query would be this:
UPDATE original o
JOIN new n ON n.id = o.id
SET o.price = n.price;
There's no need to include WHERE n.id = o.id
because the join will not only join the rows on that criterion, it will also exclude all rows that can't be joined. The caveat here is that if id
in the new
table isn't unique, for a different reason than the error the original query would throw. In this case, the result isn't deterministic, because you're not able to tell the server which of the more-than-one potentially matching row, so it will pick one, and you can't choose which one. But if id
is unique in the new
table, there's no problem.
MySQL has historically been not the best at handling WHERE ... IN ( ... )
in some cases, and this rewrite also removes the need for that construct.
Best Answer
For an update, you are changing an existing column value in a number of existing rows, 'select'ed based on the "where" clause you specify:
The query locates any rows with product_id 4442 and then, for each one, extracts the current value of price, adds 10 to it, and stores that result back into each record.
For an insert, you are creating a completely new record which, by definition, cannot contain any values (until after it's been created):
So; what value does price represent in this query?
Before this statement executes, there is no row with product_id 2442.
Therefore, there is no price value (in that non-existent row) for the query to retrieve and modify.