Mysql – Understanding JOINs and why the syntax works the way it does

join;MySQL

Heyo,

I'm confused as to how this works:

UPDATE original 
SET price=(SELECT price FROM new WHERE new.id=original.id) 
WHERE id IN (SELECT id FROM new);

What it does is updating an original table with items with the prices of another table. The second table (my update table) has two columns, ID (which matches the corresponding ID of the first table) and a column for the new price.

As far as I understand, MySQL collects all columns that have a corresponding entry in the second table in the third row. And then it somehow magically updates the price column with a price column of the joint values created from the tables new and original. How does it determine which value goes where? How does it make sure nothing goes wrong if there is an ID used twice in the table with the new values?

Some clarification would be nice. Thanks in advance,
Yorrd

EDIT:

My question is, how does MySQL determine which row to match with which new value from the second table. Even if they are unique, how is being made sure that they're not mixed up accidentally?

Best Answer

The answer is in the parentheses:

SET price=(SELECT price FROM new WHERE new.id=original.id)

This is a scalar subquery. It only ever returns one value (or no value, in which case it effectively returns NULL). As @ypercube mentioned in comments, if there were ever more than one matching id in new then this expression would throw an error, since a scalar subquery can't and won't deal with more than one possible value.

This is also (by at least some definitions) a correlated subquery, containing a reference to a table (original) that is not mentioned in the FROM clause of the subquery.

The subquery is, essentially, executed once for each row in original in order to find the needed value in new, and that's how the rows don't get mixed up -- this expression is evaluated for each row in the original table.

At least, conceptually, that's how it happens. The optimizer is free to decide -- within the limits of its design -- if there's a better way to perform your query than the way you've written it, as long as the chosen approach will still provide the exact same result (with the exception of the ordering of rows, which is by definition, undefined, unless you explicitly ORDER BY in a SELECT statement). MySQL 5.6 brought some changes in subquery optimization that were largely improvements.

Although sometimes they are absolutely essential, subqueries can be a red flag that a query's logic could be improved, to make things easier on the optimizer and get the work done faster. This is one of those cases.

A better, arguably clearer, and perhaps significantly better-performing way to write the example query would be this:

UPDATE original o
  JOIN new n ON n.id = o.id
   SET o.price = n.price;

There's no need to include WHERE n.id = o.id because the join will not only join the rows on that criterion, it will also exclude all rows that can't be joined. The caveat here is that if id in the new table isn't unique, for a different reason than the error the original query would throw. In this case, the result isn't deterministic, because you're not able to tell the server which of the more-than-one potentially matching row, so it will pick one, and you can't choose which one. But if id is unique in the new table, there's no problem.

MySQL has historically been not the best at handling WHERE ... IN ( ... ) in some cases, and this rewrite also removes the need for that construct.

Related Solutions

Mysql – Selecting minimum value using a subquery

Modifying slightly your second query, will give you both the merchant id and the lowest price (over all products that pass the conditions - I guess that's what you want):

SELECT p.p_m_id, MIN(p_price) AS min_p_price 
FROM tgmp_affiliates ga 
JOIN tgmp_prices p 
    ON ga.a_code = p.p_gtin 
        AND ga.a_code > '' 
JOIN tgmp_merchants m 
    ON m.m_id = p.p_m_id 
WHERE ga.site_id = '34' 
    AND p.site_id = '34' 
    AND ga.a_parent = '25573' 
    AND p.p_type = 'games' 
    AND m.m_hide = 0 
GROUP BY p.p_m_id ;

Then you can join this - as a derived table - to all the tables that you need data from in the results:

SELECT
    m.*, p.*, ga.*                     -- whatever columns you want  
FROM tgmp_affiliates ga 
JOIN tgmp_prices p 
    ON ga.a_code = p.p_gtin 
        AND ga.a_code > '' 
JOIN tgmp_merchants m 
    ON m.m_id = p.p_m_id 
JOIN
      ( SELECT p.p_m_id, MIN(p_price) AS p_price 
        FROM tgmp_affiliates ga 
        JOIN tgmp_prices p 
            ON ga.a_code = p.p_gtin 
                AND ga.a_code > '' 
        JOIN tgmp_merchants m 
            ON m.m_id = p.p_m_id 
        WHERE ga.site_id = '34' 
            AND p.site_id = '34' 
            AND ga.a_parent = '25573' 
            AND p.p_type = 'games' 
            AND m.m_hide = 0 
        GROUP BY p.p_m_id 
      ) AS tmp
    ON  tmp.p_m_id = p.p_m_id 
    AND tmp.p_price = p.p_price
WHERE ga.site_id = '34' 
    AND p.site_id = '34' 
    AND ga.a_parent = '25573' 
    AND p.p_type = 'games' 
ORDER BY p.p_price ;

MySQL – How to Skip Unchanged Entries When Inserting Prices

You have 30K new records for the 'current' minute; put them in a temp table; include PRIMARY KEY(ArticleId).

Note: History will need PRIMARY KEY(ArticleId, InsertTime).

You need to find the 'latest' existing price for each of the 30K articles in the existing history table. But first, you need to find the latest InsertPrice for each article in this table: (This is 'select#1')

SELECT ArticleId, MAX(InsertTime) AS InsertTime
    FROM History
    GROUP BY ArticleId

Now back to the latest price (select#2):

SELECT ArticleId, ArticlePrice
    FROM ( select#1 ) t1
    JOIN History  USING(ArticleId, InsertTime)

Now let's find the changed prices (select#3):

SELECT temp.ArticleId, temp.LastInsert, temp.ArticlePrice
    FROM ( select#2 ) t2
    LEFT JOIN temp  USING(ArticleId)
    WHERE temp.ArticlePrice != t2.ArticlePrice

I may have also allowed for 'new' Articles to arrive. (But I am not sure.)

Now, let's copy the new Prices over:

INSERT INTO History (ArticleId, InsertTime, ArticlePrice)
    select#3;

Best Answer

Related Solutions

Mysql – Selecting minimum value using a subquery

MySQL – How to Skip Unchanged Entries When Inserting Prices

Related Question