MySQL Update a table based on the last data from another table while grouping with conditions

group byMySQLorder-bysubqueryupdate

I need a query to UPDATE a series of fields based on another table

Table lp_plates_backup

|       uid | brand    | model    |
|         1 |          |          |
|         a | Old      | Error    |
|       ... | ...      | ...      |

Table lp_pictures_backup

| plate_uid | brand    | model    | date_validated      |
|         1 | Fiat     | Panda    | 2014-10-11 10:03:20 | < Last one
|         1 | BMW      | 7-Series | 2014-08-04 11:21:18 |
|         1 | BMW      | 7-Series | 2014-07-28 19:14:02 |
|         1 | Mercedes | S-Class  | 2014-06-12 08:54:57 |
|         a | Tesla    | Model S  | 2014-12-17 11:00:00 | < Last one
|         a | BMW      | 3-Series | 2014-11-07 14:34:11 |

I need to update columns brand and model of lp_plates_backup based on the last date_validated entry of table lp_plates_backup while lp_plates_backup.uid match lp_pictures_backup.plate_uid for each lp_plates_backup.uid but only if:

  • lp_pictures_backup.brand is not empty
  • lp_plates_backup.brand is different from lp_pictures_backup.brand OR lp_plates_backup.model is different from lp_pictures_backup.model

The result should shown (lp_plates_backup) :

|       uid | brand    | model    |
|         1 | Fiat     | Panda    |
|         a | Tesla    | Model S  |

I have read these posts and made lot of tests but without getting the expected result:

Here is the last query I try:

UPDATE
    lp_plates_backup AS t
    LEFT JOIN (
        SELECT
            plate_uid, brand, model, date_validated
        FROM
            lp_pictures_backup as parent
        WHERE
            brand <> '' AND
            date_validated = (SELECT MAX(date_validated)
                 FROM lp_pictures_backup as t2
                 WHERE t2.plate_uid = parent.plate_uid)
        GROUP BY
            plate_uid
    ORDER BY
        date_validated DESC

) AS m ON
    m.plate_uid = t.uid
SET
    t.brand = m.brand,
    t.model = m.model
WHERE
    t.brand <> m.brand
    OR
    t.model <> m.model

I think I had too much brainstormed and getting confused making useless too complicated code.

Any expert in the place?

Thanks for helping.

Best Answer

The correct query is:

UPDATE
    lp_plates_backup AS t
    INNER JOIN  (
        SELECT
            plate_uid, brand, model, date_validated
        FROM
            lp_pictures_backup as parent
        WHERE
            brand <> '' AND
            date_validated = (
                 SELECT MAX(date_validated)
                 FROM lp_pictures_backup as t2
                 WHERE t2.plate_uid = parent.plate_uid
                 GROUP BY
                     plate_uid)
) AS m ON
    m.plate_uid = t.uid
SET
    t.brand = m.brand,
    t.model = m.model
WHERE
    t.brand <> m.brand
    OR
    t.model <> m.model;

Just some little explanations. You need INNER JOIN because you must update row of lp_plates_backup only if plate_uid exists into lp_plates_backup. ORDER BY is useless because you are selecting all rows, order is not important.

You need the max date_validated of rows grouped by plate_uid, select MAX(date_validate) must individuate only a row for plate_uid through the date_validated field. So you need to add group by to select max(..).

The query select plate_uid returns a row for plate_uid so you don't need to aggregate here. The condition of single row is already builded into select max subquery.

I hope you did not get confused by me :-)


Updated

The previous works well if date_validate is the pair (plate_uid, date_validated) is unique. If you have this kind of data:

| plate_uid | brand    | model    | date_validated      |
|         1 | Fiat     | Panda    | 2014-10-11 10:03:20 | 
|         1 | BMW      | 7-Series | 2014-10-11 10:03:20 |   <- changed data
|         1 | BMW      | 7-Series | 2014-07-28 19:14:02 |
|         1 | Mercedes | S-Class  | 2014-06-12 08:54:57 |   
|         a | Tesla    | Model S  | 2014-12-17 11:00:00 | 
|         a | BMW      | 3-Series | 2014-11-07 14:34:11 |

The following query returns the first two rows for plate_uid 1.

SELECT plate_uid, brand, model, date_validated
FROM lp_pictures_backup as parent
WHERE
      brand <> '' AND
      date_validated = (
                 SELECT MAX(date_validated)
                 FROM lp_pictures_backup as t2
                 WHERE t2.plate_uid = parent.plate_uid
                 GROUP BY plate_uid)

The update will choice the values from the first or the second row. I think of the following alternatives:

  1. using some other lp_pictures_backup fields to choise between rows with the same date_validated.
  2. enforcing a unique constraint on table. I.e. alter table lp_pictures_backup add unique index (plate_id, date_validated). Rejecting invalid data.
  3. detecting valid lp_pictures_backup pair of (plate_uid, date_validated). Updating lp_plates_backup only with valid pairs, review invalid pairs and correct them.

Maybe there are more alternatives. I prefer enforcing contraint on data so to have better data. I expand 3th alternative. Just create a view to define what a valid pair (plate_uid, date_validated) is:

CREATE VIEW lp_pictures_backup_valid as
SELECT plate_uid, date_validated
FROM lp_pictures_backup as parent
WHERE date_validated = (
        select max(date_validated) 
        from lp_pictures_backup t2 
        where t2.plate_uid =  parent.plate_uid GROUP BY plate_uid
      )
group by plate_uid, date_validated
having count(*) = 1;                    <-- you can change this to make an invalid row

A valid pair (plate_uid, date_validated) is a pair with the max date_validated only if there are a unique date_validated value.

I rewrite the update statements to consider only valid pair:

UPDATE
    lp_plates_backup AS t
    INNER JOIN  (
        -- modification start
        SELECT p.plate_uid, p.brand, p.model, p.date_validated
        FROM lp_pictures_backup as p
             INNER JOIN lp_pictures_backup_valid valid 
                        ON p.plate_uid = valid.plate_uid and 
                           p.date_validated = valid.date_validated
        WHERE p.brand <> '')
        -- modification end
    ) 
    AS m ON m.plate_uid = t.uid
SET
    t.brand = m.brand,
    t.model = m.model
WHERE
    t.brand <> m.brand
    OR
    t.model <> m.model;

Hope this make sense.


Update: 2014-03-20

In the first case:

Blockquote 1. using some other lp_pictures_backup fields to choise between rows with the same date_validated.

I have assumed your date is like this:

|id| plate_uid | brand    | model    | date_validated      |
|4 |         1 | Fiat     | Panda    | 2014-10-11 10:03:20 | 
|3 |         1 | BMW      | 7-Series | 2014-10-11 10:03:20 |   
|2 |         1 | BMW      | 7-Series | 2014-07-28 19:14:02 |
|1 |         1 | Mercedes | S-Class  | 2014-06-12 08:54:57 |    
|2 |         a | Tesla    | Model S  | 2014-12-17 11:00:00 | 
|1 |         a | BMW      | 3-Series | 2014-11-07 14:34:11 |

You can try this:

UPDATE
    lp_plates_backup AS t
    INNER JOIN  (
        SELECT t1.plate_uid, t1.brand, t1.model, t1.date_validated
        FROM lp_pictures_backup as t1,
          (SELECT t2.plate_uid, MAX(id) as id,  MAX(date_validated) as dv
           FROM lp_pictures_backup as t2
           GROUP BY t2.plate_uid) as t3                 
        WHERE t1.brand <> '' AND
              t1.plate_uid = t3.plate_uid AND
              t1.date_validated = t3.dv AND
              t1.id = t3.id
) AS m ON
    m.plate_uid = t.uid
SET
    t.brand = m.brand,
    t.model = m.model
WHERE
    t.brand <> m.brand
    OR
    t.model <> m.model;

The fields used to choise the row to be updated are extracted by this part:

 ...
 (SELECT t2.plate_uid, MAX(id) as id,  MAX(date_validated) as dv
  FROM lp_pictures_backup as t2
  GROUP BY t2.plate_uid) as t3 
 ...

So it assumes a correlation between date_validated and id: at increasing dates corresponds to increasing id.

Hope it help.