I need a query to UPDATE a series of fields based on another table
Table lp_plates_backup
| uid | brand | model |
| 1 | | |
| a | Old | Error |
| ... | ... | ... |
Table lp_pictures_backup
| plate_uid | brand | model | date_validated |
| 1 | Fiat | Panda | 2014-10-11 10:03:20 | < Last one
| 1 | BMW | 7-Series | 2014-08-04 11:21:18 |
| 1 | BMW | 7-Series | 2014-07-28 19:14:02 |
| 1 | Mercedes | S-Class | 2014-06-12 08:54:57 |
| a | Tesla | Model S | 2014-12-17 11:00:00 | < Last one
| a | BMW | 3-Series | 2014-11-07 14:34:11 |
I need to update columns brand and model of lp_plates_backup based on the last date_validated entry of table lp_plates_backup while lp_plates_backup.uid match lp_pictures_backup.plate_uid for each lp_plates_backup.uid but only if:
- lp_pictures_backup.brand is not empty
- lp_plates_backup.brand is different from lp_pictures_backup.brand OR lp_plates_backup.model is different from lp_pictures_backup.model
The result should shown (lp_plates_backup) :
| uid | brand | model |
| 1 | Fiat | Panda |
| a | Tesla | Model S |
I have read these posts and made lot of tests but without getting the expected result:
- https://stackoverflow.com/questions/1973246/how-to-specify-the-parent-query-field-from-within-a-subquery-in-mysql
- https://stackoverflow.com/questions/17038193/select-row-with-most-recent-date-per-user
- https://stackoverflow.com/questions/11156163/optimize-sub-query-selecting-last-record-of-each-group
Here is the last query I try:
UPDATE
lp_plates_backup AS t
LEFT JOIN (
SELECT
plate_uid, brand, model, date_validated
FROM
lp_pictures_backup as parent
WHERE
brand <> '' AND
date_validated = (SELECT MAX(date_validated)
FROM lp_pictures_backup as t2
WHERE t2.plate_uid = parent.plate_uid)
GROUP BY
plate_uid
ORDER BY
date_validated DESC
) AS m ON
m.plate_uid = t.uid
SET
t.brand = m.brand,
t.model = m.model
WHERE
t.brand <> m.brand
OR
t.model <> m.model
I think I had too much brainstormed and getting confused making useless too complicated code.
Any expert in the place?
Thanks for helping.
Best Answer
The correct query is:
Just some little explanations. You need
INNER JOIN
because you must update row oflp_plates_backup
only ifplate_uid
exists intolp_plates_backup
.ORDER BY
is useless because you are selecting all rows, order is not important.You need the max
date_validated
of rows grouped byplate_uid
, select MAX(date_validate) must individuate only a row forplate_uid
through thedate_validated
field. So you need to add group by to select max(..).The query
select plate_uid
returns a row forplate_uid
so you don't need to aggregate here. The condition of single row is already builded intoselect max
subquery.I hope you did not get confused by me :-)
Updated
The previous works well if
date_validate
is the pair (plate_uid
,date_validated
) is unique. If you have this kind of data:The following query returns the first two rows for
plate_uid
1.The update will choice the values from the first or the second row. I think of the following alternatives:
lp_pictures_backup
fields to choise between rows with the same date_validated.alter table lp_pictures_backup add unique index (plate_id, date_validated)
. Rejecting invalid data.lp_pictures_backup
pair of(plate_uid, date_validated)
. Updatinglp_plates_backup
only with valid pairs, review invalid pairs and correct them.Maybe there are more alternatives. I prefer enforcing contraint on data so to have better data. I expand 3th alternative. Just create a view to define what a valid pair
(plate_uid, date_validated)
is:A valid pair
(plate_uid, date_validated)
is a pair with the max date_validated only if there are a unique date_validated value.I rewrite the update statements to consider only valid pair:
Hope this make sense.
Update: 2014-03-20
In the first case:
I have assumed your date is like this:
You can try this:
The fields used to choise the row to be updated are extracted by this part:
So it assumes a correlation between date_validated and id: at increasing dates corresponds to increasing id.
Hope it help.