We have the following query which returns the most recent fares from our database :
SELECT f1.id
FROM ( SELECT * FROM fares ) AS f1
INNER JOIN ( SELECT origin_id
, destination_id
, market_id
, cabin
, tripType
, max(created_at) AS maxDate
FROM fares
GROUP BY origin_id
, destination_id
, market_id
, cabin
, tripType
) AS f2 ON f2.origin_id = f1.origin_id
AND f2.destination_id = f1.destination_id
AND f2.market_id = f1.market_id
AND f2.cabin = f1.cabin
AND f2.tripType = f1.tripType
WHERE f1.created_at = f2.maxDate
GROUP BY f1.origin_id
, f1.destination_id
, f1.market_id
, f1.cabin
, f1.tripType
In our test dataset (SQL Fiddle) this returns two rows. When we change the query to work as a delete query it deletes everything but three rows, while it should delete everything but the two rows:
DELETE FROM fares
WHERE id NOT IN
( SELECT f1.id
FROM ( SELECT * FROM fares ) AS f1
INNER JOIN ( SELECT origin_id
, destination_id
, market_id
, cabin
, tripType
, max(created_at) AS maxDate
FROM fares
GROUP BY origin_id
, destination_id
, market_id
, cabin
, tripType
) AS f2 ON f2.origin_id = f1.origin_id
AND f2.destination_id = f1.destination_id
AND f2.market_id = f1.market_id
AND f2.cabin = f1.cabin
AND f2.tripType = f1.tripType
WHERE f1.created_at = f2.maxDate
GROUP BY f1.origin_id
, f1.destination_id
, f1.market_id
, f1.cabin
, f1.tripType
)
We noticed that the three rows that remain are also the product of the select query when the GROUP BY part is ommited.
Now the question is why does the SELECT query in the DELETE statement behave different than the SELECT statement by itself?
Reproduction:
- Take the schema from this SQLFiddle: http://sqlfiddle.com/#!9/e943ee/2
- Run the query supplied in the SQLFiddle, only two rows are returned
- Add the delete query to the schema & rebuild the schema
- Execute "SELECT * FROM fares" and three rows are returned (where there should have been only two)
Best Answer
If you want to purge an outdated data then your query is a bit complicated: