Mysql – SQL delete where not in (…) query not behaving as expected

deleteMySQLmysql-5.6

We have the following query which returns the most recent fares from our database :

SELECT f1.id 
  FROM ( SELECT * FROM fares ) AS f1
INNER JOIN ( SELECT origin_id
                  , destination_id
                  , market_id
                  , cabin
                  , tripType
                  , max(created_at) AS maxDate 
               FROM fares
              GROUP BY origin_id
                     , destination_id
                     , market_id
                     , cabin
                     , tripType
            ) AS f2  ON f2.origin_id      = f1.origin_id 
                    AND f2.destination_id = f1.destination_id 
                    AND f2.market_id      = f1.market_id 
                    AND f2.cabin          = f1.cabin 
                    AND f2.tripType       = f1.tripType
 WHERE f1.created_at = f2.maxDate
 GROUP BY f1.origin_id
        , f1.destination_id
        , f1.market_id
        , f1.cabin
        , f1.tripType

In our test dataset (SQL Fiddle) this returns two rows. When we change the query to work as a delete query it deletes everything but three rows, while it should delete everything but the two rows:

DELETE FROM fares 
 WHERE id NOT IN 
     ( SELECT f1.id 
         FROM ( SELECT * FROM fares ) AS f1
        INNER JOIN ( SELECT origin_id
                          , destination_id
                          , market_id
                          , cabin
                          , tripType
                          , max(created_at) AS maxDate 
                       FROM fares        
                      GROUP BY origin_id
                             , destination_id
                             , market_id
                             , cabin
                             , tripType
                   ) AS f2  ON f2.origin_id = f1.origin_id 
                           AND f2.destination_id = f1.destination_id 
                           AND f2.market_id = f1.market_id 
                           AND f2.cabin = f1.cabin 
                           AND f2.tripType = f1.tripType
        WHERE f1.created_at = f2.maxDate
        GROUP BY f1.origin_id
               , f1.destination_id 
               , f1.market_id
               , f1.cabin
               , f1.tripType
    )

We noticed that the three rows that remain are also the product of the select query when the GROUP BY part is ommited.

Now the question is why does the SELECT query in the DELETE statement behave different than the SELECT statement by itself?

Reproduction:

  • Take the schema from this SQLFiddle: http://sqlfiddle.com/#!9/e943ee/2
  • Run the query supplied in the SQLFiddle, only two rows are returned
  • Add the delete query to the schema & rebuild the schema
  • Execute "SELECT * FROM fares" and three rows are returned (where there should have been only two)

Best Answer

If you want to purge an outdated data then your query is a bit complicated:

DELETE FROM fares AS f1 
  LEFT JOIN ( SELECT origin_id
                   , destination_id
                   , market_id
                   , cabin
                   , tripType
                   , max(created_at) AS maxDate 
                FROM fares        
               GROUP BY origin_id
                      , destination_id
                      , market_id
                      , cabin
                      , tripType
            ) AS f2  ON f2.origin_id      = f1.origin_id 
                    AND f2.destination_id = f1.destination_id 
                    AND f2.market_id      = f1.market_id 
                    AND f2.cabin          = f1.cabin 
                    AND f2.tripType       = f1.tripType
                    AND f2.maxDate        = f1.created_at  /* was omitted */
      WHERE f2.maxDate IS NULL
)