Mysql – How to Delete rows based on a combination of columns having a certain count

deleteMySQL

I'm having trouble writing a query for the table below.

CREATE TABLE `c_example` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `c_id` int(11) DEFAULT NULL,
  `guid` varchar(32) DEFAULT NULL,
  `provider_id` int(11) DEFAULT NULL,
  `test_timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
) ENGINE=MyISAM AUTO_INCREMENT=5731 DEFAULT CHARSET=utf8;

In this table the data will look like

1,1,313f516ad35e477482aa435ae53c6b3a,1,2016-05-09 22:53:14
2,1,313f516ad35e477482aa435ae53c6b3a,2,2016-05-09 22:53:14
3,1,313f516ad35e477482aa435ae53c6b3a,3,2016-05-09 22:53:14
4,1,f76d2c14a86a42378badd04e4051e569,1,2016-05-10 22:53:14
5,1,f76d2c14a86a42378badd04e4051e569,2,2016-05-10 22:53:14
6,1,f76d2c14a86a42378badd04e4051e569,3,2016-05-10 22:53:14
7,2,c040cecfbff648b2a3ecbd9d0688efec,1,2016-05-11 22:53:14
8,2,c040cecfbff648b2a3ecbd9d0688efec,2,2016-05-11 22:53:14
9,2,c040cecfbff648b2a3ecbd9d0688efec,3,2016-05-11 22:53:14
10,2,79c87ba31df14ec89f0da39e98b6a088,1,2016-05-12 22:53:14
11,2,79c87ba31df14ec89f0da39e98b6a088,2,2016-05-12 22:53:14
12,2,79c87ba31df14ec89f0da39e98b6a088,3,2016-05-12 22:53:14

Now, I want to delete all the rows where the unique combination of c_id, guid that has a count > 1 and delete the ones with the oldest timestamp, so I'm left with.

4,1,f76d2c14a86a42378badd04e4051e569,1,2016-05-10 22:53:14
5,1,f76d2c14a86a42378badd04e4051e569,2,2016-05-10 22:53:14
6,1,f76d2c14a86a42378badd04e4051e569,3,2016-05-10 22:53:14
10,2,79c87ba31df14ec89f0da39e98b6a088,1,2016-05-12 22:53:14
11,2,79c87ba31df14ec89f0da39e98b6a088,2,2016-05-12 22:53:14
12,2,79c87ba31df14ec89f0da39e98b6a088,3,2016-05-12 22:53:14

I'm having trouble coming up with the delete statement.

Best Answer

To solve this problem I did the following.

Created the table as per the question above. I then populated it with this data (a bit extra added for testing purposes).

INSERT INTO c_example 
VALUES 
(1,  1, '313f516ad35e477482aa435ae53c6b3a', 1, '2016-05-09 21:53:14'),
(2,  1, '313f516ad35e477482aa435ae53c6b3a', 2, '2016-05-09 21:53:14'),
(3,  1, '313f516ad35e477482aa435ae53c6b3a', 3, '2016-05-09 21:53:14'),
(4,  1, 'f76d2c14a86a42378badd04e4051e569', 1, '2016-05-10 21:53:14'),
(5,  1, 'f76d2c14a86a42378badd04e4051e569', 2, '2016-05-10 21:53:14'),
(6,  1, 'f76d2c14a86a42378badd04e4051e569', 3, '2016-05-10 21:53:14'),
(7,  2, 'c040cecfbff648b2a3ecbd9d0688efec', 1, '2016-05-11 21:53:14'),
(8,  2, 'c040cecfbff648b2a3ecbd9d0688efec', 2, '2016-05-11 21:53:14'),
(9,  2, 'c040cecfbff648b2a3ecbd9d0688efec', 3, '2016-05-11 21:53:14'),
(10, 2, '79c87ba31df14ec89f0da39e98b6a088', 1, '2016-05-12 21:53:14'),
(11, 2, '79c87ba31df14ec89f0da39e98b6a088', 2, '2016-05-12 21:53:14'),
(12, 2, '79c87ba31df14ec89f0da39e98b6a088', 3, '2016-05-12 21:53:14'),
(13, 3, '79c87ba31df14ec89f0da39e98b6a088', 3, '2016-05-12 21:53:14'),
(14, 4, '79c87ba31df14ec89f0da39e98b6a088', 3, '2016-05-12 21:53:14');

First, we need to obtain the c_ids and test_timestamps of the records we are going to keep:

SELECT c_id, MAX(test_timestamp) AS mydate1 
FROM c_example t1 
GROUP BY t1.c_id
ORDER BY c_id, MAX(test_timestamp)

Result:

+------+---------------------+
| c_id | mydate1             |
+------+---------------------+
|    1 | 2016-05-10 21:53:14 |
|    2 | 2016-05-12 21:53:14 |
|    3 | 2016-05-12 21:53:14 |
|    4 | 2016-05-12 21:53:14 |
+------+---------------------+

Now, we use these to find the ids corresponding to records with with these c_ids and test_timestamps.

SELECT t1.id, t1.c_id, t1.guid, t1.test_timestamp FROM c_example t1
JOIN
(
  SELECT c_id, MAX(test_timestamp) AS mydate1 
  FROM c_example t1 
  GROUP BY t1.c_id
  ORDER BY c_id, MAX(test_timestamp)
) t2 ON t1.c_id = t2.c_id AND t1.test_timestamp = t2.mydate1

Result:

+----+------+----------------------------------+---------------------+
| id | c_id | guid                             | test_timestamp      |
+----+------+----------------------------------+---------------------+
|  4 |    1 | f76d2c14a86a42378badd04e4051e569 | 2016-05-10 21:53:14 |
|  5 |    1 | f76d2c14a86a42378badd04e4051e569 | 2016-05-10 21:53:14 |
|  6 |    1 | f76d2c14a86a42378badd04e4051e569 | 2016-05-10 21:53:14 |
| 10 |    2 | 79c87ba31df14ec89f0da39e98b6a088 | 2016-05-12 21:53:14 |
| 11 |    2 | 79c87ba31df14ec89f0da39e98b6a088 | 2016-05-12 21:53:14 |
| 12 |    2 | 79c87ba31df14ec89f0da39e98b6a088 | 2016-05-12 21:53:14 |
| 13 |    3 | 79c87ba31df14ec89f0da39e98b6a088 | 2016-05-12 21:53:14 |
| 14 |    4 | 79c87ba31df14ec89f0da39e98b6a088 | 2016-05-12 21:53:14 |
+----+------+----------------------------------+---------------------+

These are the records we want to keep. You could just SELECT t1.id for only the id - I just selected the extra fields for clarity.

So, you might think that this query would work (it does in PostgreSQL)

DELETE FROM c_example
WHERE c_example.id NOT IN
(
  SELECT t1.id FROM c_example t1
  JOIN
  (
    SELECT c_id, MAX(test_timestamp) AS mydate1 
    FROM c_example t1 
    GROUP BY t1.c_id
    ORDER BY c_id, MAX(test_timestamp)
  ) t2 ON t1.c_id = t2.c_id AND t1.test_timestamp = t2.mydate1
);

But, due to the delightful (ahem...) quirks of MySQL's brand of SQL, instead of working, one is presented with this error message:

ERROR 1093 (HY000): You can't specify target table 'c_example' for update in FROM clause

As it was put so aptly here "Sometimes I wonder what drugs the MySQL devs are on..." (found while searching for a solution - also to be found in that thread).

So, you have to add a further level of nesting to the query and finally, a query which works:

DELETE FROM c_example 
WHERE c_example.id NOT IN
(
  SELECT t3.id FROM
  (
    SELECT t1.id, t1.c_id, t1.guid, t1.test_timestamp FROM c_example t1
    JOIN
    (
      SELECT c_id, MAX(test_timestamp) AS mydate1 
      FROM c_example t1 
      GROUP BY t1.c_id
      ORDER BY c_id, MAX(test_timestamp)
    ) t2 ON t1.c_id = t2.c_id AND t1.test_timestamp = t2.mydate1
  )AS t3
);

--

Result:

Query OK, 6 rows affected (0.66 sec)

And run the SELECT to verify.

mysql> SELECT * FROM c_example;
+----+------+----------------------------------+-------------+---------------------+
| id | c_id | guid                             | provider_id | test_timestamp      |
+----+------+----------------------------------+-------------+---------------------+
|  4 |    1 | f76d2c14a86a42378badd04e4051e569 |           1 | 2016-05-10 21:53:14 |
|  5 |    1 | f76d2c14a86a42378badd04e4051e569 |           2 | 2016-05-10 21:53:14 |
|  6 |    1 | f76d2c14a86a42378badd04e4051e569 |           3 | 2016-05-10 21:53:14 |
| 10 |    2 | 79c87ba31df14ec89f0da39e98b6a088 |           1 | 2016-05-12 21:53:14 |
| 11 |    2 | 79c87ba31df14ec89f0da39e98b6a088 |           2 | 2016-05-12 21:53:14 |
| 12 |    2 | 79c87ba31df14ec89f0da39e98b6a088 |           3 | 2016-05-12 21:53:14 |
| 13 |    3 | 79c87ba31df14ec89f0da39e98b6a088 |           3 | 2016-05-12 21:53:14 |
| 14 |    4 | 79c87ba31df14ec89f0da39e98b6a088 |           3 | 2016-05-12 21:53:14 |
+----+------+----------------------------------+-------------+---------------------+
8 rows in set (0.00 sec)

Which should be the right answer - if this doesn't solve your issue, please clarify the question.