Mysql – How to delete all but the latest entry per customer in the fastest way

deleteMySQLperformancequeryquery-performance

I have a table containing around 4 million unique customers (with customerId each), but each customer can have multiple entries. There is no primary key. Each row has a creation dateTime saved as decimal number. customerId is indexed.

Now I want to find the solution to delete all but the latest row per customer that has the best performance. The thing is, this table will be constantly filled and the cleanup should be done once a day.

What I have tried:

  1. Use a cursor of "select customerId from customers" and iterate over it. Fetch the max creation time per customer. Delete all entries with the id and with a different creation time. -> Sadly, this is incredibly slow.
  2. "Smart" DELETE queries with subselects aren't possible, because MySQL does not allow calling select in a subquery.
  3. The client does not want to user triggers.

Any other ideas?

Best Answer

Draft code:

CREATE PROCEDURE clear_table() AS
BEGIN
    CREATE TABLE temptable LIKE datatable;
    INSERT INTO temptable (customerID, creationDatetime)
        SELECT customerID, MAX(creationDatetime)
        FROM datatable
        GROUP BY customerId;
/* if there exists another fields in tables, then: */
/*  UPDATE temptable, datatable
        SET temptable.field1 = datatable.field1, 
            ... , 
            temptable.fieldN = datatable.fieldN
        WHERE temptable.customerID = datatable.customerID
          AND temptable.creationDatetime = datatable.creationDatetime;
*/
    DROP TABLE datatable;
    ALTER TABLE temptable RENAME TO datatable;
END;

To increase performance - create index by (customerID, creationDatetime).