MySQL – Troubleshooting Delete Query That Does Not End

deletejoin;MySQL

I have a following table with 1 million records :

CREATE TABLE `activity` (
  `id` int(9) unsigned NOT NULL,
  `user_id` int(10) unsigned NOT NULL,
  `subscription_id` int(10) unsigned NOT NULL,
  `vid` int(10) unsigned NOT NULL,
  `action` enum('init','confirm','suspend') DEFAULT NULL,
  `source` varchar(20) DEFAULT NULL,
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `num` varchar(32) NOT NULL,
  `model` varchar(20) DEFAULT NULL,
  `price` float(5,2) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `num` (`num`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

I want to delete entries for same num value but that are old. So I came up with following query :

delete activity
 from activity
 inner join (
   select max(id) as maxid, num
     from activity
    group by num
   having count(*) > 1) duplic on duplic.num = activity.num
where activity.id < duplic.maxid;

The select query works fine, returns around 400.000 records. But if I try delete, the query executes infinitely. What might be wrong with this? How to investigate?

Best Answer

Plan A:

  1. Create temp table with num and max(id). 400K rows
  2. Chunk that table to do only 1000 deletes at a time; there would be a JOIN in the DELETE to the temp table.

More on Chunking for DELETE.

Plan B:

You need to keep 60% of the table, correct? So, it may be faster to rebuild the table with only the desired rows.

CREATE TABLE new LIKE activity;
ALTER TABLE new ADD UNIQUE(num);   -- in order to catch dups
INSERT IGNORE   -- silently ignore dup nums
    INTO new SELECT * FROM activity
    ORDER BY num, id DESC;   -- same effect as MAX
RENAME TABLE activity TO old, new TO activity;
DROP TABLE old;

Plan C:

ALTER TABLE IGNORE activity
    DROP KEY num,
    ADD UNIQUE(num);

Problem (with Plan C): you cannot predict which of dup rows will be deleted.