MySQL – Deleting old data; Select with index taking time

deleteinnodbMySQL

I am a bit confused about 2 queries.

First query I am running on a 50m rows table by index on a timestamp row with a limit, I would expect the query to be immediate regardless of the size of the table, since I run on an index with small limit. Am I wrong?

SELECT 1
FROM indicators.fileso fo 
WHERE fo.lastSeen < NOW() - INTERVAL 2 MONTH
limit 100;

Create table:

CREATE TABLE `fileso` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `hostId` int(11) unsigned NOT NULL COMMENT 'code',
  `sha256` binary(32) NOT NULL COMMENT 'meir',
  `fileName` varchar(150) DEFAULT NULL COMMENT 'meir',
  `fullPath` varchar(350) DEFAULT NULL COMMENT 'meir',
  `lastSeen` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `uniqueness` binary(32) NOT NULL DEFAULT '0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniqueness` (`uniqueness`),
  KEY `lastSeen` (`lastSeen`)
) ENGINE=InnoDB AUTO_INCREMENT=1842469487 DEFAULT CHARSET=utf8

Explain:
pic1

Second query when I select by timestamp row with index, with order by desc limit 1 its immediate, but when I do order by asc limit 1 it takes alot of time, dont understand that.

Immediate query:

SELECT 1
FROM indicators.fileso fo
ORDER BY fo.lastSeen DESC
LIMIT 1;

Takes time:

SELECT 1
FROM indicators.fileso fo
ORDER BY fo.lastSeen ASC
LIMIT 1;

Best Answer

(This does not address your question directly; rather it addresses your goal of deleting old records.)

Plan A

You are talking about DELETEing "old" records. Assuming you are removing "all" the old records, then PARTITIONing is an excellent way to do that. PARTITION BY RANGE(TO_DAYS) by weeks. Use DROP PARTITION (instantaneous and non-invasive) to get rid of the oldest week and REORGANIZE PARTITION to add a new partition. Details. It will take some downtime to initially partition your 50-million-row table.

Plan B

Alternatively, you can continually walk through the table, using the PRIMARY KEY for keeping track of where you are. Find the 1000th row hence & delete any 'old' records. Repeat. Details. And it does not need an index on lastseen; removing it could be an efficiency bonus.

Since the table is clustered on the PK(id), walking through the table is very efficient -- very little I/O for each step. (For unknown reasons, your original solution was more I/O-bound than it should have been.)