I am a bit confused about 2 queries.
First query I am running on a 50m rows table by index on a timestamp
row with a limit, I would expect the query to be immediate regardless of the size of the table, since I run on an index with small limit. Am I wrong?
SELECT 1
FROM indicators.fileso fo
WHERE fo.lastSeen < NOW() - INTERVAL 2 MONTH
limit 100;
Create table:
CREATE TABLE `fileso` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`hostId` int(11) unsigned NOT NULL COMMENT 'code',
`sha256` binary(32) NOT NULL COMMENT 'meir',
`fileName` varchar(150) DEFAULT NULL COMMENT 'meir',
`fullPath` varchar(350) DEFAULT NULL COMMENT 'meir',
`lastSeen` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`uniqueness` binary(32) NOT NULL DEFAULT '0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
PRIMARY KEY (`id`),
UNIQUE KEY `uniqueness` (`uniqueness`),
KEY `lastSeen` (`lastSeen`)
) ENGINE=InnoDB AUTO_INCREMENT=1842469487 DEFAULT CHARSET=utf8
Second query when I select by timestamp row with index, with order by desc limit 1
its immediate, but when I do order by asc limit 1
it takes alot of time, dont understand that.
Immediate query:
SELECT 1
FROM indicators.fileso fo
ORDER BY fo.lastSeen DESC
LIMIT 1;
Takes time:
SELECT 1
FROM indicators.fileso fo
ORDER BY fo.lastSeen ASC
LIMIT 1;
Best Answer
(This does not address your question directly; rather it addresses your goal of deleting old records.)
Plan A
You are talking about
DELETEing
"old" records. Assuming you are removing "all" the old records, thenPARTITIONing
is an excellent way to do that.PARTITION BY RANGE(TO_DAYS)
by weeks. UseDROP PARTITION
(instantaneous and non-invasive) to get rid of the oldest week andREORGANIZE PARTITION
to add a new partition. Details. It will take some downtime to initially partition your 50-million-row table.Plan B
Alternatively, you can continually walk through the table, using the
PRIMARY KEY
for keeping track of where you are. Find the 1000th row hence & delete any 'old' records. Repeat. Details. And it does not need an index onlastseen
; removing it could be an efficiency bonus.Since the table is clustered on the PK(id), walking through the table is very efficient -- very little I/O for each step. (For unknown reasons, your original solution was more I/O-bound than it should have been.)