MySQL – Deleting old data; Select with index taking time

deleteinnodbMySQL

I am a bit confused about 2 queries.

First query I am running on a 50m rows table by index on a timestamp row with a limit, I would expect the query to be immediate regardless of the size of the table, since I run on an index with small limit. Am I wrong?

SELECT 1
FROM indicators.fileso fo 
WHERE fo.lastSeen < NOW() - INTERVAL 2 MONTH
limit 100;

Create table:

CREATE TABLE `fileso` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `hostId` int(11) unsigned NOT NULL COMMENT 'code',
  `sha256` binary(32) NOT NULL COMMENT 'meir',
  `fileName` varchar(150) DEFAULT NULL COMMENT 'meir',
  `fullPath` varchar(350) DEFAULT NULL COMMENT 'meir',
  `lastSeen` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `uniqueness` binary(32) NOT NULL DEFAULT '0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniqueness` (`uniqueness`),
  KEY `lastSeen` (`lastSeen`)
) ENGINE=InnoDB AUTO_INCREMENT=1842469487 DEFAULT CHARSET=utf8

Explain:

Second query when I select by timestamp row with index, with order by desc limit 1 its immediate, but when I do order by asc limit 1 it takes alot of time, dont understand that.

Immediate query:

SELECT 1
FROM indicators.fileso fo
ORDER BY fo.lastSeen DESC
LIMIT 1;

Takes time:

SELECT 1
FROM indicators.fileso fo
ORDER BY fo.lastSeen ASC
LIMIT 1;

Best Answer

(This does not address your question directly; rather it addresses your goal of deleting old records.)

Plan A

You are talking about DELETEing "old" records. Assuming you are removing "all" the old records, then PARTITIONing is an excellent way to do that. PARTITION BY RANGE(TO_DAYS) by weeks. Use DROP PARTITION (instantaneous and non-invasive) to get rid of the oldest week and REORGANIZE PARTITION to add a new partition. Details. It will take some downtime to initially partition your 50-million-row table.

Plan B

Alternatively, you can continually walk through the table, using the PRIMARY KEY for keeping track of where you are. Find the 1000th row hence & delete any 'old' records. Repeat. Details. And it does not need an index on lastseen; removing it could be an efficiency bonus.

Since the table is clustered on the PK(id), walking through the table is very efficient -- very little I/O for each step. (For unknown reasons, your original solution was more I/O-bound than it should have been.)

Related Solutions

Mysql – Adding index to large thesql tables

The activity of altering big tables are done in phases:

Create a new table with required fields and indexes say in test DB (just structure)
Dump the data from the existing table and load the same to the newly created table in test DB
Now announce your downtime :)
Swap the tables by renaming - RENAME table ur_db.table_name to test.temp, test.table_name to ur_db.table_name, test.temp to test.table_name; This is an atomic operation taking fraction of a second.
Load the extra records to the newly created table ( the records that came after dumping and then loading). This step can be done before Step: 3 also once to reduce your downtime.
And your system is back

Few notes:

You need not hit information schema directly like this, try using SHOW TABLE STATUS from db like 'table_name'
The speed of alter table is more or less linked with the I/O speed. Last time when we ran a direct alter table(without the above steps), we had 40GB+ table size it took around 4hours. If your 20GB data is taking years, you are working on some outdated machine.
Also drop of the unwanted indexes like index_bslt_ondate, index_base_schedule_line_items_loan_base_schedule as some other indexes has the left most column as the indexed column

Please let me know if you need any clarification on any of these steps.

Edit : A simple python script to automate the process https://github.com/georgecj11/hotswap_mysql_table.git

Mysql – #1093 – You can’t specify target table ‘R’ for update in FROM clause

Based on the previous question and this, the following should do what you want. As with Pieter's update it doesn't assume that the ID's are in sequential order, rather it uses the ReqTime to determine order.

UPDATE Visits R
  INNER JOIN (
    SELECT ID,
      TIME_TO_SEC(TIMEDIFF(
        (SELECT B.ReqTime
         FROM Visits B
         WHERE B.ReqTime > A.ReqTime
         ORDER BY B.ReqTime
         LIMIT 1
         )
        ,A.ReqTime)) Duration
    FROM Visits A
   ) N ON R.ID = N.ID
SET R.Duration = N.Duration
WHERE Duration IS NULL;

Best Answer

Related Solutions

Mysql – Adding index to large thesql tables

Mysql – #1093 – You can’t specify target table ‘R’ for update in FROM clause

Related Question