Mysql – Quickly find first/last row in table

MySQL

Finding the first row of a table with an alpha date/time (eg. "20200515.152300") of a particular table is taking a long time. I am using this query:

SELECT path FROM tsFileListTbl WHERE `tsNum` = 18 AND path LIKE "/mnt/das.h%" ORDER BY `dateTime` ASC LIMIT 1;
+----------------------------------+
| path                             |
+----------------------------------+
| /mnt/das.h/18/20200207.204900.ts |
+----------------------------------+
1 row in set (14.32 sec)

Where:

DESCRIBE tsFileListTbl;
+-------------+---------------------+------+-----+---------+----------------+
| Field       | Type                | Null | Key | Default | Extra          |
+-------------+---------------------+------+-----+---------+----------------+
| id          | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| tsNum       | int(11)             | NO   | MUL | NULL    |                |
| path        | varchar(128)        | NO   | UNI |         |                |
| dateTime    | char(15)            | YES  | MUL |         |                |
| endDateTime | char(15)            | YES  |     |         |                |
+-------------+---------------------+------+-----+---------+----------------+
5 rows in set (0.60 sec)

And…

SHOW INDEXES ON tsFileListTbl;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ON tsFileListTbl' at line 1
mysql> SHOW INDEX FROM tsFileListTbl;
+---------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table         | Non_unique | Key_name         | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tsFileListTbl |          0 | PRIMARY          |            1 | id          | A         |     3159454 |     NULL | NULL   |      | BTREE      |         |
| tsFileListTbl |          0 | path             |            1 | path        | A         |     3159454 |     NULL | NULL   |      | BTREE      |         |
| tsFileListTbl |          1 | dateTimeIdx      |            1 | dateTime    | A         |     1579727 |     NULL | NULL   | YES  | BTREE      |         |
| tsFileListTbl |          1 | tsNumDateIdx     |            1 | tsNum       | A         |          36 |     NULL | NULL   |      | BTREE      |         |
| tsFileListTbl |          1 | tsNumDateIdx     |            2 | dateTime    | A         |     3159454 |     NULL | NULL   | YES  | BTREE      |         |
| tsFileListTbl |          1 | tsNumPathDateIdx |            1 | tsNum       | A         |          36 |     NULL | NULL   |      | BTREE      |         |
| tsFileListTbl |          1 | tsNumPathDateIdx |            2 | path        | A         |     3159454 |     NULL | NULL   |      | BTREE      |         |
| tsFileListTbl |          1 | tsNumPathDateIdx |            3 | dateTime    | A         |     3159454 |     NULL | NULL   | YES  | BTREE      |         |
+---------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
8 rows in set (1.41 sec)

Is there anything I could do, without changing the table field definitions, to improve the query time?

Linux / Centos 6 / 32-bit

mysql Ver 14.14 Distrib 5.1.73, for redhat-linux-gnu (i386) using readline 5.1

EDIT I

I ran the query directly from mysql to look at different measurements.

show processlist
Query   |  188 | Sending data | SELECT path FROM tsFileListTbl WHERE `tsNum` = 18 AND path LIKE "/mnt/das.h%" ...

In all the query took 4 minutes 20 seconds to complete.

mysql> explain SELECT path FROM tsFileListTbl WHERE `tsNum` = 18 AND path LIKE "/mnt/das.h%" ORDER BY `dateTime` ASC LIMIT 1;
+----+-------------+---------------+-------+------------------------------------+-------------+---------+------+-------+-------------+
| id | select_type | table         | type  | possible_keys                      | key         | key_len | ref  | rows  | Extra       |
+----+-------------+---------------+-------+------------------------------------+-------------+---------+------+-------+-------------+
|  1 | SIMPLE      | tsFileListTbl | index | path,tsNumDateIdx,tsNumPathDateIdx | dateTimeIdx | 46      | NULL | 15931 | Using where |
+----+-------------+---------------+-------+------------------------------------+-------------+---------+------+-------+-------------+

EXPLAIN is new to me. Does the output above mean that the dateTimeIdx was used for the query? If so, that would be the correct index, right?

SELECT COUNT(*) FROM tsFileListTbl;
+----------+
| COUNT(*) |
+----------+
|  2913301 |
+----------+
1 row in set (34.77 sec)

EDIT II

As requested, here is the output from SHOW CREATE TABLE tsFileListTbl:

CREATE TABLE `tsFileListTbl` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `tsNum` int(11) NOT NULL,
  `path` varchar(128) NOT NULL DEFAULT '',
  `dateTime` char(15) DEFAULT '',
  `endDateTime` char(15) DEFAULT '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `path` (`path`),
  KEY `dateTimeIdx` (`dateTime`),
  KEY `tsNumDateIdx` (`tsNum`,`dateTime`),
  KEY `tsNumPathDateIdx` (`tsNum`,`path`,`dateTime`)
) ENGINE=InnoDB AUTO_INCREMENT=3015594 DEFAULT CHARSET=utf8

EDIT III

One of the indices suggested by Rick James was already there. After adding the other one, the query ran faster, not not that much faster:

mysql> SELECT path FROM tsFileListTbl WHERE `tsNum` = 18 AND path LIKE "/mnt/das.h%" ORDER BY `dateTime` ASC LIMIT 1;
+----------------------------------+
| path                             |
+----------------------------------+
| /mnt/das.h/18/20200208.035900.ts |
+----------------------------------+
1 row in set (2 min 5.44 sec)

mysql> EXPLAIN SELECT path FROM tsFileListTbl WHERE `tsNum` = 18 AND path LIKE "/mnt/das.h%" ORDER BY `dateTime` ASC LIMIT 1;
+----+-------------+---------------+-------+-----------------------------------------------------+-------------+---------+------+---------+-------------+
| id | select_type | table         | type  | possible_keys                                       | key         | key_len | ref  | rows    | Extra       |
+----+-------------+---------------+-------+-----------------------------------------------------+-------------+---------+------+---------+-------------+
|  1 | SIMPLE      | tsFileListTbl | index | path,tsNumDateIdx,tsNumPathDateIdx,tsNumDatePathIdx | dateTimeIdx | 46      | NULL | 2912893 | Using where |
+----+-------------+---------------+-------+-----------------------------------------------------+-------------+---------+------+---------+-------------+
1 row in set (0.02 sec)

CREATE TABLE `tsFileListTbl` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `tsNum` int(11) NOT NULL,
  `path` varchar(128) NOT NULL DEFAULT '',
  `dateTime` char(15) DEFAULT '',
  `endDateTime` char(15) DEFAULT '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `path` (`path`),
  KEY `dateTimeIdx` (`dateTime`),
  KEY `tsNumDateIdx` (`tsNum`,`dateTime`),
  KEY `tsNumPathDateIdx` (`tsNum`,`path`,`dateTime`),
  KEY `tsNumDatePathIdx` (`tsNum`,`dateTime`,`path`)
) ENGINE=InnoDB AUTO_INCREMENT=3015730 DEFAULT CHARSET=utf8

EDIT IV

I tried the "IGNORE INDEX" as suggested by Rick James. It takes half the time as the previous time but still about 1 minute to find the first row. Is that the best I can expect? I was hoping to find first/last row in the order of milliseconds not seconds…

SELECT path FROM tsFileListTbl IGNORE INDEX(dateTimeIdx) WHERE `tsNum` = 18 AND path LIKE "/mnt/das.h%" ORDER BY `dateTime` ASC LIMIT 1;

+----------------------------------+
| path                             |
+----------------------------------+
| /mnt/das.h/18/20200208.055500.ts |
+----------------------------------+
1 row in set (57.17 sec)

mysql> EXPLAIN SELECT path FROM tsFileListTbl IGNORE INDEX(dateTimeIdx) WHERE `tsNum` = 18 AND path LIKE "/mnt/das.h%" ORDER BY `dateTime` ASC LIMIT 1;
+----+-------------+---------------+-------+-----------------------------------------------------+--------------+---------+------+--------+-------------+
| id | select_type | table         | type  | possible_keys                                       | key          | key_len | ref  | rows   | Extra       |
+----+-------------+---------------+-------+-----------------------------------------------------+--------------+---------+------+--------+-------------+
|  1 | SIMPLE      | tsFileListTbl | range | path,tsNumDateIdx,tsNumPathDateIdx,tsNumDatePathIdx | tsNumDateIdx | 4       | NULL | 125828 | Using where |
+----+-------------+---------------+-------+-----------------------------------------------------+--------------+---------+------+--------+-------------+
1 row in set (0.16 sec)

EDIT V

FLUSH STATUS;
SELECT path FROM tsFileListTbl WHERE `tsNum` = 18 AND path LIKE "/mnt/das.h%" ORDER BY `dateTime` ASC LIMIT 1;
+----------------------------------+
| path                             |
+----------------------------------+
| /mnt/das.h/18/20200208.083300.ts |
+----------------------------------+
1 row in set (10.51 sec)

mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_commit             | 1       |
| Handler_delete             | 0       |
| Handler_discover           | 0       |
| Handler_prepare            | 0       |
| Handler_read_first         | 1       |
| Handler_read_key           | 1       |
| Handler_read_next          | 1498109 |
| Handler_read_prev          | 0       |
| Handler_read_rnd           | 0       |
| Handler_read_rnd_next      | 0       |
| Handler_rollback           | 0       |
| Handler_savepoint          | 0       |
| Handler_savepoint_rollback | 0       |
| Handler_update             | 0       |
| Handler_write              | 0       |
+----------------------------+---------+
15 rows in set (0.46 sec)

EDIT VI

Ignoring all indices except for one improves the performance a lot on the test system. Will try on the production machine.

SELECT path FROM tsFileListTbl IGNORE INDEX(dateTimeIdx, tsNumDateIdx,tsNumPathDateIdx) WHERE `tsNum` = 18 AND path LIKE "/mnt/das.h%" ORDER BY `dateTime` ASC LIMIT 1;
+----------------------------------+
| path                             |
+----------------------------------+
| /mnt/das.h/18/20200209.092000.ts |
+----------------------------------+
1 row in set (0.71 sec)

EXPLAIN  SELECT path FROM tsFileListTbl IGNORE INDEX(dateTimeIdx, tsNumDateIdx,tsNumPathDateIdx) WHERE `tsNum` = 18 AND path LIKE "/mnt/das.h%" ORDER BY `dateTime` ASC LIMIT 1;
+----+-------------+---------------+------+-----------------------+------------------+---------+-------+--------+--------------------------+
| id | select_type | table         | type | possible_keys         | key              | key_len | ref   | rows   | Extra                    |
+----+-------------+---------------+------+-----------------------+------------------+---------+-------+--------+--------------------------+
|  1 | SIMPLE      | tsFileListTbl | ref  | path,tsNumDatePathIdx | tsNumDatePathIdx | 4       | const | 298080 | Using where; Using index |
+----+-------------+---------------+------+-----------------------+------------------+---------+-------+--------+--------------------------+
1 row in set (0.00 sec)

EDIT VII

Taking Rick James' suggestion, to avoid the wildcard search, we added a column partPath and changed app to support to support it.

CREATE TABLE `tsFileListTbl` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `tsNum` int(11) NOT NULL,
  `partPath` varchar(20) DEFAULT NULL,
  `path` varchar(128) NOT NULL DEFAULT '',
  `dateTime` char(15) DEFAULT '',
  `endDateTime` char(15) DEFAULT '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `path` (`path`),
  KEY `dateTimeIdx` (`dateTime`),
  KEY `tsNumDateIdx` (`tsNum`,`dateTime`),
  KEY `tsNumPathDateIdx` (`tsNum`,`path`,`dateTime`),
  KEY `partPathDateIdx` (`partPath`,`dateTime`)
) ENGINE=InnoDB AUTO_INCREMENT=2906868 DEFAULT CHARSET=utf8

Good news: search time is very very much faster.

SELECT path FROM tsFileListTbl IGNORE INDEX(dateTimeIdx, tsNumDateIdx, tsNumPathDateIdx) WHERE `tsNum` = 12 AND   `partPath` = "/mnt/das.b" ORDER BY `dateTime` ASC LIMIT 1;
+----------------------------------+
| path                             |
+----------------------------------+
| /mnt/das.b/12/20191013.061400.ts |
+----------------------------------+
1 row in set (0.00 sec)

explain SELECT path FROM tsFileListTbl IGNORE INDEX(dateTimeIdx, tsNumDateIdx, tsNumPathDateIdx) WHERE `tsNum` = 23 AND   `partPath` = "/mnt/das.b" ORDER BY `dateTime` ASC LIMIT 1;
+----+-------------+---------------+------+-----------------+-----------------+---------+-------+--------+-------------+
| id | select_type | table         | type | possible_keys   | key             | key_len | ref   | rows   | Extra       |
+----+-------------+---------------+------+-----------------+-----------------+---------+-------+--------+-------------+
|  1 | SIMPLE      | tsFileListTbl | ref  | partPathDateIdx | partPathDateIdx | 63      | const | 991200 | Using where |
+----+-------------+---------------+------+-----------------+-----------------+---------+-------+--------+-------------+

However the query still takes 2-3 seconds if there is no data. ie, no rows match the query. Why would that be? How can I get the same (fast) response regardless of whether there is data or not?

Best Answer

Add these two indexes:

INDEX(tsNum, path, datetime)
INDEX(tsNum, datetime, path)

If you want both the first and last, then use UNION ALL.

When showing a schema, please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.

Discussion

For

SELECT path FROM tsFileListTbl
    WHERE `tsNum` = 18 AND path LIKE "/mnt/das.h%"
    ORDER BY `dateTime` ASC LIMIT 1;

INDEX(tsNum, path, datetime)

Here is what is being done:

  • Since the index is "covering", all the work will be done in the index's BTree; no need to look in the data's BTree. EXPLAIN says Using index to indicate "covering".
  • It will drill down the BTree to find the first row with tsNum = 18 AND path LIKE "/mnt/das.h%". This is very fast.
  • Then it will scan forward in the BTree to collect all the rows matching that. In one of your tests, it seems there were 1.5M such rows. Of course, changing the 18 or the LIKE string will change this count, possibly a lot. This probably takes half the time of the query.
  • After that, sort on dateTime -- another half of the time
  • Finally, deliver the first row. Changing the value in the LIMIT won't change the timing much.

The cause of a performance problem can be characterized as "having a 'range' (LIKE '...%') and an unrelated ORDER BY"

Split the path

If path were split into base and rest such that the query were

SELECT base, rest FROM tsFileListTbl
    WHERE `tsNum` = 18
      AND base = "/mnt/das.h"   -- no range
    ORDER BY `dateTime` ASC LIMIT 1;

Then this can handle the entire query extremely fast, regardless of the number of rows with [18,"/mnt/das.h"]. It would need

INDEX(tsNum, base,  -- either order is OK
      datetime,     -- for the order by
      rest)         -- for covering (last)

The query would function thus:

  • Reach into the index BTree for the last row matching tsNum and base, where "last" is given by dateTime. The Handler numbers will be essentially "1".