Below is a minimal example that reproduces my problem. I have the following table:
| column name | type        |
|-------------|-------------|
| id          | PRIMARY KEY |
| id_start    | INT         |
| id_end      | INT         |
This table contains roughly one billion (1,000,000,000) rows. The `id_start` and `id_end` columns are indexed. The engine is InnoDB with B-tree indexes.

Some background: this is sensor data with start and end timestamps. The timestamps themselves are kept in another table, which is referenced from this one, so I only work with the `id` columns here.
The following query returns around 15,000 rows:

```sql
SELECT * FROM `table` WHERE 310000 BETWEEN id_start AND id_end;
```
The problem is that the query takes very long to return anything (MySQL Workbench reports a duration of 1 sec but a fetch time of 220 sec).
If I append `LIMIT 14000` to the query, it returns almost immediately. The closer the `LIMIT` parameter gets to the expected 15,000 result rows, the longer the query takes. What could cause fetching 15,000 rows to take so disproportionately longer than fetching 14,000?
Some more information, which may or may not be relevant to this question:

- The DB server is well configured (InnoDB buffers etc.).
- The DB server has far more CPU power and memory than required; the whole dataset fits into memory.
- No column in the table is nullable.
As requested, here is additional information on the table structure and the query.
```sql
CREATE TABLE `table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `id_start` int(11) NOT NULL,
  `id_end` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_end` (`id_end`),
  KEY `idx_start` (`id_start`),
  KEY `idx_start_end` (`id_start`,`id_end`),
  KEY `idx_end_start` (`id_end`,`id_start`)
) ENGINE=InnoDB;
```
And here is the `EXPLAIN` output for the query:
```
+----+-------------+-------+------------+-------+-----------------------------------------------+---------------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys                                 | key           | key_len | ref  | rows     | filtered | Extra       |
+----+-------------+-------+------------+-------+-----------------------------------------------+---------------+---------+------+----------+----------+-------------+
|  1 | SIMPLE      | table | NULL       | range | idx_start,idx_end,idx_start_end,idx_end_start | idx_end_start | 9       | NULL | 20509120 |    50.00 | Using where |
+----+-------------+-------+------------+-------+-----------------------------------------------+---------------+---------+------+----------+----------+-------------+
```
Here is the `EXPLAIN FORMAT=JSON` result with a low `LIMIT`:
```json
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "27795648.61"
    },
    "table": {
      "table_name": "table",
      "access_type": "range",
      "possible_keys": [
        "idx_start",
        "idx_end",
        "idx_start_end",
        "idx_end_start"
      ],
      "key": "idx_end_start",
      "used_key_parts": [
        "id_end"
      ],
      "key_length": "9",
      "rows_examined_per_scan": 19854034,
      "rows_produced_per_join": 9927018,
      "filtered": "50.00",
      "index_condition": "(310300 between `table`.`id_start` and `table`.`id_end`)",
      "cost_info": {
        "read_cost": "25810244.97",
        "eval_cost": "1985403.64",
        "prefix_cost": "27795648.61",
        "data_read_per_join": "4G"
      },
      "used_columns": [
        "id",
        "id_start",
        "id_end"
      ]
    }
  }
}
```
And here is the `EXPLAIN FORMAT=JSON` result with a high `LIMIT`:
```json
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "27203639.01"
    },
    "table": {
      "table_name": "table",
      "access_type": "range",
      "possible_keys": [
        "idx_start",
        "idx_end",
        "idx_start_end",
        "idx_end_start"
      ],
      "key": "idx_end_start",
      "used_key_parts": [
        "id_end"
      ],
      "key_length": "9",
      "rows_examined_per_scan": 19431170,
      "rows_produced_per_join": 9715586,
      "filtered": "50.00",
      "index_condition": "(310300 between `table`.`id_start` and `table`.`id_end`)",
      "cost_info": {
        "read_cost": "25260521.78",
        "eval_cost": "1943117.23",
        "prefix_cost": "27203639.01",
        "data_read_per_join": "4G"
      },
      "used_columns": [
        "id",
        "id_start",
        "id_end"
      ]
    }
  }
}
```
Best Answer
I see two questions: why is the query so slow, and why does `LIMIT` affect performance?

This is a classic "tough task" in MySQL -- finding the rows whose 'start'..'end' range contains a given value. I address it, with some extra complication, in http://mysql.rjweb.org/doc.php/ipranges . However, that technique requires the ranges to be non-overlapping. (It is unclear whether your ranges will overlap.)
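The core of the non-overlapping trick can be sketched as follows; this is a simplified illustration against the question's schema, assuming ranges never overlap, not a quote from the linked article:

```sql
-- With non-overlapping ranges, at most one range can contain 310000:
-- the one with the largest id_start that does not exceed it. This turns
-- the half-open range scan into a single probe of idx_start_end.
SELECT *
  FROM `table`
 WHERE id_start <= 310000
 ORDER BY id_start DESC
 LIMIT 1;
-- The caller must still verify id_end >= 310000 on the returned row;
-- if that check fails, no range contains the value.
```

If ranges do overlap (and ~15,000 matching rows suggests yours do), this single-probe pattern does not apply as-is.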
You say that `id_start` is really a pointer to another table where the actual timestamp is held? And yet you use `WHERE 310000 BETWEEN id_start AND id_end`. This implies that `id_start` and `id_end` are chronologically ordered, though not actually representations of "time"? And ditto for `310000`? Perhaps this is to use 4-byte `INT`s instead of a larger `TIMESTAMP(6)` or something?

With `EXPLAIN FORMAT=JSON SELECT ...`, we might be able to figure out why changing the `LIMIT` value affects performance.

How evenly are the times distributed? Here's a guess at the difference you are seeing: with a low value for `LIMIT`, the Optimizer's statistics suggest that a certain query plan will find the 14K rows quickly, but with the larger value it deduces that it may need to search far to find 15K rows. The Optimizer has no concept of "time ranges". Hence it can only use `INDEX(id_start)` or `INDEX(id_end)`; it does not know that `id_start < id_end`, or the other things that you know and could use to optimize the query.

Do you need `id` for anything else? Are there other columns in the table? Both of these are important questions, since the following optimization may fail depending on your answers.

For this 3-column table, I would consider one of the following alternatives:
Notes:

- Starting the index with `id_start` may help with performance.
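One common workaround for the Optimizer's lack of range awareness, sketched here as an assumption about your data rather than a guaranteed fix: if you know an upper bound on how wide any single range can be, you can hand the Optimizer a two-sided condition. The `50000` below is a made-up placeholder, not a value from the question:

```sql
-- If no range spans more than 50000 ids (assumed MAX_WIDTH), every range
-- containing 310000 must start within 50000 of it. The two-sided condition
-- on id_start lets idx_start_end scan only that narrow window instead of
-- everything below 310000.
SELECT *
  FROM `table`
 WHERE id_start BETWEEN 310000 - 50000 AND 310000
   AND id_end >= 310000;
```

With `idx_start_end` this examines only the rows whose `id_start` falls inside the window, and the `id_end` test is evaluated from the same index without touching the clustered data.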