I am trying to retrieve chunks of records from my table via a generated SELECT ... WHERE ... IN
statement. If the statement is small it uses the index, but if it is large (1000+ IN items) it performs a full table scan. The table is 17 million rows, so this gets expensive when I only want to look up about 10k records.
I have a unique index covering (in this order) date, world, player, city_id. I also have separate indexes on date and on player. The table is partitioned by world.
Example query:
SELECT * FROM mytable
WHERE
(date, world, player, city_id)
IN (
('2017-05-30',4,'player',15466761),
('2017-05-30',4,'player',567599046),
('2017-05-30',4,'player2',34567866),
('2017-05-30',4,'player',2342467),
('2017-05-30',4,'player3',92341268),
... etc
)
If the query is small it uses the index; if it is large it performs a full table scan. Even adding FORCE INDEX (unique_index) has no effect on the large queries.
Execution plan output:
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "20928166.60"
},
"table": {
"table_name": "city_data",
"partitions": [
"w3",
"w4",
"w5",
"w6",
"w7"
],
"access_type": "ALL",
"possible_keys": [
"unique_index"
],
"rows_examined_per_scan": 17440138,
"rows_produced_per_join": 8720069,
"filtered": "50.00",
"cost_info": {
"read_cost": "19184152.80",
"eval_cost": "1744013.80",
"prefix_cost": "20928166.60",
"data_read_per_join": "3G"
}
}
}
SHOW CREATE TABLE for city_data:
CREATE TABLE `city_data` (
`date` date NOT NULL,
`world` int(3) NOT NULL,
`player` varchar(45) NOT NULL,
`alliance` varchar(45) NOT NULL,
`score` int(11) NOT NULL,
`continent` int(11) NOT NULL,
`castle` tinyint(1) NOT NULL,
`water` tinyint(1) NOT NULL,
`temple` tinyint(1) NOT NULL,
`x_coord` int(11) NOT NULL,
`y_coord` int(11) NOT NULL,
`city_name` varchar(45) NOT NULL,
`city_id` int(20) NOT NULL,
`entry_id` int(55) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`entry_id`,`world`),
UNIQUE KEY `unique_index` (`date`,`world`,`player`,`city_id`) USING BTREE,
KEY `world_index` (`world`) USING BTREE,
KEY `date_index` (`date`) USING BTREE,
KEY `player_index` (`player`(8))
) ENGINE=MyISAM AUTO_INCREMENT=17999069 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY LIST (`world`)
(PARTITION w3 VALUES IN (3) ENGINE = MyISAM,
PARTITION w4 VALUES IN (4) ENGINE = MyISAM,
PARTITION w5 VALUES IN (5) ENGINE = MyISAM,
PARTITION w6 VALUES IN (6) ENGINE = MyISAM,
PARTITION w7 VALUES IN (7) ENGINE = MyISAM) */
How can I get it to use the index instead of relying on a full table scan?
I have read this post: https://stackoverflow.com/questions/586381/mysql-not-using-indexes-with-where-in-clause
Best Answer
I think there is an issue with large IN clauses and PARTITIONing -- beyond some not-so-large number, the Optimizer says "punt, I'll just send this query to every partition". (This may be part of what you are seeing.)

But more than that: picking partitions is costly, not cheap. If you end up touching all the partitions, the query will be slower than if you did not have partitioning at all. (I recommend removing partitioning; I have found no use case where BY LIST helps performance.)

Most importantly, "tuples" are not optimized. That is, the Optimizer does a terrible job with WHERE (a,b) IN ((1,2), (3,4)), regardless of the indexes. Don't use that construct if you want performance. (Or you could plead for the optimization feature at bugs.mysql.com.) This is the real answer to the question.

So, what to do? One approach is to create a temp table with all those tuples, then JOIN from that tmp table to the real table, and have a composite index (as you have with unique_index) to make the lookups fast.
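A sketch of that temp-table approach, using the table and column names from the question (adjust types and the generated INSERT to match your real data):

```sql
-- Temp table holding the tuples you would otherwise put in the IN list.
-- Its PRIMARY KEY mirrors the column order of unique_index on city_data.
CREATE TEMPORARY TABLE lookup_tuples (
    date    date        NOT NULL,
    world   int         NOT NULL,
    player  varchar(45) NOT NULL,
    city_id int         NOT NULL,
    PRIMARY KEY (date, world, player, city_id)
);

-- Populate it with the same generated tuples
INSERT INTO lookup_tuples VALUES
    ('2017-05-30', 4, 'player',  15466761),
    ('2017-05-30', 4, 'player',  567599046),
    ('2017-05-30', 4, 'player2', 34567866);

-- Each row in the temp table becomes one index probe on unique_index
SELECT c.*
FROM   lookup_tuples AS t
JOIN   city_data     AS c
       USING (date, world, player, city_id);
```

With 10k rows in the temp table, this is roughly 10k index lookups instead of one 17M-row scan.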
Some other notes:

INT(55) means nothing. INT is always 4 bytes, with a range up to a few billion.

world int(3) should probably be TINYINT UNSIGNED (0..255) or SMALLINT UNSIGNED (0..65535).

In KEY player_index (player(8)), remove the (8) prefix.

KEY date_index (date) is redundant, since another index (unique_index) starts with the same column.
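Taken together, those cleanups might look like the following sketch. Note that on a 17M-row MyISAM table each ALTER is a full table rebuild, and the world type change is only possible after partitioning (which is keyed on world) has been removed, so verify each step on a copy first:

```sql
-- Drop partitioning entirely, as recommended above
ALTER TABLE city_data REMOVE PARTITIONING;

ALTER TABLE city_data
    MODIFY world SMALLINT UNSIGNED NOT NULL,  -- 0..65535; worlds 3-7 fit easily
    DROP INDEX player_index,
    ADD INDEX player_index (player),          -- full column instead of player(8)
    DROP INDEX date_index;                    -- redundant with unique_index (date, ...)
```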