MySQL – Force Index Not Working on AWS MySQL

amazon-rds-auroraexecution-planindexMySQLoptimization

I have a table sales with a compound index on (user_id, eastern_date, state)

CREATE TABLE sales (
    id int not null auto_increment,
    user_id int not null,
    eastern_date date not null,
    state varchar not null,
    sales_count int not null,
    sales_amount float not null,
    UNIQUE KEY `user_date_state` (`user_id`, `eastern_date`, `state`)
    ...
)

Total number of rows is ~ 60MM

I then explain a query on it for 3400 records

explain
select * from sales 
where (`user_id`, `eastern_date`, `state`) IN (
    (1, "2022-01-01", "NY"),
    ... 3399 more tuples
)

And it shows me that it uses the index user_date_state

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE sales "pmin p2020 p2021 p2022 p2023 p2024 p2025 p2026 pmax" range user_date_state user_date_state 309 NULL 3400 100.00 "Using where"

I then explain a query on it for 3401 rows (1 more match) and it no longer uses the index.

explain
select * from sales 
where (`user_id`, `eastern_date`, `state`) IN (
    (1, "2022-01-01", "NY"),
    ... 3400 more tuples (1 more match)
)
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE sales "pmin p2020 p2021 p2022 p2023 p2024 p2025 p2026 pmax" range user_date_state NULL NULL NULL 60000000 50.00 "Using where"

I then include a FORCE INDEX clause

explain
select * from sales 
FORCE INDEX (user_date_state)
where (`user_id`, `eastern_date`, `state`) IN (
    (1, "2022-01-01", "NY"),
    ... 3400 more tuples (1 more match)
)

but it still does not use the index which seems to indicate that force index did not change anything. What am I missing? (Thank you)

I am using version 5.7.mysql_aurora.2.09.1 and the table is an INNODB table.

Best Answer

Row constructors such as

(a,b,c)) IN ((1,2,3), (5,6,7))

were not efficiently acted on.

As a workaround, I suggest you insert those 3400+ rows into a temp table with the same 3 column names, then do

JOIN temp_table
    USING (`user_id`, `eastern_date`, `state`)

Rumor has it that row-constructors are being improved (someday).

As a side note, consider getting rid of id and promoting that Unique key to be the Primary Key.

I suspect (without any concrete evidence) that the existence of Partitioning is adding to the sluggishness.

One more thing to look at: Check that the collation on state is the same as that of the connection SHOW VARIABLES LIKE 'coll%';