table_1
id | column_1 | import_date
Indexes: Mutli-column key (multi_column
) of import_date
and column_1
in addition to an individual index for each column.
table_2
id | column_1 | column_2
Indexes: Multi-column key (multi_column
) for column_1
and column_2
in addition to an individual index for each column.
i.e. both tables have two composite indexes (of the same name in this question) as well as separate indexes.
Query
I've tried flipping the query around and playing with using/forcing different indexes but it didn't seem to help.
SELECT table_1.import_date, count(*) AS count
FROM table_1
FORCE INDEX (import_date)
JOIN table_2 USE INDEX (multi_column) ON table_1.column_1 = table_2.column_1 AND column_2 = "Some String"
WHERE table_1.import_date >= '2016-05-26'
GROUP BY table_1.import_date
LIMIT 365
I tried adding in the LIMIT
and a where clause to check where the date was greater than the same date last year just to try and prevent a large number of results and filter but neither seemed to improve the performance of the query, both tables have a large record set.
EXPLAIN
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE table_1 NULL range multi_column,import_date import_date 6 NULL 1932171 100.00 Using index condition; Using where
1 SIMPLE table_2 NULL ref multi_column multi_column 126 table_1.column_1,const 117 100.00 Using index
Sorry for obfuscated table and column names, trying to keep it simple.
Original Explain without force/use:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE table_1 NULL index multi_column,column_1,import_date import_date 69 NULL 365 100.00 Using where; Using index
1 SIMPLE table_2 NULL ref multi_column,column_1,column_2 multi_column 126 table_1.column_1,const 117 100.00 Using index
Row counts for each table are 3,937,236 and 4,503,551 respectively.
The query currently takes around 80s, how long should it take, I'm not sure but I imagined with indexes then hopefully under a second.
Running locally on my development machine just now, Mac OS X El Capitan, 2.2 GHz i7 16gb ram.
MySQL version 5.7.15
Both tables are using InnoDB.
This query seems to take it down to roughly 20 seconds:
select import_date, count(*) as count
from table_1
where exists (select * from table_2 where table_1.column_1 = table_2.column_1 and column_2 = 'Some String')
group by import_date
Best Answer
How about
FORCE INDEX (import_date)
toFORCE INDEX (multi_column)
?