Mysql – Does using the in operator affect the use of indexes

execution-planindexMySQL

Does using the in operator affect the use of indexes? For example, is it possible for WHERE id IN (10) to cause the optimizer to ignore an index?

Specifically, I'm trying to understand this answer from StackOverflow. The question asks why an index is not being utilized (in MySQL), and the answer suggests that it could be due to the use of in with only one value. Eg WHERE id IN (10) might get the index ignored, while WHERE id=10 would be fine.

I've briefly looked around on Google, and skimmed through some MySQL docs, but I can't find any reference to in affecting the optimizer's decision to use an index. Neither with a single value, nor multiple values.

Assuming that the IN values are the same datatype as the column they are comparing, can they affect the index usage?

The question that I linked to was for MySQL, but I work in other DBs, so I'd be interested in knowing if this is a universal thing to note about indexes, or if it's a quirk with MySQL.

Best Answer

If that happens in MySQL then it's a quirk to MySQL. Since you asked about other databases as well I decided to test it out on my postgres database. On a table with a little over 70 million rows it produces exactly the same explain for the following two queries (with the index):

explain select * from myschema.my_list where my_column in (232);
explain select * from myschema.my_list where my_column = 232;

which is:

'Bitmap Heap Scan on my_list  (cost=68093.11..4109456.90 rows=1656844 width=994)'
'  Recheck Cond: (my_column = 232)'
'  ->  Bitmap Index Scan on my_list_2  (cost=0.00..67678.90 rows=1656844 width=0)'
'        Index Cond: (my_column = 232)'

It doesn't diverge until I add in multiple elements in the list:

 explain select * from myschema.my_list where my_column in (232,79);
 explain select * from myschema.my_list where my_column = 232 or my_column = 79;

which is (respectively):

'Bitmap Heap Scan on my_list  (cost=172208.28..7050980.15 rows=4290987 width=994)'
'  Recheck Cond: (my_column = ANY ('{232,79}'::integer[]))'
'  ->  Bitmap Index Scan on my_list_2  (cost=0.00..171135.53 rows=4290987 width=0)'
'        Index Cond: (my_column = ANY ('{232,79}'::integer[]))'

'Bitmap Heap Scan on my_list  (cost=177390.73..7066890.07 rows=4230402 width=994)'
'  Recheck Cond: ((my_column = 232) OR (my_column = 79))'
'  ->  BitmapOr  (cost=177390.73..177390.73 rows=4290987 width=0)'
'        ->  Bitmap Index Scan on my_list_2  (cost=0.00..67678.90 rows=1656844 width=0)'
'              Index Cond: (my_column = 232)'
'        ->  Bitmap Index Scan on my_list_2  (cost=0.00..107596.63 rows=2634142 width=0)'
'              Index Cond: (my_column = 79)'

I would take his answer with a grain of salt.