Mysql – Optimizing a MySQL “IN” query

MySQLoptimizationperformancequery-performance

I'm trying to optimize a MySQL table for a query of the form:

SELECT Col1, Col2 FROM MyTable WHERE Col3 IN('Val1','Val2','Val3',...)

I've added an index to the table:

ALTER MyTable ADD INDEX(`COL3`)

and verified that it was created, but when I analyze the query using EXPLAIN the type is still "ALL" (indicating that all rows are retrieved and searched). What am I doing wrong?

Update:

I've tried EXPLAIN SELECT Col1, Col2 FROM MyTable WHERE Col3 = 'Val1'. Type is still "ALL". Here are the actual CREATE TABLE, ALTER TABLE and query:

CREATE TABLE RXNCONSO
(
   RXCUI             varchar(8) NOT NULL,
   LAT               varchar (3) DEFAULT 'ENG' NOT NULL,
   TS                varchar (1),
   LUI               varchar(8),
   STT               varchar (3),
   SUI               varchar (8),
   ISPREF            varchar (1),
   RXAUI             varchar(8) NOT NULL,
   SAUI              varchar (50),
   SCUI              varchar (50),
   SDUI              varchar (50),
   SAB               varchar (20) NOT NULL,
   TTY               varchar (20) NOT NULL,
   CODE              varchar (50) NOT NULL,
   STR               varchar (3000) NOT NULL,
   SRL               varchar (10),
   SUPPRESS          varchar (1),
   CVF               varchar(50)
)
;

ALTER TABLE `RXNCONSO` ADD INDEX(`TTY`);

EXPLAIN SELECT RXCUI, STR FROM RXNCONSO
  WHERE TTY IN ('SCD','SBD','GPCK','BPCK','SCDG','SBDG','SCDF','SBDF');

+----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | RXNCONSO | ALL  | NULL          | NULL | NULL    | NULL | 1139206 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

Update 2:

Here's the result of running

SELECT TTY, count(1) FROM RXNCONSO
 WHERE TTY IN ('SCD','SBD','GPCK','BPCK','SCDG','SBDG','SCDF','SBDF')   
 GROUP BY TTY WITH ROLLUP;

+------+----------+
| TTY  | count(1) |
+------+----------+
| BPCK |      526 |
| GPCK |      492 |
| SBD  |    21732 |
| SBDF |    14593 |
| SBDG |    21140 |
| SCD  |    36131 |
| SCDF |    14556 |
| SCDG |    16362 |
| NULL |   125532 |
+------+----------+
9 rows in set (0.38 sec)

Best Answer

Since the query optimizer decided a full table scan was best, you might need a new query

SELECT Col1, Col2 FROM MyTable WHERE Col3 = 'Val1'
UNION
SELECT Col1, Col2 FROM MyTable WHERE Col3 = 'Val2'
UNION
SELECT Col1, Col2 FROM MyTable WHERE Col3 = 'Val3'
UNION ...

This should force an indexed lookup per value followed by a merge of the query results

UPDATE 2015-12-04 15:35 EDT

Based on jkavalik's comment, I have a suggestion. Please run this

SELECT COUNT(1)/20 INTO @FivePercent FROM RXNCONSO;
SELECT * FROM
(
    SELECT TTY,COUNT(1) ValueCount
    FROM RXNCONSO GROUP BY TTY
) A WHERE ValueCount > @FivePercent
ORDER BY ValueCount DESC;

This will show which values will likely contribute to a table scan.