MySQL – Why Bulk Multi-Column Key Queries Are Slow

indexindex-tuningMySQLoptimizationperformancequery-performance

(For this question, I am using AWS/Aurora MySQL with a reasonably-spec'd RDS instance)

Consider the following schema:

Table T:
    col0: the usual autoincrement primary key
    col1: varchar
    col2: varchar
    col3: varchar
    col4...N: various data

Consider that there is a unique index on:

<col1, col2, col3>

And a non-unique index on:

<col1, col2>

And consider the following query:

SELECT * FROM T
WHERE
    (col1 = 'val1' AND col2 = 'id1') OR
    (col1 = 'val2' AND col2 = 'id2') OR
    ...
    (col1 = 'valN' AND col2 = 'idN');

I would (perhaps naively) expected MySQL to figure out that each element of the OR set matched the (non-unique) index, and performed the query in the way it would have if I had said:

WHERE col0 in (v1, v2, ... , vN)

But it doesn't seem to do that: the timing for these two queries is WAY OFF, on the order of 10x slower for the "or of ands" query. EVEN WITH the secondary key lookup, and the fact that it's a string column lookup, 10x seems a bit severe. Note that EXPLAIN claims to be using the correct/expected index whether I specify (col1, col2) or (col1, col2, col3)

Please note also that:

SELECT * from T
WHERE
    col1 in (list1)
AND
    col2 in (list2);

Is also slow when there are a lot of different values in list1 and list2. Doing an "and" for the three columns is almost intractably slow.

Perhaps not surprisingly, this query works better than the "or of ands" when list1 is of length 1.

Best Answer

With "row constructors", you might get an optimization:

WHERE (col1, col2) IN (('v1', 'id1'), ('v2', 'id2'), ...)

But... In old versions, that would work, but lead to a table scan. I can't say specifically about the version you are running.

When you have this pair of indexes:

UNIQUE(col1, col2, col3)  -- (or plain INDEX)
INDEX(col1, col2)

there is no need for the latter, since the former can handle any queries that need it.

Perhaps the optimal way to write your query is

WHERE col1 in ('v1', 'v2', ...)
  AND (col1, col2) IN (('v1', 'id1'), ('v2', 'id2'), ...)

With that, it will use any index starting with col1 as a crude filter, then use the other part for the rest of the filtering.

Re "convert to an in method" -- MySQL started out as a clean and mean database; it did most of what anyone needed and did it reasonably well. That was 90% of the development. We are now into the other 90% of the development -- the "long tail". Quite possibly some list somewhere includes "convert to an in method". If so, it is being prioritized along with the thousands of other rare and obscure optimizations. Feel free to file a 'feature request' at bugs.mysql.com; that is the way to add it to the list, or bump it up in priority.