Mysql – Very slow query, not sure if MySQL index is being used

indexMySQLmysql-5.6performancequery-performance

I have a query which includes a join and I have an index on the tables:

SELECT e.entry, e.user_id from entries e left join users u on 
e.user_id=u.user_id where e.tenant_id=? AND u.active=1 AND e.valid=1

The user table has ~116k rows and the entries table has 165k rows and the above query takes ~60 seconds to run. Server is 8 core, 32GB RAM with SSD disks so I don't imagine this is normal.

I have ran explain against the query and it comes back with:

+----+-------------+-------+-------+---------------+------------+---------+-------+--------+-----------------------------------------------------------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref   | rows   | Extra                                                           |
+----+-------------+-------+-------+---------------+------------+---------+-------+--------+-----------------------------------------------------------------+
|  1 | SIMPLE      | e     | ref   | tenant_id     | tenant_id  | 4       | const |   3589 | Using where                                                     |
|  1 | SIMPLE      | u     | index | NULL          | tenant_id  | 314     | NULL  | 116710 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+---------------+------------+---------+-------+--------+-----------------------------------------------------------------+

The index on users is:

+---------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table         | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| users         |          0 | PRIMARY    |            1 | id          | A         |      116710 |     NULL | NULL   |      | BTREE      |         |               |
| users         |          1 | tenant_id  |            1 | tenant_id   | A         |       19451 |     NULL | NULL   |      | BTREE      |         |               |
| users         |          1 | tenant_id  |            2 | user_id     | A         |      116710 |     NULL | NULL   |      | BTREE      |         |               |
| users         |          1 | tenant_id  |            3 | group_id    | A         |      116710 |     NULL | NULL   |      | BTREE      |         |               |
| users         |          1 | tenant_id  |            4 | valid       | A         |      116710 |     NULL | NULL   |      | BTREE      |         |               |
+---------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

I struggle to get my head around indexes especially with joins but it looks like the index on users is not being used on the query as it is scanning all 116k rows – is that correct?

If so, how can I setup the index to improve the performance and not have to scan all 116k rows?

Best Answer

entries needs INDEX(valid, tenent_id)
users needs INDEX(active, user_id)

More discussion (Index Cookbook).

More specifics

LEFT probably prevents the optimizer from picking which table to start with. Assuming the LEFT is important, let's break the query in two:

FROM entries WHERE e.tenant_id=? AND e.valid=1

The link above explains that the optimal index starts with anything with '=', so (tenant_id, valid) in any order. (I suggested one of them.)

Then it moves on to the other table:

LEFT JOIN users u
    ON u.user_id = ...
WHERE u.active=1

Hence, (active, user_id) in any order.

(Sorry, I need to augment the Cookbook with more discussion of JOIN.)

If you could get rid of LEFT, then a different set of indexes might be useful:

(active)
(valid, user_id, tenant_id) -- in any order

Having both pairs of indexes would let the optimizer pick which is better. However, some consolidation is desirable (again, see the link). These two would suffice for either table order. Now the column order is important:

(active, user_id)   -- also handles (active)
(valid, tenant_id, user_id)  -- also handles (valid, tenent_id)

More on your title question of "not sure if index is being used". In the EXPLAIN, NULL and 116710 imply that it is not. But index and Using index imply that it is. The tie-breaker is Using join buffer (Block Nested Loop) which says that it is loading the index into RAM to use it. (This is a case where join_buffer_size matters.)

The 4-column index you had was somewhat useful, but it

  • did not have all the needed columns (active) to be covering
  • had an extra column (group_id)
  • was in a sub-optimal order: the extra column came before the all the needed columns, so valid was not as useful as it could be.