I have a query which includes a join and I have an index on the tables:
SELECT e.entry, e.user_id from entries e left join users u on
e.user_id=u.user_id where e.tenant_id=? AND u.active=1 AND e.valid=1
The user table has ~116k rows and the entries table has 165k rows and the above query takes ~60 seconds to run. Server is 8 core, 32GB RAM with SSD disks so I don't imagine this is normal.
I have ran explain against the query and it comes back with:
+----+-------------+-------+-------+---------------+------------+---------+-------+--------+-----------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------------+---------+-------+--------+-----------------------------------------------------------------+
| 1 | SIMPLE | e | ref | tenant_id | tenant_id | 4 | const | 3589 | Using where |
| 1 | SIMPLE | u | index | NULL | tenant_id | 314 | NULL | 116710 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+---------------+------------+---------+-------+--------+-----------------------------------------------------------------+
The index on users is:
+---------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| users | 0 | PRIMARY | 1 | id | A | 116710 | NULL | NULL | | BTREE | | |
| users | 1 | tenant_id | 1 | tenant_id | A | 19451 | NULL | NULL | | BTREE | | |
| users | 1 | tenant_id | 2 | user_id | A | 116710 | NULL | NULL | | BTREE | | |
| users | 1 | tenant_id | 3 | group_id | A | 116710 | NULL | NULL | | BTREE | | |
| users | 1 | tenant_id | 4 | valid | A | 116710 | NULL | NULL | | BTREE | | |
+---------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
I struggle to get my head around indexes especially with joins but it looks like the index on users is not being used on the query as it is scanning all 116k rows – is that correct?
If so, how can I setup the index to improve the performance and not have to scan all 116k rows?
Best Answer
entries
needsINDEX(valid, tenent_id)
users
needsINDEX(active, user_id)
More discussion (Index Cookbook).
More specifics
LEFT
probably prevents the optimizer from picking which table to start with. Assuming theLEFT
is important, let's break the query in two:The link above explains that the optimal index starts with anything with '=', so
(tenant_id, valid)
in any order. (I suggested one of them.)Then it moves on to the other table:
Hence,
(active, user_id)
in any order.(Sorry, I need to augment the Cookbook with more discussion of
JOIN
.)If you could get rid of
LEFT
, then a different set of indexes might be useful:Having both pairs of indexes would let the optimizer pick which is better. However, some consolidation is desirable (again, see the link). These two would suffice for either table order. Now the column order is important:
More on your title question of "not sure if index is being used". In the
EXPLAIN
,NULL
and116710
imply that it is not. Butindex
andUsing index
imply that it is. The tie-breaker isUsing join buffer (Block Nested Loop)
which says that it is loading the index into RAM to use it. (This is a case wherejoin_buffer_size
matters.)The 4-column index you had was somewhat useful, but it
active
) to be coveringgroup_id
)valid
was not as useful as it could be.