MySQL – Why SELECT Query is Slow Despite Indexing

Tags: configuration, MySQL, select

I have a large MyISAM table A (500M rows) that connects two other tables B and C in an n:m relation.

So my table A looks like this:

CREATE TABLE `A` (
  `id_b` int(10) DEFAULT NULL,
  `id_c` int(12) DEFAULT NULL,
  UNIQUE KEY `Uniqueness` (`id_c`,`id_b`),
  KEY `id_b` (`id_b`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Every entity in B is related to 1-12 entities in C with an average of 8 and a mode of 10.

Every entity in C is related to 100-500 entities in B (but this is growing linearly over time).

Now, for a specific entity c, I want a list of all other entities in C that it is connected to through B, and how many entities in B they are connected through.
Or more directly: I have a specific c that is connected to 100-500 b's, which are each connected to a handful of other entities in C. I want to know which C's those b's are connected with, and how often.

My simplified query looks like this:

SELECT COUNT(OtherA.id_b), OtherA.id_c FROM A
INNER JOIN A OtherA
ON OtherA.id_b = A.id_b
WHERE A.id_c=[MY_SPECIFIC_C_ID]
GROUP BY OtherA.id_c

It takes over a second to execute. As described above, the inner join yields at most a few thousand rows and the GROUP BY reduces them to a few hundred. As far as I can tell I have indexed correctly, so I am confused why this query takes so long.

EXPLAIN yields the following result:

row1
id--------------1
select_type-----SIMPLE
table-----------A
type------------ref
possible_keys---Uniqueness,id_b
key-------------Uniqueness
key_len---------5
ref-------------const
rows------------462
Extra-----------Using where; Using index; Using temporary; Using filesort
row2
id--------------1
select_type-----SIMPLE
table-----------OtherA
type------------ref
possible_keys---id_b
key-------------id_b
key_len---------5
ref-------------THISDB.A.id_b
rows------------8
Extra-----------Using where

As suggested I have tried manually extracting the list of id_b's:

[concated_list] = select group_concat(distinct id_b) as id_list from A where id_c = ...;

And then running

SELECT count(id_b), id_c FROM A WHERE id_b IN([concated_list]) GROUP BY id_c

The first query is instant; the second query takes almost as long as the original, so it was suggested that I need to fine-tune my MySQL server variables. Where do I start?

EXPLAIN SELECT count(id_b), id_c FROM A WHERE id_b IN([concated_list]) GROUP BY id_c

results in:

id--------------1
select_type-----SIMPLE
table-----------A
type------------range
possible_keys---id_b
key-------------id_b
key_len---------5
ref-------------NULL
rows------------802
Extra-----------Using where; Using temporary; Using filesort

MySQL version is 5.5.35-0ubuntu0.12.04.2.

Best Answer

EDIT: Since the first try with a subselect led to a dead end, it might be necessary to split the table into a "hot" range and a "cold" range. That solution can only help if you have some criterion on id_c and id_b that indicates whether a given id_c will be read often or rarely, for example its age. But even deleting from such a big table is not easy; it would require an OPTIMIZE TABLE, which takes ages. It would take some clever ideas to minimize the reads that "overlap" between both tables, and since you want to read all rows with a given id_b across different id_c values, there will be cases where you need two queries and have to combine the results afterwards.

As already stated in the comments, look into Key_reads and Key_read_requests with SHOW STATUS LIKE 'Key%'. Check whether key_buffer_size is big enough (https://dev.mysql.com/doc/refman/5.0/en/myisam-key-cache.html). You could set up two different key caches, one only for that specific table (https://dev.mysql.com/doc/refman/5.0/en/multiple-key-caches.html) and one for the rest.
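A sketch of those checks and of setting up a dedicated key cache for table A; the cache name `hot_cache` and the buffer sizes are placeholder assumptions to be tuned to your RAM:

```sql
-- A high ratio of Key_reads to Key_read_requests means many index
-- lookups are going to disk instead of being served from the key cache.
SHOW STATUS LIKE 'Key_read%';
SHOW VARIABLES LIKE 'key_buffer_size';

-- Enlarge the default key cache (example size: 1 GB).
SET GLOBAL key_buffer_size = 1024 * 1024 * 1024;

-- Dedicated key cache reserved for table A's indexes (example size: 512 MB).
SET GLOBAL hot_cache.key_buffer_size = 512 * 1024 * 1024;
CACHE INDEX A IN hot_cache;
LOAD INDEX INTO CACHE A;  -- preload A's indexes into the cache
```

With a separate cache, scans of other tables can no longer evict A's index blocks.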

The suggestion of an index on (id_b, id_c) should help a lot, since your query can then be processed completely from the index ("Using index"), without touching the data rows at all.
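A sketch of adding that covering index; the key name is arbitrary, and note that on a 500M-row MyISAM table this statement locks the table and can take a long time:

```sql
-- Covers both the join (ON id_b) and the aggregation (GROUP BY id_c),
-- so the join side of the query never has to read the data file.
ALTER TABLE `A` ADD KEY `id_b_id_c` (`id_b`, `id_c`);
```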

Try running ANALYZE TABLE A during low-traffic times.

It really would help to split the table into many sections, especially for OPTIMIZE TABLE and ANALYZE TABLE, but your kind of query makes that hard.
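For reference, the maintenance statements mentioned above; both take locks on a MyISAM table, so schedule them for low-traffic windows:

```sql
-- Refresh index statistics so the optimizer makes better plan choices.
ANALYZE TABLE A;

-- Defragment the table and sort the index pages; on 500M rows this
-- can run for a very long time, so use it sparingly.
OPTIMIZE TABLE A;
```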

Doomed original solution: I think the join creates many N x N blocks of rows, where N is the number of rows in table A with an identical id_b value, since the join has to combine the id_b blocks. So when some id_b has many rows, the time grows quadratically with that number.

It might help to use a subselect in that case.

EDIT: EDIT undone.

SELECT COUNT(*), OtherA.id_c 
FROM A OtherA where OtherA.id_b in (
    select DISTINCT id_b from A where A.id_c = [MY_SPECIFIC_C_ID]
)
GROUP BY OtherA.id_c

So the id_b values can be selected in one scan and then looked up through the index to find the right rows.

EDIT: The EXPLAIN added by the questioner shows that this subselect is a very bad idea in this case; no wonder it takes forever:

EXPLAIN on the suggested query looks like this:

row1
id--------------1
select_type-----PRIMARY
table-----------OtherA
type------------index
possible_keys---NULL
key-------------Uniqueness
key_len---------10
ref-------------NULL
rows------------496167246
Extra-----------Using where; Using index
row2
id--------------2
select_type-----DEPENDENT SUBQUERY
table-----------A
type------------ref
possible_keys---Uniqueness, id_b
key-------------Uniqueness
key_len---------10
ref-------------const,func
rows------------1
Extra-----------Using where; Using index; Using temporary
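The DEPENDENT SUBQUERY in this plan is the problem: MySQL 5.5 rewrites IN (SELECT ...) into a correlated subquery that is re-evaluated for every row of the outer scan (496M rows here). A sketch of an equivalent rewrite using a derived table, which 5.5 materializes once up front instead:

```sql
-- The derived table b_list is built once (the cheap, instant query),
-- then joined against A via the id_b index.
SELECT COUNT(*), OtherA.id_c
FROM (
    SELECT DISTINCT id_b
    FROM A
    WHERE id_c = [MY_SPECIFIC_C_ID]
) AS b_list
INNER JOIN A OtherA
    ON OtherA.id_b = b_list.id_b
GROUP BY OtherA.id_c;
```

This is essentially the questioner's manual two-step approach expressed in a single statement, so the per-query planning caveats above still apply.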