MySQL – Optimizing Slow Subquery Joins with GROUP BY

group byjoin;MySQLsubquery

Can anyone help me with the following query?

SELECT
  voc.username AS Medewerker,
  voc.bestvastonbep_count AS Doorgezet,
  IFNULL(cc.call_count, 0) AS Telefoongesprekken,
  voc.bestvastonbep_count / IFNULL(cc.call_count, 0) AS Ratio
FROM (
  SELECT
    u.id AS user_id,
    CONCAT(u.firstname, ' ', u.lastname) AS username,
    COUNT(vo.id) AS bestvastonbep_count
  FROM `bestvastonbep`       vo
  JOIN `best`                b   ON b.id = vo.best_id
  JOIN `vastonbepbeststatus` vos ON b.id = vos.best_id
  JOIN `user`                u   ON u.id = b.user_id
  WHERE vos.type = 2
  AND   vos.status = 2
  AND  (b.date BETWEEN '2017-02-04' AND '2017-05-04')
  GROUP BY u.id
) voc
LEFT JOIN (
  SELECT
    user_id,
    COUNT(id) AS call_count
  FROM `call` c
  WHERE  (c.start BETWEEN '2017-02-04' AND '2017-05-04')
  AND c.duration > 0
  AND c.calltype = 'out'
  GROUP BY c.user_id
) cc
ON voc.user_id = cc.user_id;

It produces the desired result, but it's immensely slow.
I've read questions from other people about similar queries. For them it turned out that joining subqueries wasn't necessary.
The difference for me is that I'm using GROUP for both subqueries.
Hopefully there's a way and I'm just not seeing it.

Edit in response to RDFozz

The EXPLAIN query gives the following result.
enter image description here

The first subquery is very fast. (0.1 s)
The second subquery takes about 6 seconds.
The complete query takes about 57 seconds.

There is an index on call.user_id as well as call.start.
But the call table currently has about 32 million rows.

Answer
I took Plan A as described by Rick James.
The query now looks like the following:

SELECT
    voc.username AS Medewerker,
    voc.bestvastonbep_count AS Doorgezet,
    IFNULL(count(c.user_id), 0) AS Telefoongesprekken,
    IFNULL(voc.bestvastonbep_count / count(c.user_id), "∞") AS Ratio
FROM (
    SELECT u.id AS user_id, CONCAT(u.firstname, ' ', u.lastname) as username, COUNT(vo.id) AS bestvastonbep_count
    FROM `bestvastonbep`       vo
    JOIN `best`                b   ON b.id = vo.best_id
    JOIN `vastonbepbeststatus` vos ON b.id = vos.best_id
    JOIN `user`                u   ON u.id = b.user_id
    WHERE vos.type = 2
    AND   vos.status = 2
    AND  (b.date BETWEEN '2017-02-04' AND '2017-05-04')
    GROUP BY u.id
) voc 
LEFT JOIN `call` c on
    (c.start BETWEEN '2017-02-04' AND '2017-05-04')
    AND c.duration > 0
    AND c.calltype = 'out'
    AND voc.user_id = c.user_id
GROUP BY voc.user_id;

It takes about 7 seconds, depending on the size. Faster would be nice, but this is just about acceptable for my purposes.

Best Answer

Explode-Implode

In the first derived table (voc) you have what call the "explode-implode" syndrone. First the JOINs generate a lot of rows, then the GROUP BY whittles it back to a few rows.

Sometimes the solution is to turn a JOIN into a subquery:

SELECT  ...,
        b.foo
    FROM a
    LEFT JOIN b
    GROUP BY a.id

-->

SELECT  ...,
        ( SELECT foo FROM b WHERE ... = a... ) AS foo
    FROM a

Note that the GROUP BY goes away. This example works especially well for LEFT JOIN because the "optional" nature of foo is preserved.

(The GROUP BY in cc is not the same situation.)

Derived JOIN Derived

MySQL is terrible (pre-5.6) or poor (5.6+) at dealing with JOINing two subqueries (voc and cc).

Plan A: Turn one of those subqueries into a JOIN with the other.

Plan B: Put one of the subqueries into a TEMPORARY TABLE and give it an appropriate index: (user_id).

Indexes

In the absence of SHOW CREATE TABLE, must do some guessing...

vos: INDEX(type, status, best_id)
b:   INDEX(date)
c:   INDEX(calltype, start, duration, user_id)

And change COUNT(id) to COUNT(*).

Note: start BETWEEN '2017-02-04' AND '2017-05-04' is "inclusive". Consider changing to

    start >= '2017-02-04'
AND start  < '2017-02-04' + INTERVAL 3 MONTH

Please provide EXPLAIN SELECT ... after adding those indexes.

I see a possible divide-by-zero; maybe you want

IFNULL(voc.bestvastonbep_count / cc.call_count, 0)