Can anyone help me with the following query?
SELECT
voc.username AS Medewerker,
voc.bestvastonbep_count AS Doorgezet,
IFNULL(cc.call_count, 0) AS Telefoongesprekken,
voc.bestvastonbep_count / IFNULL(cc.call_count, 0) AS Ratio
FROM (
SELECT
u.id AS user_id,
CONCAT(u.firstname, ' ', u.lastname) AS username,
COUNT(vo.id) AS bestvastonbep_count
FROM `bestvastonbep` vo
JOIN `best` b ON b.id = vo.best_id
JOIN `vastonbepbeststatus` vos ON b.id = vos.best_id
JOIN `user` u ON u.id = b.user_id
WHERE vos.type = 2
AND vos.status = 2
AND (b.date BETWEEN '2017-02-04' AND '2017-05-04')
GROUP BY u.id
) voc
LEFT JOIN (
SELECT
user_id,
COUNT(id) AS call_count
FROM `call` c
WHERE (c.start BETWEEN '2017-02-04' AND '2017-05-04')
AND c.duration > 0
AND c.calltype = 'out'
GROUP BY c.user_id
) cc
ON voc.user_id = cc.user_id;
It produces the desired result, but it's immensely slow.
I've read questions from other people about similar queries. For them it turned out that joining subqueries wasn't necessary.
The difference for me is that I'm using GROUP
for both subqueries.
Hopefully there's a way and I'm just not seeing it.
Edit in response to RDFozz
The EXPLAIN query gives the following result.
The first subquery is very fast. (0.1 s)
The second subquery takes about 6 seconds.
The complete query takes about 57 seconds.
There is an index on call.user_id
as well as call.start
.
But the call table currently has about 32 million rows.
Answer
I took Plan A as described by Rick James.
The query now looks like the following:
SELECT
voc.username AS Medewerker,
voc.bestvastonbep_count AS Doorgezet,
IFNULL(count(c.user_id), 0) AS Telefoongesprekken,
IFNULL(voc.bestvastonbep_count / count(c.user_id), "∞") AS Ratio
FROM (
SELECT u.id AS user_id, CONCAT(u.firstname, ' ', u.lastname) as username, COUNT(vo.id) AS bestvastonbep_count
FROM `bestvastonbep` vo
JOIN `best` b ON b.id = vo.best_id
JOIN `vastonbepbeststatus` vos ON b.id = vos.best_id
JOIN `user` u ON u.id = b.user_id
WHERE vos.type = 2
AND vos.status = 2
AND (b.date BETWEEN '2017-02-04' AND '2017-05-04')
GROUP BY u.id
) voc
LEFT JOIN `call` c on
(c.start BETWEEN '2017-02-04' AND '2017-05-04')
AND c.duration > 0
AND c.calltype = 'out'
AND voc.user_id = c.user_id
GROUP BY voc.user_id;
It takes about 7 seconds, depending on the size. Faster would be nice, but this is just about acceptable for my purposes.
Best Answer
Explode-Implode
In the first derived table (
voc
) you have what call the "explode-implode" syndrone. First theJOINs
generate a lot of rows, then theGROUP BY
whittles it back to a few rows.Sometimes the solution is to turn a
JOIN
into a subquery:-->
Note that the
GROUP BY
goes away. This example works especially well forLEFT JOIN
because the "optional" nature offoo
is preserved.(The
GROUP BY
incc
is not the same situation.)Derived JOIN Derived
MySQL is terrible (pre-5.6) or poor (5.6+) at dealing with
JOINing
two subqueries (voc
andcc
).Plan A: Turn one of those subqueries into a
JOIN
with the other.Plan B: Put one of the subqueries into a
TEMPORARY TABLE
and give it an appropriate index:(user_id)
.Indexes
In the absence of
SHOW CREATE TABLE
, must do some guessing...And change
COUNT(id)
toCOUNT(*)
.Note:
start BETWEEN '2017-02-04' AND '2017-05-04'
is "inclusive". Consider changing toPlease provide
EXPLAIN SELECT ...
after adding those indexes.I see a possible divide-by-zero; maybe you want