MySQL – Optimizing Slow Subquery Joins with GROUP BY

group byjoin;MySQLsubquery

Can anyone help me with the following query?

SELECT
  voc.username AS Medewerker,
  voc.bestvastonbep_count AS Doorgezet,
  IFNULL(cc.call_count, 0) AS Telefoongesprekken,
  voc.bestvastonbep_count / IFNULL(cc.call_count, 0) AS Ratio
FROM (
  SELECT
    u.id AS user_id,
    CONCAT(u.firstname, ' ', u.lastname) AS username,
    COUNT(vo.id) AS bestvastonbep_count
  FROM `bestvastonbep`       vo
  JOIN `best`                b   ON b.id = vo.best_id
  JOIN `vastonbepbeststatus` vos ON b.id = vos.best_id
  JOIN `user`                u   ON u.id = b.user_id
  WHERE vos.type = 2
  AND   vos.status = 2
  AND  (b.date BETWEEN '2017-02-04' AND '2017-05-04')
  GROUP BY u.id
) voc
LEFT JOIN (
  SELECT
    user_id,
    COUNT(id) AS call_count
  FROM `call` c
  WHERE  (c.start BETWEEN '2017-02-04' AND '2017-05-04')
  AND c.duration > 0
  AND c.calltype = 'out'
  GROUP BY c.user_id
) cc
ON voc.user_id = cc.user_id;

It produces the desired result, but it's immensely slow.
I've read questions from other people about similar queries. For them it turned out that joining subqueries wasn't necessary.
The difference for me is that I'm using GROUP for both subqueries.
Hopefully there's a way and I'm just not seeing it.

Edit in response to RDFozz

The EXPLAIN query gives the following result.

The first subquery is very fast. (0.1 s)
The second subquery takes about 6 seconds.
The complete query takes about 57 seconds.

There is an index on call.user_id as well as call.start.
But the call table currently has about 32 million rows.

Answer
I took Plan A as described by Rick James.
The query now looks like the following:

SELECT
    voc.username AS Medewerker,
    voc.bestvastonbep_count AS Doorgezet,
    IFNULL(count(c.user_id), 0) AS Telefoongesprekken,
    IFNULL(voc.bestvastonbep_count / count(c.user_id), "&infin;") AS Ratio
FROM (
    SELECT u.id AS user_id, CONCAT(u.firstname, ' ', u.lastname) as username, COUNT(vo.id) AS bestvastonbep_count
    FROM `bestvastonbep`       vo
    JOIN `best`                b   ON b.id = vo.best_id
    JOIN `vastonbepbeststatus` vos ON b.id = vos.best_id
    JOIN `user`                u   ON u.id = b.user_id
    WHERE vos.type = 2
    AND   vos.status = 2
    AND  (b.date BETWEEN '2017-02-04' AND '2017-05-04')
    GROUP BY u.id
) voc 
LEFT JOIN `call` c on
    (c.start BETWEEN '2017-02-04' AND '2017-05-04')
    AND c.duration > 0
    AND c.calltype = 'out'
    AND voc.user_id = c.user_id
GROUP BY voc.user_id;

It takes about 7 seconds, depending on the size. Faster would be nice, but this is just about acceptable for my purposes.

Best Answer

Explode-Implode

In the first derived table (voc) you have what call the "explode-implode" syndrone. First the JOINs generate a lot of rows, then the GROUP BY whittles it back to a few rows.

Sometimes the solution is to turn a JOIN into a subquery:

SELECT  ...,
        b.foo
    FROM a
    LEFT JOIN b
    GROUP BY a.id

-->

SELECT  ...,
        ( SELECT foo FROM b WHERE ... = a... ) AS foo
    FROM a

Note that the GROUP BY goes away. This example works especially well for LEFT JOIN because the "optional" nature of foo is preserved.

(The GROUP BY in cc is not the same situation.)

Derived JOIN Derived

MySQL is terrible (pre-5.6) or poor (5.6+) at dealing with JOINing two subqueries (voc and cc).

Plan A: Turn one of those subqueries into a JOIN with the other.

Plan B: Put one of the subqueries into a TEMPORARY TABLE and give it an appropriate index: (user_id).

Indexes

In the absence of SHOW CREATE TABLE, must do some guessing...

vos: INDEX(type, status, best_id)
b:   INDEX(date)
c:   INDEX(calltype, start, duration, user_id)

And change COUNT(id) to COUNT(*).

Note: start BETWEEN '2017-02-04' AND '2017-05-04' is "inclusive". Consider changing to

    start >= '2017-02-04'
AND start  < '2017-02-04' + INTERVAL 3 MONTH

Please provide EXPLAIN SELECT ... after adding those indexes.

I see a possible divide-by-zero; maybe you want

IFNULL(voc.bestvastonbep_count / cc.call_count, 0)

Related Solutions

Mysql – Slow complex query with group/order

I can see couple things that should improve your query performance.

1 As you already found out there is absolutely no need to join mentioncache. Using EXISTS seems more natural (or IN as you did, but EXISTS may work better from performance point of view).

2 DATE(m.indexed) BETWEEN "2012-09-16" AND "2012-10-16" can be rewritten to m.indexed between "2012-09-16" AND "2012-10-16 23:59:59", so mysql can use index.

3 urlinfluranks doesn't seem to be used anywhere except in LEFT JOIN, why do you need it?

4 f.foreign_id can be either null or m.id, and this is the only reference to favoureditems table, I'd rather use subquery in this case.

Finally, I think you can get the same results without GROUP BY m.id (as far as I understood , mentions.id a primary key).

SELECT   
m.id, m.title, m.title_text, m.content_text, m.url,m.root_url,m.sub_type,m.indexed,  
CASE 
 WHEN EXISTS 
    (SELECT NULL FROM favoureditems f WHERE f.model = "Mention" 
    AND f.foreign_id = m.id AND f.owner_id = 803) THEN m.id 
END AS f.foreign_id,
, v.foreign_id, v.created, mfs.score,  
Image.id,Image.model,Image.foreign_key, Image.dirname,Image.basename,  
(REPLACE(REPLACE(m.host_url, 'http://www.', ''), 'http://', '')) AS Mention__plain_url  
FROM mentions AS m  

LEFT JOIN 
(
  SELECT id,model,foreign_key,dirname,basename 
  FROM attachments Image  
  WHERE model = 'Mention'
  GROUP BY foreign_key
 )Image  ON (Image.foreign_key = m.id)      

LEFT JOIN 
(
   SELECT v.foreign_id, v.created 
   FROM visiteditems AS v  
   WHERE (v.model = "Mention"  AND v.owner_id = 803)  
    GROUP BY v.foreign_id
)v ON (v.foreign_id = m.id)
LEFT JOIN 
(
   SELECT mention_id,score
   FROM mentionfeedscores mfs  
   WHERE mfs.feed_id = '474737584865424564398208323289092'
   GROUP BY mention_id
)mfs ON (mfs.mention_id = m.id )

WHERE m.indexed BETWEEN "2012-09-16" AND "2012-10-16 23:59:59"  
   AND EXISTS 
  (
     SELECT NULL FROM mentioncache mc  
      WHERE mc.mention_id = m.id AND mc.profile_id = 803  
   )    
ORDER BY m.indexed DESC  
LIMIT 10

Mysql – Subqueries run very fast individually, but when joined are very slow

You don't need all the derived tables. You are joining the basic (product) too many times. You can write the query joining it only once.

Compound indices are a must for EAV designs. Try adding an index on (attribute_id, product_id, value) and then the query:

SELECT t0.id, 
       t1.`value` AS length, 
       t2.`value` AS height, 
       t3.`value` AS family
FROM
  products t0

INNER JOIN 
  product_eav_decimal t1
    ON  t1.product_id = t0.id  
    AND t1.attribute_id = 91
    AND t1.`value` BETWEEN 15 AND 35

LEFT JOIN
  product_eav_decimal t2
    ON  t2.product_id = t0.id  
    AND t2.attribute_id = 80  
-- 
-- 
--

LEFT JOIN                              -- LEFT or INNER join
  product_eav_decimal t6
    ON  t6.product_id = t0.id  
 -- AND t6.attribute_id = 

ORDER BY t0.id ASC ;

Best Answer

Related Solutions

Mysql – Slow complex query with group/order

Mysql – Subqueries run very fast individually, but when joined are very slow

Related Question