Mysql – Slow complex query with group/order

group byMySQLoptimizationorder-byperformancequery-performance

I have a problem with a very important query in my application.

It is very slow when adding group and order to it.

So the full query is:

SELECT 
    m.id, 
    m.title, 
    m.title_text, 
    m.content_text, 
    m.url, 
    m.root_url, 
    m.sub_type, 
    m.indexed, 
    f.foreign_id, 
    v.foreign_id, 
    v.created, 
    mfs.score, 
    Image.id, 
    Image.model, 
    Image.foreign_key, 
    Image.dirname, 
    Image.basename, 
    (REPLACE(REPLACE(m.host_url, 'http://www.', ''), 'http://', '')) AS Mention__plain_url 
FROM mentions AS m 
    JOIN mentioncache AS mc 
        ON (mc.mention_id = m.id AND mc.profile_id = 803) 
    LEFT JOIN attachments AS Image 
        ON (Image.foreign_key = m.id AND Image.model = 'Mention') 
    LEFT JOIN urlinfluranks AS u 
        ON (u.url = m.host_url) 
    LEFT JOIN favoureditems AS f 
        ON (f.model = "Mention" AND f.foreign_id = m.id AND f.owner_id = 803) 
    LEFT JOIN visiteditems AS v 
        ON (v.model = "Mention" AND v.foreign_id = m.id AND v.owner_id = 803) 
    LEFT JOIN mentionfeedscores AS mfs 
        ON (mfs.mention_id = m.id AND mfs.feed_id = '474737584865424564398208323289092') 
WHERE DATE(m.indexed) BETWEEN "2012-09-16" AND "2012-10-16" 
GROUP BY m.id
ORDER BY m.indexed DESC 
LIMIT 10

With the group and the order the query takes about 2 – 3 seconds, without them just 0.002 seconds.

The explain of this query with group/order is:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  Mentioncache    index   mention_id_profile_id,mention_id    mention_id_profile_id   8   NULL    32946   Using where; Using index; Using temporary; Using filesort
1   SIMPLE  Mention eq_ref  PRIMARY PRIMARY 4   clippingcroc_dev.Mentioncache.mention_id    1   Using where
1   SIMPLE  Image   ref foreign_key,model_foreign_key   model_foreign_key   66  const,clippingcroc_dev.Mention.id   1    
1   SIMPLE  Favoureditem    ref model_foreign_id_owner_id   model_foreign_id_owner_id   163 const,clippingcroc_dev.Mentioncache.mention_id,const    2   Using index
1   SIMPLE  Visiteditem ref model_foreign_id_owner_id   model_foreign_id_owner_id   163 const,clippingcroc_dev.Mention.id,const 2    
1   SIMPLE  Mentionfeedscore    ref mention_id,feed_id  mention_id  4   clippingcroc_dev.Mentioncache.mention_id    2    
1   SIMPLE  Urlinflurank    eq_ref  url url 765 clippingcroc_dev.Mention.host_url   1   Using index

Without group/order the explain is:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  Mentioncache    index   mention_id_profile_id,mention_id    mention_id_profile_id   8   NULL    32946   Using where; Using index
1   SIMPLE  Mention eq_ref  PRIMARY PRIMARY 4   clippingcroc_dev.Mentioncache.mention_id    1   Using where
1   SIMPLE  Image   ref foreign_key,model_foreign_key   model_foreign_key   66  const,clippingcroc_dev.Mention.id   1    
1   SIMPLE  Favoureditem    ref model_foreign_id_owner_id   model_foreign_id_owner_id   163 const,clippingcroc_dev.Mentioncache.mention_id,const    2   Using index
1   SIMPLE  Visiteditem ref model_foreign_id_owner_id   model_foreign_id_owner_id   163 const,clippingcroc_dev.Mention.id,const 2    
1   SIMPLE  Mentionfeedscore    ref mention_id,feed_id  mention_id  4   clippingcroc_dev.Mentioncache.mention_id    2    
1   SIMPLE  Urlinflurank    eq_ref  url url 765 clippingcroc_dev.Mention.host_url   1   Using index

So it seems the reason for the performance-difference is the “Using temporary; Using filesort” in the mentioncache-table.

I tried to set different indexes to the tables to get MySQL to use the existing indexes for group/order. But I don’t get it.

Please help me to get a better performance of this query when using group/order.

Tell me if you need more information about the structure/indexes of the tables.

Edit

It seems the reason for my slowly query is that MySQL can’t use the indexes for group and order.

So I tried to get MySQL to use the indexes in the mentions table.

Unfortunately the “first” table the MySQL optimizer uses is the mentioncache table which doesn’t have these indexes.

So is there a possibility to change the “order” of the tables the optimizer works with?

As an alternative I moved the JOIN to the mentioncache table to a subquery like this:

SELECT SQL_NO_CACHE 
    m.id, 
    m.title, 
    m.title_text, 
    m.content_text, 
    m.url, 
    m.root_url, 
    m.sub_type, 
    m.indexed
FROM mentions AS m 
WHERE 
    m.id IN (SELECT mention_id FROM mentioncache WHERE profile_id = 803) 
    AND DATE(m.indexed) BETWEEN "2012-09-16" AND "2012-10-16" 
GROUP BY m.id 
ORDER BY m.indexed DESC 
LIMIT 10, 10

This query runs in about 0.02 seconds and is much faster than the first one and it uses the right indexes (there is no using temporary/using filesort in the explain).

Unfortunately this query slows down rapidly when another join table is added:

SELECT SQL_NO_CACHE 
    m.id, 
    m.title, 
    m.title_text, 
    m.content_text, 
    m.url, 
    m.root_url, 
    m.sub_type, 
    m.indexed
FROM 
    mentions AS m 
    LEFT JOIN mentionfeedscores AS mfs 
        ON (mfs.mention_id = m.id AND mfs.feed_id = '474737584865424564398208323289092') 
WHERE 
    m.id IN (SELECT mention_id FROM mentioncache WHERE profile_id = 803) 
    AND DATE(m.indexed) BETWEEN "2012-09-16" AND "2012-10-16" 
GROUP BY m.id 
ORDER BY m.indexed DESC 
LIMIT 10, 10

In this case the query takes again nearly 2 seconds to return the results.

So I don’t know what to do to get a better performance of my query.

Best Answer

I can see couple things that should improve your query performance.

1 As you already found out there is absolutely no need to join mentioncache. Using EXISTS seems more natural (or IN as you did, but EXISTS may work better from performance point of view).

2 DATE(m.indexed) BETWEEN "2012-09-16" AND "2012-10-16" can be rewritten to m.indexed between "2012-09-16" AND "2012-10-16 23:59:59", so mysql can use index.

3 urlinfluranks doesn't seem to be used anywhere except in LEFT JOIN, why do you need it?

4 f.foreign_id can be either null or m.id, and this is the only reference to favoureditems table, I'd rather use subquery in this case.

Finally, I think you can get the same results without GROUP BY m.id (as far as I understood , mentions.id a primary key).

SELECT   
m.id, m.title, m.title_text, m.content_text, m.url,m.root_url,m.sub_type,m.indexed,  
CASE 
 WHEN EXISTS 
    (SELECT NULL FROM favoureditems f WHERE f.model = "Mention" 
    AND f.foreign_id = m.id AND f.owner_id = 803) THEN m.id 
END AS f.foreign_id,
, v.foreign_id, v.created, mfs.score,  
Image.id,Image.model,Image.foreign_key, Image.dirname,Image.basename,  
(REPLACE(REPLACE(m.host_url, 'http://www.', ''), 'http://', '')) AS Mention__plain_url  
FROM mentions AS m  

LEFT JOIN 
(
  SELECT id,model,foreign_key,dirname,basename 
  FROM attachments Image  
  WHERE model = 'Mention'
  GROUP BY foreign_key
 )Image  ON (Image.foreign_key = m.id)      

LEFT JOIN 
(
   SELECT v.foreign_id, v.created 
   FROM visiteditems AS v  
   WHERE (v.model = "Mention"  AND v.owner_id = 803)  
    GROUP BY v.foreign_id
)v ON (v.foreign_id = m.id)
LEFT JOIN 
(
   SELECT mention_id,score
   FROM mentionfeedscores mfs  
   WHERE mfs.feed_id = '474737584865424564398208323289092'
   GROUP BY mention_id
)mfs ON (mfs.mention_id = m.id )

WHERE m.indexed BETWEEN "2012-09-16" AND "2012-10-16 23:59:59"  
   AND EXISTS 
  (
     SELECT NULL FROM mentioncache mc  
      WHERE mc.mention_id = m.id AND mc.profile_id = 803  
   )    
ORDER BY m.indexed DESC  
LIMIT 10