Mysql – Why is the Group by and Having clause very slow with this query

group byMySQLoptimization

I have this query which is when run against 100K rows, takes forever to result in my system with 2 GB Ram and standard MySQL installation.

Now in my database , I have two tables, chitentry and chitentrygroup. Both contain more then 100K rows. When I run the following query the result is always pending and I do not get any output and the query keeps running.

    SELECT  SQL_CALC_FOUND_ROWS  chitentrygroupid, 
                       chitentrygroupdate, 
                       chitentryno, 
                       chitbookno, 
                       chitamount, 
                       entryamount, 
                       entryweight, 
                       employeename, 
                       accountname, 
                       rate, 
                       monthentry 
             FROM   (SELECT chitentrygroup.chitentrygroupid, 
           chitentrygroup.chitentrygroupdate, 
           chitentry.chitentryno, 
           chitentry.chitbookno, 
           Round(( chitentry.chitamount ), 0) AS chitamount, 
           chitentry.entryamount, 
           chitentry.entryweight, 
           employee.employeename, 
           account.accountname, 
           chitentry.rate, 
           chitentry.monthentry 
    FROM   chitentrygroup 
           LEFT OUTER JOIN chitentry 
                        ON chitentry.chitentrygroupid = 
                           chitentrygroup.chitentrygroupid 
           LEFT OUTER JOIN employee 
                        ON employee.employeeid = chitentrygroup.employeeid 
           LEFT OUTER JOIN account 
                        ON account.accountid = chitentrygroup.accountid 
         WHERE  chitentrygroup.status NOT IN ( 0, 3, 8 )  AND chitentrygroup.industryid IN ( 3 ) 
           AND 1 = 1 
           AND 1 = 1 
           AND 1 = 1 
    GROUP  BY 1 
    HAVING 1 = 1 
    ORDER  BY chitentrygroup.chitentrygroupid DESC ) AS t 
       LIMIT  0, 10 

Now when I remove SQL_CALC_FOUND_ROWS , Group By and Having Part of the query. The query is blazing fast. The presence of any of the above I stated makes the query stuck with this big table.

How should I optimize this query with Group By,Having and SQL_CALC_FOUND_ROWS Clause?

Best Answer

Get rid out the outer layer:

SELECT  SQL_CALC_FOUND_ROWS  ...
    FROM (SELECT ...
            ORDER  BY chitentrygroup.chitentrygroupid DESC
         ) AS t 
   LIMIT  0, 10

-->

SELECT  SQL_CALC_FOUND_ROWS  ...
   ORDER  BY chitentrygroup.chitentrygroupid DESC
   LIMIT  0, 10

This will also fix a bug you may not have noticed. A derived table returns an unordered set of rows. That is, the optimizer is permitted to ignore the ORDER BY in the formulation you have. This gives you not the first 10 rows, but some arbitrary 10 rows.

FOUND_ROWS is passe. Have you noticed that search engines first moved to "about 1,000,000 hits", then got rid of the count? Do you really need the count?

What indexes do you have on chitentrygroup? In particular, this one may be beneficial:

INDEX(industryid, status)

Are you using InnoDB? What is the value of innodb_buffer_pool_size. For a tiny 2GB machine (or VM), it should be no more than 400M.

What version of MySQL are your running? Old versions defaulted that to 8M or 128M. These are too small.

What are the mappings? Is there one employee per chitentrygroupid? That is "1:many". What about account and chitentry? If these are each 1:many, you don't need to GROUP BY -- which is the real villain.

That is, if you can SELECT the chitentrygroupids, get through the ORDER BY and LIMIT without needing the ORDER BY, you avoid the "inflate-deflate" that happens with JOIN + GROUP BY.