Mysql – Why is the Group by and Having clause very slow with this query

group byMySQLoptimization

I have this query which is when run against 100K rows, takes forever to result in my system with 2 GB Ram and standard MySQL installation.

Now in my database , I have two tables, chitentry and chitentrygroup. Both contain more then 100K rows. When I run the following query the result is always pending and I do not get any output and the query keeps running.

    SELECT  SQL_CALC_FOUND_ROWS  chitentrygroupid, 
                       chitentrygroupdate, 
                       chitentryno, 
                       chitbookno, 
                       chitamount, 
                       entryamount, 
                       entryweight, 
                       employeename, 
                       accountname, 
                       rate, 
                       monthentry 
             FROM   (SELECT chitentrygroup.chitentrygroupid, 
           chitentrygroup.chitentrygroupdate, 
           chitentry.chitentryno, 
           chitentry.chitbookno, 
           Round(( chitentry.chitamount ), 0) AS chitamount, 
           chitentry.entryamount, 
           chitentry.entryweight, 
           employee.employeename, 
           account.accountname, 
           chitentry.rate, 
           chitentry.monthentry 
    FROM   chitentrygroup 
           LEFT OUTER JOIN chitentry 
                        ON chitentry.chitentrygroupid = 
                           chitentrygroup.chitentrygroupid 
           LEFT OUTER JOIN employee 
                        ON employee.employeeid = chitentrygroup.employeeid 
           LEFT OUTER JOIN account 
                        ON account.accountid = chitentrygroup.accountid 
         WHERE  chitentrygroup.status NOT IN ( 0, 3, 8 )  AND chitentrygroup.industryid IN ( 3 ) 
           AND 1 = 1 
           AND 1 = 1 
           AND 1 = 1 
    GROUP  BY 1 
    HAVING 1 = 1 
    ORDER  BY chitentrygroup.chitentrygroupid DESC ) AS t 
       LIMIT  0, 10

Now when I remove SQL_CALC_FOUND_ROWS , Group By and Having Part of the query. The query is blazing fast. The presence of any of the above I stated makes the query stuck with this big table.

How should I optimize this query with Group By,Having and SQL_CALC_FOUND_ROWS Clause?

Best Answer

Get rid out the outer layer:

SELECT  SQL_CALC_FOUND_ROWS  ...
    FROM (SELECT ...
            ORDER  BY chitentrygroup.chitentrygroupid DESC
         ) AS t 
   LIMIT  0, 10

-->

SELECT  SQL_CALC_FOUND_ROWS  ...
   ORDER  BY chitentrygroup.chitentrygroupid DESC
   LIMIT  0, 10

This will also fix a bug you may not have noticed. A derived table returns an unordered set of rows. That is, the optimizer is permitted to ignore the ORDER BY in the formulation you have. This gives you not the first 10 rows, but some arbitrary 10 rows.

FOUND_ROWS is passe. Have you noticed that search engines first moved to "about 1,000,000 hits", then got rid of the count? Do you really need the count?

What indexes do you have on chitentrygroup? In particular, this one may be beneficial:

INDEX(industryid, status)

Are you using InnoDB? What is the value of innodb_buffer_pool_size. For a tiny 2GB machine (or VM), it should be no more than 400M.

What version of MySQL are your running? Old versions defaulted that to 8M or 128M. These are too small.

What are the mappings? Is there one employee per chitentrygroupid? That is "1:many". What about account and chitentry? If these are each 1:many, you don't need to GROUP BY -- which is the real villain.

That is, if you can SELECT the chitentrygroupids, get through the ORDER BY and LIMIT without needing the ORDER BY, you avoid the "inflate-deflate" that happens with JOIN + GROUP BY.

Related Solutions

MySQL Performance – How to Optimize Slow Query with Joins and GROUP BY

OK, I ended up adding another lookup table:

CREATE TABLE IF NOT EXISTS `stops_routes` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `stop_id` varchar(100) NOT NULL,
  `route_id` varchar(100) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `stop_route` (`stop_id`,`route_id`),
  KEY `stop_id` (`stop_id`),
  KEY `route_id` (`route_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;

Filling it was fairly fast:

mysql> insert into stops_routes (stop_id, route_id)
    ->
    -> select
    ->     s.stop_id,
    ->     r.route_id as from_route_id
    ->
    -> from routes r
    -> left join trips t on t.route_id = r.route_id
    -> left join stop_times st on st.trip_id = t.trip_id
    -> left join stops s on s.stop_id = st.stop_id
    -> group by s.stop_id, r.route_id;
Query OK, 3496 rows affected (8.38 sec)
Records: 3496  Duplicates: 0  Warnings: 0

Using it is blazingly fast:

mysql> select
    ->     r.route_id as from_route_id,
    ->     c_sr.route_id as to_route_id
    ->
    -> from routes r
    ->
    -> left join stops_routes sr on sr.route_id = r.route_id
    -> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
    -> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
    ->
    -> where r.route_id <> c_sr.route_id
    -> group by r.route_id, c_sr.route_id
    -> limit 10;
+---------------+-------------+
| from_route_id | to_route_id |
+---------------+-------------+
| 0001          | 0002        |
| 0001          | 0003        |
| 0001          | 0004        |
| 0001          | 0005        |
| 0001          | 0006        |
| 0001          | 0008        |
| 0001          | 0009        |
| 0001          | 0011        |
| 0001          | 0014        |
| 0001          | 0031        |
+---------------+-------------+
10 rows in set (0.63 sec)

Now I can fill my last lookup table (set of connections between every routes on my GTFS network):

mysql> insert into route_connections (from_route_id, to_route_id)
    -> select
    ->     r.route_id as from_route_id,
    ->     c_sr.route_id as to_route_id
    ->
    -> from routes r
    ->
    -> left join stops_routes sr on sr.route_id = r.route_id
    -> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
    -> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
    ->
    -> where r.route_id <> c_sr.route_id
    -> group by r.route_id, c_sr.route_id;
Query OK, 2848 rows affected (0.31 sec)
Records: 2848  Duplicates: 0  Warnings: 0

Amazingly fast. I guess the engine couldn't break up the steps to optimize this.
I'd still be interested to know if it would be possible to get the same result (from route to route connections table) using only one sub-second or sub-minute query.

Mysql Order By affecting a Query with column alias, group by, and having

The HAVING tipo = '1' should in no way allow output to include tipo=0, regardless of GROUP BY issues, etc.

File a bug report at http://bugs.mysql.com .

Best Answer

Related Solutions

MySQL Performance – How to Optimize Slow Query with Joins and GROUP BY

Mysql Order By affecting a Query with column alias, group by, and having

Related Question