MySQL – How to improve performance of a query that joins three tables

MySQL, performance, PHP

I want to fetch data from three tables: table A, table B and table C.

Table A in databaseA has a customerid column, which is its primary key.
Tables B and C in databaseB have the same customerid column (not a primary key).

I want to retrieve the last transaction date in table B and table C for each customer in table A.

Expected result should have these columns:

customerID
Max(date in table B)
Max(date in table C)

The query below takes forever to run. How do I optimize it to get the desired result in under 5 seconds?

NB: tableA and tableB have about 10 million rows each, and both databases use the same character set.

select
    a.customerid,
    a.custname as name,
    max(b.lastdate) as lastdt,
    max(c.lastdate) as lastdtc,
    case
        when max(b.lastdate) < date_sub(curdate(), interval 7 day)
         and max(c.lastdate) > date_sub(curdate(), interval 7 day)
            then 'INACTIVE'
        when max(c.lastdate) < date_sub(curdate(), interval 7 day)
         and max(b.lastdate) < date_sub(curdate(), interval 7 day)
            then 'DORMANT'
        when max(c.lastdate) is null
         and max(b.lastdate) < date_sub(curdate(), interval 7 day)
            then 'DORMANT'
        else 'ACTIVE'
    end as Status
from database1.table a
inner join database2.table b
    on a.customerid = b.customerid
left join database2.table c
    on a.customerid = c.customerid
where a.customername like concat('$fromclient', '%')
group by a.customerid
order by lastdt
limit $fromclient offset $fromclient

Best Answer

Step 1: Build a query that computes lastdt, lastdtc, and status.

Step 2: Use that as a "derived" table to join to the other table(s) for the rest of the info.

The intent of Step 1 is to focus on the complexity of the GROUP BY and yield a much smaller table. Then Step 2 has less work to do.
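
One possible reading of these two steps, as a sketch rather than a tested rewrite: aggregate tables B and C per customer inside derived tables, then join the much smaller result to table A. The table names tablea, tableb and tablec, the literals 'abc', 50 and 0 (standing in for the PHP variables), and the suggested indexes are all assumptions that do not appear in the question.

```sql
-- Assumed supporting indexes (not mentioned in the question):
--   ALTER TABLE database2.tableb ADD INDEX idx_cust_date (customerid, lastdate);
--   ALTER TABLE database2.tablec ADD INDEX idx_cust_date (customerid, lastdate);

SELECT
    a.customerid,
    a.custname AS name,          -- the question uses both custname and customername
    d.lastdt,
    d.lastdtc,
    CASE
        WHEN d.lastdt  < DATE_SUB(CURDATE(), INTERVAL 7 DAY)
         AND d.lastdtc > DATE_SUB(CURDATE(), INTERVAL 7 DAY) THEN 'INACTIVE'
        WHEN d.lastdtc < DATE_SUB(CURDATE(), INTERVAL 7 DAY)
         AND d.lastdt  < DATE_SUB(CURDATE(), INTERVAL 7 DAY) THEN 'DORMANT'
        WHEN d.lastdtc IS NULL
         AND d.lastdt  < DATE_SUB(CURDATE(), INTERVAL 7 DAY) THEN 'DORMANT'
        ELSE 'ACTIVE'
    END AS Status
FROM (
    -- Step 1: one row per customer with the latest date from each table.
    SELECT b.customerid, b.lastdt, c.lastdtc
    FROM (
        SELECT customerid, MAX(lastdate) AS lastdt
        FROM database2.tableb          -- hypothetical table name
        GROUP BY customerid
    ) AS b
    LEFT JOIN (
        SELECT customerid, MAX(lastdate) AS lastdtc
        FROM database2.tablec          -- hypothetical table name
        GROUP BY customerid
    ) AS c ON c.customerid = b.customerid
) AS d
-- Step 2: join the small derived table to table A for the remaining columns.
JOIN database1.tablea AS a             -- hypothetical table name
    ON a.customerid = d.customerid
WHERE a.custname LIKE CONCAT('abc', '%')
ORDER BY d.lastdt
LIMIT 50 OFFSET 0;
```

Aggregating B and C separately also avoids a side effect of the original query: joining both tables to A on customerid before grouping produces one row for every combination of a customer's B rows and C rows, so the GROUP BY has far more rows to collapse than either table holds for that customer.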

Related Solutions

MySQL optimization – year column grouping – using temporary table, filesort

I don't see a lot of opportunity for improvement.

The index you added was probably a big help, because it's being used for the range matching on the WHERE clause (type => range, key => tran_date), and it's being used as a covering index (extra => using index), avoiding the need to seek into the table to fetch the row data.

However, because you're using functions to construct the financial_year value for the GROUP BY, neither "using filesort" nor "using temporary" can be avoided. Those aren't the real problem, though. The real cost is that you're evaluating MONTH(tran_date) 346,485 times and YEAR(tran_date) at least that many times... ~700,000 function calls in one second doesn't seem too bad.

Plan B: I am definitely not a fan of storing redundant data, and I'm dead-set against making the application responsible for maintaining it... but one option I might be tempted to try would be to create a dashboard_stats_by_financial_year table, and use after-insert/update/delete triggers on the transactions1 table to manage keeping those stats current.

That option has a cost, of course -- adding to the amount of time it takes to update/insert/delete a transaction... but, waiting > 1200 milliseconds for stats for your dashboard is a cost, too. So it may come down to whether you want to pay for it now or pay for it later.
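
A minimal sketch of what Plan B could look like. Only the table names dashboard_stats_by_financial_year and transactions1 and the column tran_date come from the text above; the amount column, the stats columns, the trigger names, and the April-to-March financial year are assumptions made for illustration.

```sql
-- Hypothetical summary table: one row per financial year.
CREATE TABLE dashboard_stats_by_financial_year (
    financial_year SMALLINT UNSIGNED NOT NULL,   -- year in which the financial year starts
    tran_count     INT UNSIGNED      NOT NULL DEFAULT 0,
    total_amount   DECIMAL(18,2)     NOT NULL DEFAULT 0,
    PRIMARY KEY (financial_year)
) ENGINE=InnoDB;

-- Keep the stats current on insert. Assumes the financial year starts in
-- April, so January to March belong to the previous year's bucket.
CREATE TRIGGER transactions1_ai AFTER INSERT ON transactions1
FOR EACH ROW
    INSERT INTO dashboard_stats_by_financial_year
        (financial_year, tran_count, total_amount)
    VALUES
        (YEAR(NEW.tran_date) - (MONTH(NEW.tran_date) < 4), 1, NEW.amount)
    ON DUPLICATE KEY UPDATE
        tran_count   = tran_count + 1,
        total_amount = total_amount + NEW.amount;

-- Reverse the contribution of a deleted transaction.
CREATE TRIGGER transactions1_ad AFTER DELETE ON transactions1
FOR EACH ROW
    UPDATE dashboard_stats_by_financial_year
    SET tran_count   = tran_count - 1,
        total_amount = total_amount - OLD.amount
    WHERE financial_year = YEAR(OLD.tran_date) - (MONTH(OLD.tran_date) < 4);
```

An AFTER UPDATE trigger would follow the same pattern: subtract the OLD row's contribution and add the NEW row's. The dashboard then reads a handful of rows from the summary table instead of grouping the whole transactions1 table.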

MySQL – How to optimize a very slow query with joins and group by

OK, I ended up adding another lookup table:

CREATE TABLE IF NOT EXISTS `stops_routes` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `stop_id` varchar(100) NOT NULL,
  `route_id` varchar(100) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `stop_route` (`stop_id`,`route_id`),
  KEY `stop_id` (`stop_id`),
  KEY `route_id` (`route_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;

Filling it was fairly fast:

mysql> insert into stops_routes (stop_id, route_id)
    ->
    -> select
    ->     s.stop_id,
    ->     r.route_id as from_route_id
    ->
    -> from routes r
    -> left join trips t on t.route_id = r.route_id
    -> left join stop_times st on st.trip_id = t.trip_id
    -> left join stops s on s.stop_id = st.stop_id
    -> group by s.stop_id, r.route_id;
Query OK, 3496 rows affected (8.38 sec)
Records: 3496  Duplicates: 0  Warnings: 0

Using it is blazingly fast:

mysql> select
    ->     r.route_id as from_route_id,
    ->     c_sr.route_id as to_route_id
    ->
    -> from routes r
    ->
    -> left join stops_routes sr on sr.route_id = r.route_id
    -> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
    -> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
    ->
    -> where r.route_id <> c_sr.route_id
    -> group by r.route_id, c_sr.route_id
    -> limit 10;
+---------------+-------------+
| from_route_id | to_route_id |
+---------------+-------------+
| 0001          | 0002        |
| 0001          | 0003        |
| 0001          | 0004        |
| 0001          | 0005        |
| 0001          | 0006        |
| 0001          | 0008        |
| 0001          | 0009        |
| 0001          | 0011        |
| 0001          | 0014        |
| 0001          | 0031        |
+---------------+-------------+
10 rows in set (0.63 sec)

Now I can fill my last lookup table (the set of connections between every pair of routes on my GTFS network):

mysql> insert into route_connections (from_route_id, to_route_id)
    -> select
    ->     r.route_id as from_route_id,
    ->     c_sr.route_id as to_route_id
    ->
    -> from routes r
    ->
    -> left join stops_routes sr on sr.route_id = r.route_id
    -> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
    -> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
    ->
    -> where r.route_id <> c_sr.route_id
    -> group by r.route_id, c_sr.route_id;
Query OK, 2848 rows affected (0.31 sec)
Records: 2848  Duplicates: 0  Warnings: 0

Amazingly fast. I guess the optimizer couldn't break the original query into these steps on its own.
I'd still be interested to know whether the same result (the route-to-route connections table) can be produced with a single query that runs in under a second, or at least under a minute.
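
One untimed way to attempt this in a single statement is to compute the stop-to-route pairs inline as a common table expression instead of storing them in stops_routes. This is a sketch under assumptions: it needs MySQL 8.0 or later for WITH, the CTE name stops_routes_cte is made up, and it assumes every trips.route_id exists in routes and every stop_times.stop_id exists in stops. Whether it reaches sub-second times depends on the indexes on trips, stop_times and stop_connections.

```sql
-- Requires MySQL 8.0+ (common table expressions).
WITH stops_routes_cte AS (
    -- Intended to produce the same (stop_id, route_id) pairs as the
    -- stops_routes lookup table, computed inline.
    SELECT DISTINCT st.stop_id, t.route_id
    FROM trips t
    JOIN stop_times st ON st.trip_id = t.trip_id
)
SELECT DISTINCT
    sr.route_id   AS from_route_id,
    c_sr.route_id AS to_route_id
FROM stops_routes_cte sr
JOIN stop_connections c_s  ON c_s.from_stop_id = sr.stop_id
JOIN stops_routes_cte c_sr ON c_sr.stop_id     = c_s.to_stop_id
WHERE sr.route_id <> c_sr.route_id;
```

Inner joins are used here because the WHERE condition on c_sr.route_id already discards the NULL rows that the LEFT JOINs in the transcripts above would produce.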
