MySQL optimization – year column grouping – using temporary table, filesort

Tags: mysql, mysql-5, mysql-5.5, optimization, performance, query-performance

I have a transactions table with 600,000 records, and I need to list counts for the dashboard on a financial-year basis. The table uses the MyISAM engine. I tried adding an index on the transaction date (tran_date). Even though the query uses the index, it still creates a temporary table and performs a filesort, which is where most of the time goes. Is there any way to optimize the query to improve its execution time?

SELECT COUNT(*) AS cnt,
       CASE WHEN MONTH(tran_date) >= 3
            THEN CONCAT(YEAR(tran_date), '-', YEAR(tran_date) + 1)
            ELSE CONCAT(YEAR(tran_date) - 1, '-', YEAR(tran_date))
       END AS financial_year
FROM transactions1
WHERE tran_date >= '2010-06-01'
GROUP BY financial_year

Showing rows 0 - 4 (5 total, Query took 1.2095 sec)
id  select_type  table          type   possible_keys      key        key_len  ref   rows    Extra
1   SIMPLE       transactions1  range  PRIMARY,tran_date  tran_date  8        NULL  346485  Using where; Using index; Using temporary; Using filesort
Keyname       Type   Unique  Packed  Field          Cardinality  Collation
PRIMARY       BTREE  Yes     No      tran_date      205720       A
                                     tran_ID        617162       A
coupon_No     BTREE  No      No      coupon_No      617162       A
account_type  BTREE  No      No      account_type   3            A
prodCode      BTREE  No      No      prodCode       430          A
                                     tran_date      308581       A
tran_date     BTREE  No      No      tran_date      205720       A
cust_ID       BTREE  No      No      cust_ID        3265         A
                                     tran_date      308581       A
                                     account_type   308581       A
                                     points_earned  617162       A

Update:

I tried partitioning the table, but it was not much faster than the non-partitioned version. Would replication help with reads on this table? There will be more grouping based on dates (using the date functions) when reading the data. The partition layout was along these lines (a sketch; partition names and year boundaries are illustrative).
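
ALTER TABLE transactions1
PARTITION BY RANGE ( YEAR(tran_date) ) (
    -- illustrative boundaries; the real ones depend on the data range
    PARTITION p2009 VALUES LESS THAN (2010),
    PARTITION p2010 VALUES LESS THAN (2011),
    PARTITION p2011 VALUES LESS THAN (2012),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);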

Edit:

I altered the query and reduced the execution time. The query I used is:

SELECT SUM(count)
FROM (
    SELECT COUNT(*) AS count,
           CASE WHEN MONTH(tran_date) >= 3
                THEN CONCAT(YEAR(tran_date), '-', YEAR(tran_date) + 1)
                ELSE CONCAT(YEAR(tran_date) - 1, '-', YEAR(tran_date))
           END AS format_date
    FROM transactions1
    GROUP BY tran_date
) AS s
GROUP BY format_date

Showing rows 0 - 4 (5 total, Query took 0.5636 sec)
id  select_type  table          type   possible_keys  key        key_len  ref   rows    Extra
1   PRIMARY      <derived2>     ALL    NULL           NULL       NULL     NULL  229676  Using temporary; Using filesort
2   DERIVED      transactions1  index  NULL           tran_date  8        NULL  617162  Using index

But when using

SELECT COUNT(*) AS count,
       CASE WHEN MONTH(tran_date) >= 3
            THEN CONCAT(YEAR(tran_date), '-', YEAR(tran_date) + 1)
            ELSE CONCAT(YEAR(tran_date) - 1, '-', YEAR(tran_date))
       END AS format_date
FROM transactions1
GROUP BY tran_date

Showing rows 0 - 29 (229,676 total, Query took 0.0006 sec)

the query returns much faster without the SUM(count) over the derived table. Is there any other way to get the sum without using a subquery in MySQL, or can the subquery be optimized to use the index?

Best Answer

I don't see a lot of opportunity for improvement.

The index you added was probably a big help, because it's being used for the range matching on the WHERE clause (type => range, key => tran_date), and it's being used as a covering index (extra => using index), avoiding the need to seek into the table to fetch the row data.

But since you're using functions to construct the financial_year value for the GROUP BY, neither "Using filesort" nor "Using temporary" can be avoided. Those aren't the real problem, though. The real problem is that you're evaluating MONTH(tran_date) 346,485 times and YEAR(tran_date) at least that many times... and ~700,000 function calls in one second doesn't seem too bad.
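
If you do want to trim the per-row function work, one small rewrite (a sketch, assuming the financial year starts in March, as your CASE implies) derives the starting year with a single date-arithmetic expression per row:

SELECT COUNT(*) AS cnt,
       -- shifting back 2 months maps Mar-Dec to the current year and
       -- Jan-Feb to the previous year, matching the CASE logic
       YEAR(tran_date - INTERVAL 2 MONTH) AS fy_start
FROM transactions1
WHERE tran_date >= '2010-06-01'
GROUP BY fy_start

The '2010-2011'-style label can then be built once per group (five rows) rather than once per row, though grouping on a derived expression still needs the temporary table.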

Plan B: I am definitely not a fan of storing redundant data, and I'm dead-set against making the application responsible for maintaining it... but one option I might be tempted to try would be to create a dashboard_stats_by_financial_year table, and use after-insert/update/delete triggers on the transactions1 table to manage keeping those stats current.
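
A minimal sketch of that idea (table, column, and trigger names are illustrative; only the AFTER INSERT trigger is shown, and the update/delete triggers would adjust the counts the same way):

CREATE TABLE dashboard_stats_by_financial_year (
    fy_start   SMALLINT UNSIGNED NOT NULL PRIMARY KEY,  -- e.g. 2010 for FY 2010-2011
    tran_count INT UNSIGNED NOT NULL DEFAULT 0
);

DELIMITER //
CREATE TRIGGER transactions1_ai AFTER INSERT ON transactions1
FOR EACH ROW
BEGIN
    -- assumes the same March fiscal-year start as the question's CASE
    INSERT INTO dashboard_stats_by_financial_year (fy_start, tran_count)
    VALUES (YEAR(NEW.tran_date - INTERVAL 2 MONTH), 1)
    ON DUPLICATE KEY UPDATE tran_count = tran_count + 1;
END//
DELIMITER ;

The dashboard query then reads a five-row stats table instead of scanning ~600,000 transactions.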

That option has a cost, of course -- adding to the amount of time it takes to update/insert/delete a transaction... but, waiting > 1200 milliseconds for stats for your dashboard is a cost, too. So it may come down to whether you want to pay for it now or pay for it later.