Mysql – Optimize query between two MySql tables

MySQLperformancequery-performance

I currently have a query that works on two tables of expense and income. This is an example of what the structure of both tables looks like since they have the same columns:

|       id|        date |   amount|
|---------|-------------|---------|
|        1|  2019-02-02 |     2500|
|        2|  2019-03-16 |    4000 |
|        3|  2019-04-02 |    5430 |

and this is the query I currently have:

SELECT
    t1.month,
    COALESCE(t2.amount, 0) AS expenses,
    COALESCE(t3.amount, 0) AS incomes
FROM
(
    SELECT 1 AS month UNION ALL
    SELECT 2 UNION ALL
    SELECT 3 UNION ALL
    SELECT 4 UNION ALL
    SELECT 5 UNION ALL
    SELECT 6 UNION ALL
    SELECT 7 UNION ALL
    SELECT 8 UNION ALL
    SELECT 9 UNION ALL
    SELECT 10 UNION ALL
    SELECT 11 UNION ALL
    SELECT 12
) t1
LEFT JOIN
(
    SELECT MONTH(date) AS month, SUM(amount) AS amount
    FROM expenses
    GROUP BY MONTH(date)
) t2
    ON t1.month = t2.month
LEFT JOIN
(
    SELECT MONTH(date) AS month, SUM(amount) AS amount
    FROM incomes
    GROUP BY MONTH(date)
) t3
    ON t1.month = t3.month
ORDER BY
    t1.month;

Here is the thread to see in detail and perform tests:http://sqlfiddle.com/#!9/466bd69/1

The query returns the total amount of both tables for each month of the current year, this works well. However, when reviewing the execution plan since I have many records, it takes too long to go through all the records of both tables, so how can I optimize it? Try adding an index to the date field without any improvement. Should I restructure my query?

Best Answer

Hi Max and welcome to DBA.SE.

"when reviewing the execution plan since I have many records, it takes too long to go through all the records of both tables, so how can I optimize it?"

Are you having a real performance issue or are you assuming you are going to have one? If your query needs the data from all rows, there is no magic trick that MySQL can do and not scan each and every one. If this is a real production challenge, we will need to see your execution plans for the real data. If it's not, I wouldn't worry about it too much, but there are several things you can do to help the optimizer and allow more efficient access methods.

  1. Instead of the derived table for the months, I would create a permanent table for months with just 12 rows. probably the month number and its name and index both. This will make the query more readable, and might change the optimizer choice when the set is indexed. you will most likely use it in other places as well. I also like to create full calendar tables in my databases, see for example this script. It's a SQL Server T-SQL script, so you may need to adjust it slightly to work in MySQL.
  2. The optimal index for your query is on (date, amount). The latter key is needed to cover the query so the optimizer can get all it needs directly from the index without performing lookups.
  3. If you can upgrade to MySQL 8, it offers functional indexes where you can create an index on (MONTH(date), amount) directly, which will speed up this query, but may hurt modification performance like any other index.

HTH