MySQL – Optimize a query between two MySQL tables

MySQL performance query-performance

I currently have a query that works on two tables, expenses and incomes. Both tables have the same columns, so this example shows the structure of either one:

| id | date       | amount |
|----|------------|--------|
| 1  | 2019-02-02 | 2500   |
| 2  | 2019-03-16 | 4000   |
| 3  | 2019-04-02 | 5430   |

and this is the query I currently have:

SELECT
    t1.month,
    COALESCE(t2.amount, 0) AS expenses,
    COALESCE(t3.amount, 0) AS incomes
FROM
(
    SELECT 1 AS month UNION ALL
    SELECT 2 UNION ALL
    SELECT 3 UNION ALL
    SELECT 4 UNION ALL
    SELECT 5 UNION ALL
    SELECT 6 UNION ALL
    SELECT 7 UNION ALL
    SELECT 8 UNION ALL
    SELECT 9 UNION ALL
    SELECT 10 UNION ALL
    SELECT 11 UNION ALL
    SELECT 12
) t1
LEFT JOIN
(
    SELECT MONTH(date) AS month, SUM(amount) AS amount
    FROM expenses
    GROUP BY MONTH(date)
) t2
    ON t1.month = t2.month
LEFT JOIN
(
    SELECT MONTH(date) AS month, SUM(amount) AS amount
    FROM incomes
    GROUP BY MONTH(date)
) t3
    ON t1.month = t3.month
ORDER BY
    t1.month;

Here is a fiddle to see the details and run tests: http://sqlfiddle.com/#!9/466bd69/1

The query returns the total amount from both tables for each month of the current year, and this works well. However, I have many records, and the execution plan shows that it takes too long to go through all the records of both tables. How can I optimize it? I tried adding an index to the date field, without any improvement. Should I restructure my query?

Best Answer

Hi Max and welcome to DBA.SE.

"when reviewing the execution plan since I have many records, it takes too long to go through all the records of both tables, so how can I optimize it?"

Are you having a real performance issue, or are you assuming you are going to have one? If your query needs the data from all rows, there is no magic trick that lets MySQL avoid scanning every one of them. If this is a real production problem, we will need to see the execution plans for the real data. If it's not, I wouldn't worry about it too much, but there are several things you can do to help the optimizer and allow more efficient access methods.
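To capture the plan mentioned above, something along these lines could be run against the real data. It is only a sketch: it reuses the `expenses` subquery from the question, and the same statements would be repeated for `incomes`.

```sql
-- Show the plan for one of the two aggregating subqueries.
-- Look at the "type", "key" and "Extra" columns: type = ALL means a full table scan,
-- "Using index" in Extra means the query is answered from an index alone.
EXPLAIN
SELECT MONTH(date) AS month, SUM(amount) AS amount
FROM expenses
GROUP BY MONTH(date);

-- List the indexes that currently exist on the table.
SHOW INDEX FROM expenses;

-- On MySQL 8.0.18 or later, EXPLAIN ANALYZE also executes the query
-- and reports the actual time spent in each step.
EXPLAIN ANALYZE
SELECT MONTH(date) AS month, SUM(amount) AS amount
FROM expenses
GROUP BY MONTH(date);
```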

1. Instead of the derived table for the months, I would create a permanent table for months with just 12 rows, probably holding the month number and its name, with both columns indexed. This will make the query more readable, and might change the optimizer's choice when the set is indexed. You will most likely use it in other places as well. I also like to create full calendar tables in my databases; see for example this script. It's a SQL Server T-SQL script, so you may need to adjust it slightly to work in MySQL.
2. The optimal index for your query is on (date, amount). The second column is needed to cover the query, so the optimizer can get everything it needs directly from the index without performing lookups into the table.
3. If you can upgrade to MySQL 8 (8.0.13 or later), it offers functional indexes, so you can create an index on (MONTH(date), amount) directly. This can speed up this query, but may hurt modification performance like any other index.
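The three suggestions above could look roughly like the following. This is a sketch under assumptions: the `months` table and all index names are made up for illustration, and whether the optimizer actually picks any of these indexes should be checked with EXPLAIN on the real data.

```sql
-- 1. Hypothetical permanent months table replacing the 12-row UNION ALL derived table.
CREATE TABLE months (
    month TINYINT UNSIGNED NOT NULL PRIMARY KEY,
    name  VARCHAR(9)       NOT NULL,
    UNIQUE KEY uq_months_name (name)
);

INSERT INTO months (month, name) VALUES
    (1, 'January'), (2, 'February'), (3, 'March'),     (4, 'April'),
    (5, 'May'),     (6, 'June'),     (7, 'July'),      (8, 'August'),
    (9, 'September'), (10, 'October'), (11, 'November'), (12, 'December');

-- 2. Covering indexes: both columns the query reads are in the index.
CREATE INDEX idx_expenses_date_amount ON expenses (date, amount);
CREATE INDEX idx_incomes_date_amount  ON incomes  (date, amount);

-- 3. MySQL 8.0.13+ only: functional key part. The extra parentheses
--    around the expression are required by the syntax.
CREATE INDEX idx_expenses_month_amount ON expenses ((MONTH(date)), amount);
CREATE INDEX idx_incomes_month_amount  ON incomes  ((MONTH(date)), amount);

-- The original query, rewritten against the months table.
SELECT
    m.month,
    COALESCE(e.amount, 0) AS expenses,
    COALESCE(i.amount, 0) AS incomes
FROM months AS m
LEFT JOIN (
    SELECT MONTH(date) AS month, SUM(amount) AS amount
    FROM expenses
    GROUP BY MONTH(date)
) AS e ON e.month = m.month
LEFT JOIN (
    SELECT MONTH(date) AS month, SUM(amount) AS amount
    FROM incomes
    GROUP BY MONTH(date)
) AS i ON i.month = m.month
ORDER BY m.month;
```

Options 2 and 3 are alternatives rather than a set: creating both pairs of indexes doubles the write overhead, so it is probably worth keeping only the pair that EXPLAIN shows being used.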

HTH

Related Solutions

MySQL table design of logging tables

Your new design will make it easier to query across all of the tables without the need to know which tables to union when you write a query.

I'm not sure if you need to worry about the partitioning aspect. It sounds OK, but I've never worked with something like this on this scale (not in MySQL).

If you are worried about the partitioning, you could stay with your current solution, but instead of a new table for every year+month, have 12 tables, one for each month, for all years. At least your unions would be constant.

MySQL – Personal finance database design

If you look at the incomes and expenses tables, you see they are the same; they differ only in the sign of the change to the user's balance. You can easily create a single "transactions" table to keep both, and either add a column with a type [Income, Expense] or decide it only by the sign of a value column (I suppose you want the latter, as you talk about summing all expenses and incomes, but it is not defined in your question).

Then you can simply run

select sum(value) as balance from transactions where user_id = X and datetime <= Y;

As you can see, I use datetime as one column. I suggest you do that too, as otherwise all conditions for checking ranges get quite cumbersome:

where date < xxx or (date = xxx and time <= yyy)
-- and now add once more for the upper limit
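For comparison, with a single datetime column the same kind of range check stays short. The user id and the dates below are placeholders chosen for illustration only:

```sql
-- Each bound is one comparison; the half-open interval [start, end)
-- avoids having to spell out the last second of the period.
SELECT SUM(value) AS balance_change
FROM transactions
WHERE user_id = 42
  AND datetime >= '2019-03-01 00:00:00'
  AND datetime <  '2019-04-01 00:00:00';
```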

This is good enough if you need that only occasionally and have a good index on user_id and datetime, preferably including value too. But if you know that you will check it often, or that there will be a really large number of transactions for each user, it makes sense to add one more column where you store the user's balance after applying that transaction. That way, getting the user balance for a given datetime looks like

select balance_value 
from transactions 
where user_id = X and datetime <= Y
order by datetime desc
limit 1;

again using the index on (user_id, datetime) for really fast retrieval.

This has a secondary use too: it lets you easily verify the users_balance records and check that no transaction is missing. That can be helpful in case of a bug or a problem with data storage.
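A possible shape for such a table, together with a consistency check, is sketched below. The column types, the index name, and the assumption that every user starts from a balance of 0 are illustrative and not taken from the question; the check uses a window function, so it needs MySQL 8.0 or later.

```sql
-- Hypothetical combined table: positive value = income, negative value = expense.
CREATE TABLE transactions (
    id            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id       INT UNSIGNED    NOT NULL,
    datetime      DATETIME        NOT NULL,
    value         DECIMAL(12, 2)  NOT NULL,
    balance_value DECIMAL(14, 2)  NOT NULL,  -- balance after applying this transaction
    KEY idx_transactions_user_datetime (user_id, datetime)
);

-- Recompute the running balance per user and list every row where the
-- stored balance_value disagrees with it. An empty result means the stored
-- balances are consistent with the transaction history.
SELECT id, user_id, datetime, balance_value, expected_balance
FROM (
    SELECT
        id, user_id, datetime, balance_value,
        SUM(value) OVER (PARTITION BY user_id ORDER BY datetime, id) AS expected_balance
    FROM transactions
) AS t
WHERE balance_value <> expected_balance;
```

Ordering by (datetime, id) makes the running sum deterministic when two transactions share the same timestamp, as long as the stored balances were written in that same order.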
