Mysql – How to write SQL queries to populate the tables while synchronizing transactions and account balance

MySQLoptimizationperformancequery-performance

I have two tables having the following schema.

CREATE TABLE users(
  `id` INT AUTO_INCREMENT,
  `balance` DECIMAL(10, 2),
  PRIMARY KEY(`id`)
);
CREATE TABLE user_transactions(
  `id` INT AUTO_INCREMENT,
  `user_id` INT,
  `previous_balance` DECIMAL(10,2) COMMENT 'Balance before transaction',
  `amount` DECIMAL(10,2) COMMENT 'It will be either positive or negative and added to the previous balance',
  `updated_balance` DECIMAL(10, 2) COMMENT 'Balance after transaction',
  `created` BIGINT,
  PRIMARY KEY(`id`)
);

I am using MySQL version 5.7. That's a minimal DB structure and indicates all the indexes going to be used. There will be more columns in both tables such as who carried out the transaction etc. I have to keep a record of the transaction in the user_transactions table and update the balance in the users table.

The transactions can be carried out by the user, the parent user, and the system. Therefore, I have to query the balance every few seconds and show it to the user. There will be no delete operations on both tables. Just insert and update on the users table, and only insert on the user_transactions table. Since there's going to be thousands of users and thousands of transactions every day, most probably at the same time, these insert and update queries should be optimized.

There are three approaches I could think of:

  1. Run three queries in a stored procedure as follows.

    CREATE PROCEDURE `update_balance` (user_id INT, amount DECIMAL(10,2))
    BEGIN
        DECLARE previous_balance DECIMAL(10,2);
        -- fetch previous balance
        SELECT balance INTO previous_balance FROM users WHERE id = user_id;
        -- create an entry in user_transactions table
        INSERT INTO user_transactions(user_id, previous_balance, amount, updated_balance, created) VALUES (user_id, previous_balance, amount, previous_balance+amount, UNIX_TIMESTAMP());
        -- updating the balance in users table
        UPDATE users SET balance = previous_balance+amount WHERE id = user_id;
    END
    
  2. Create a trigger after updating the balance in the users table.

  3. Don't use the balance column in the users table at all and fetch balance from the user_transactions table instead.

1st and 2nd approach will lock the row and prevent any update. It won't be feasible if there is another column in the users table being updated. 3rd approach appears not feasible to me as there's going to be a million records at some point.

This question points at writing schema. While I already have the schema, I need to write an optimized query(or queries) to populate the tables.

So, how to write queries to update balance while keeping every record of the transaction and the balance synced?

Best Answer

Most financial systems of record are very complex. They involve separate concepts of "posting" transactions, then "settling" transactions afterwards, as well as "closing" of financial periods, after which settled transactions can't be modified.

Say you walk into your bank and withdraw $100. The banker posts that transaction in front of you, but it isn't settled until later (often overnight). If the banker accidently typed in the wrong withdrawal amount, they can adjust the transaction and that corrected transaction will appear on your statement as a single -$100 withdrawal. If the error isn't noticed until after the transaction is settled, the correction is posted as a separate transaction and your statement will show the incorrect transaction + correction = net correct amount. Most banks close each day as part of settling standard bank accounts, making the two fairly similar. Credit cards generally close months (which is when they often compute fees, send statements, etc.)

One reason for the complexity is the concerns for concurrency and isolation on these methods. Consider the race condition on option 1:

  • two users try to execute transactions on the same user at the same time.
  • The two SELECT statements run in parallel to get balance (say, $500).
  • Then they both insert transactions (again in parallel) of -100.
  • Both will calculate the new balance of $500-$100=400. They'll block each other on the update, but the race condition will make the balance incorrect because they selected the starting balance without knowledge of the prior transaction.

So... That's my way of saying this is actually extremely complex if you care about being correct. Assuming accuracy matters, you might consider an alternate approach. Real life will be a bit more complicated to implement than this, but this is a high level approach : - post transactions as normal, but don't update the balance. - asynchronously, "close" a time period, at which time you process all transactions during that period to compute the new balance & cache that on the user, along with the time it was computed. - when you need a balance, look up the cached balance, and all transactions since it was computed. Return that balance to the user.