Sql-server – Writing a simple bank schema: How should I keep the balances in sync with their transaction history

aggregatedatabase-designmaterialized-viewsql serversql-server-2008

I am writing the schema for a simple bank database. Here are the basic specifications:

  • The database will store transactions against a user and currency.
  • Every user has one balance per currency, so each balance is simply the sum of all transactions against a given user and currency.
  • A balance cannot be negative.

The bank application will communicate with its database exclusively through stored procedures.

I expect this database to accept hundreds of thousands of new transactions per day, as well as balance queries on a higher order of magnitude. To serve up balances very quickly I need to pre-aggregate them. At the same time, I need to guarantee that a balance never contradicts its transaction history.

My options are:

  1. Have a separate balances table and do one of the following:

    1. Apply transactions to both the transactions and balances tables. Use TRANSACTION logic in my stored procedure layer to ensure that balances and transactions are always in sync. (Supported by Jack.)

    2. Apply transactions to the transactions table and have a trigger that updates the balances table for me with the transaction amount.

    3. Apply transactions to the balances table and have a trigger that adds a new entry in the transactions table for me with the transaction amount.

    I have to rely on security-based approaches to make sure no changes can be made outside of the stored procedures. Otherwise, for example, some process could directly insert a transaction into the transactions table and under scheme 1.3 the relevant balance would be out of sync.

  2. Have a balances indexed view that aggregates the transactions appropriately. Balances are guaranteed by the storage engine to stay in sync with their transactions, so I don't need to rely on security-based approaches to guarantee this. On the other hand, I cannot enforce balances be non-negative anymore since views — even indexed views — cannot have CHECK constraints. (Supported by Denny.)

  3. Have just a transactions table but with an additional column to store the balance effective right after that transaction executed. Thus, the latest transaction record for a user and currency also contains their current balance. (Suggested below by Andrew; variant proposed by garik.)

When I first tackled this problem, I read these two discussions and decided on option 2. For reference, you can see a bare-bones implementation of it here.

  • Have you designed or managed a database like this with a high load profile? What was your solution to this problem?

  • Do you think I've made the right design choice? Is there anything I should keep in mind?

    For example, I know schema changes to the transactions table will require I rebuild the balances view. Even if I am archiving transactions to keep the database small (e.g. by moving them somewhere else and replacing them with summary transactions), having to rebuild the view off tens of millions of transactions with every schema update will probably mean significantly more downtime per deployment.

  • If the indexed view is the way to go, how can I guarantee that no balance is negative?


Archiving transactions:

Let me elaborate a bit on archiving transactions and the "summary transactions" I mentioned above. First, regular archiving will be a necessity in a high-load system like this. I want to maintain consistency between balances and their transaction histories while allowing old transactions to be moved somewhere else. To do this I'll replace every batch of archived transactions with a summary of their amounts per user and currency.

So, for example, this list of transactions:

user_id    currency_id      amount    is_summary
------------------------------------------------
      3              1       10.60             0
      3              1      -55.00             0
      3              1      -12.12             0

is archived away and replaced with this:

user_id    currency_id      amount    is_summary
------------------------------------------------
      3              1      -56.52             1

In this way, a balance with archived transactions maintains a complete and consistent transaction history.

Best Answer

I am not familiar with accounting, but I solved some similar problems in inventory-type environments. I store running totals in the same row with the transaction. I am using constraints, so that my data is never wrong even under high concurrency. I have written the following solution back then in 2009::

Calculating running totals is notoriously slow, whether you do it with a cursor or with a triangular join. It is very tempting to denormalize, to store running totals in a column, especially if you select it frequently. However, as usual when you denormalize, you need to guarantee the integrity of your denormalized data. Fortunately, you can guarantee the integrity of running totals with constraints – as long as all your constraints are trusted, all your running totals are correct. Also this way you can easily ensure that the current balance (running totals) is never negative - enforcing by other methods can also be very slow. The following script demonstrates the technique.

CREATE TABLE Data.Inventory(InventoryID INT NOT NULL IDENTITY,
  ItemID INT NOT NULL,
  ChangeDate DATETIME NOT NULL,
  ChangeQty INT NOT NULL,
  TotalQty INT NOT NULL,
  PreviousChangeDate DATETIME NULL,
  PreviousTotalQty INT NULL,
  CONSTRAINT PK_Inventory PRIMARY KEY(ItemID, ChangeDate),
  CONSTRAINT UNQ_Inventory UNIQUE(ItemID, ChangeDate, TotalQty),
  CONSTRAINT UNQ_Inventory_Previous_Columns 
     UNIQUE(ItemID, PreviousChangeDate, PreviousTotalQty),
  CONSTRAINT FK_Inventory_Self FOREIGN KEY(ItemID, PreviousChangeDate, PreviousTotalQty)
    REFERENCES Data.Inventory(ItemID, ChangeDate, TotalQty),
  CONSTRAINT CHK_Inventory_Valid_TotalQty CHECK(
         TotalQty >= 0 
     AND (TotalQty = COALESCE(PreviousTotalQty, 0) + ChangeQty)
  ),
  CONSTRAINT CHK_Inventory_Valid_Dates_Sequence CHECK(PreviousChangeDate < ChangeDate),
  CONSTRAINT CHK_Inventory_Valid_Previous_Columns CHECK(
        (PreviousChangeDate IS NULL AND PreviousTotalQty IS NULL)
     OR (PreviousChangeDate IS NOT NULL AND PreviousTotalQty IS NOT NULL)
  )
);

-- beginning of inventory for item 1
INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)
VALUES(1, '20090101', 10, 10, NULL, NULL);

-- cannot begin the inventory for the second time for the same item 1
INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)
VALUES(1, '20090102', 10, 10, NULL, NULL);


Msg 2627, Level 14, State 1, Line 10

Violation of UNIQUE KEY constraint 'UNQ_Inventory_Previous_Columns'. 
Cannot insert duplicate key in object 'Data.Inventory'.

The statement has been terminated.


-- add more
DECLARE @ChangeQty INT;
SET @ChangeQty = 5;

INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)

SELECT TOP 1 ItemID, '20090103', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
  FROM Data.Inventory
  WHERE ItemID = 1
  ORDER BY ChangeDate DESC;

SET @ChangeQty = 3;

INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)

SELECT TOP 1 ItemID, '20090104', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
  FROM Data.Inventory
  WHERE ItemID = 1
  ORDER BY ChangeDate DESC;

SET @ChangeQty = -4;

INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)

SELECT TOP 1 ItemID, '20090105', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
  FROM Data.Inventory
  WHERE ItemID = 1
  ORDER BY ChangeDate DESC;

-- try to violate chronological order
SET @ChangeQty = 5;

INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)

SELECT TOP 1 ItemID, '20081231', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
  FROM Data.Inventory
  WHERE ItemID = 1
  ORDER BY ChangeDate DESC;

Msg 547, Level 16, State 0, Line 4

The INSERT statement conflicted with the CHECK constraint 
"CHK_Inventory_Valid_Dates_Sequence". 
The conflict occurred in database "Test", table "Data.Inventory".

The statement has been terminated.

SELECT ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty
FROM Data.Inventory ORDER BY ChangeDate;

ChangeDate              ChangeQty   TotalQty    PreviousChangeDate      PreviousTotalQty
----------------------- ----------- ----------- ----------------------- -----
2009-01-01 00:00:00.000 10          10          NULL                    NULL
2009-01-03 00:00:00.000 5           15          2009-01-01 00:00:00.000 10
2009-01-04 00:00:00.000 3           18          2009-01-03 00:00:00.000 15
2009-01-05 00:00:00.000 -4          14          2009-01-04 00:00:00.000 18


-- try to change a single row, all updates must fail
UPDATE Data.Inventory SET ChangeQty = ChangeQty + 2 WHERE InventoryID = 3;
UPDATE Data.Inventory SET TotalQty = TotalQty + 2 WHERE InventoryID = 3;

-- try to delete not the last row, all deletes must fail
DELETE FROM Data.Inventory WHERE InventoryID = 1;
DELETE FROM Data.Inventory WHERE InventoryID = 3;

-- the right way to update
DECLARE @IncreaseQty INT;

SET @IncreaseQty = 2;

UPDATE Data.Inventory 
SET 
     ChangeQty = ChangeQty 
   + CASE 
        WHEN ItemID = 1 AND ChangeDate = '20090103' 
        THEN @IncreaseQty 
        ELSE 0 
     END,
  TotalQty = TotalQty + @IncreaseQty,
  PreviousTotalQty = PreviousTotalQty + 
     CASE 
        WHEN ItemID = 1 AND ChangeDate = '20090103' 
        THEN 0 
        ELSE @IncreaseQty 
     END
WHERE ItemID = 1 AND ChangeDate >= '20090103';

SELECT ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty
FROM Data.Inventory ORDER BY ChangeDate;

ChangeDate              ChangeQty   TotalQty    PreviousChangeDate      PreviousTotalQty
----------------------- ----------- ----------- ----------------------- ----------------
2009-01-01 00:00:00.000 10          10          NULL                    NULL
2009-01-03 00:00:00.000 7           17          2009-01-01 00:00:00.000 10
2009-01-04 00:00:00.000 3           20          2009-01-03 00:00:00.000 17
2009-01-05 00:00:00.000 -4          16          2009-01-04 00:00:00.000 20