PostgreSQL – Is It Safe to Insert Newest Row with SELECT INTO?

consistencypostgresql

Trying to keep rows consistent then referencing previous rows.

CREATE TABLE transactions (
    id         int     PRIMARY KEY
       GENERATED BY DEFAULT AS IDENTITY,

    account_id bigint  REFERENCES account,
    change     integer NOT NULL,
    balance    integer NOT NULL,
)

Query from the client is basically

SELECT balance
FROM transactions
WHERE account_id = X
ORDER BY id DESC
LIMIT 1;

INSERT INTO transactions (account_id, change, balance)
VALUES (X, 100, $balance + 100);

This code will produce incorrect results on concurrent inserts.
So, the question is how to make balance calculation always correct?

Will INSERT INTO ... SELECT(...) be enough?

Best Answer

It's not safe with this schema at all.

INSERT INTO ... SELECT(...)

Will execute a query in a single transaction but that transaction does not lock the table you're SELECTing from against future INSERTs. The problem here is that SELECT itself has ORDER BY id DESC LIMIT 1; There is no way under this schema to do that without table-locking. The database doesn't understand that insertion-dynamics behind id (which no database does). Because of this at best you can implicitly lock the single row you get with the SELECT, but there is no guarantee that that's the newest row by the time you get to INSERT.

So what are the solutions here,

Explicit table-level locking (don't do this).
Locking a shared resource for example SELECT FROM accounts WHERE account_id = x FOR UPDATE the FOR UPDATE acquires a row-level exclusive lock on the row. If this statement is in the INSERT INTO .. SELECT then future SELECT FOR UPDATEs on that row will block and wait. Alternatively it can be anywhere in the same transaction before the read from the transaction table. This is what I would do.
Predicate locking may also work in SERIALIZABLE mode, but only because there is an id = x on the SELECT. But, to be honest I'm not sure and I don't use that mode because I prefer less voodoo and by extension an aggregate on the table like sum(txns) will certainly not trigger the predicate lock, and then it just gets super confusing.

As a side note, when you do

INSERT INTO transactions (account_id, change, balance)
VALUES (X, 100, $balance + 100);

That is not safe. But if the balance was stored on the accounts table (which we have no reason to believe exists),

UPDATE accounts
SET balance = balance + 100
WHERE account_id = $1;

Would be safe.

Best Answer

Related Solutions

PostgreSQL – Difference Between INSERT … SELECT and SELECT INTO

PostgreSQL 9.5 – Query Performance Impact of JOINed Column in SELECT Clause

Related Question