In PostgreSQL 9.6 I have a table T like this:
category | id | data
---------+----+-------------------------
A        |  1 | foo
A        |  2 | bar
A        |  3 | baz
B        |  4 | eh
B        |  5 | whatcomesafterfoobarbaz
There is a view V giving me the data for T, so it has the columns category, id, data. T is essentially the materialized view for V, except that I need to refresh it with more granularity than "refresh everything".
So I will select from V, for example:

SELECT * FROM V WHERE category = 'A';

or

SELECT * FROM V WHERE category = 'A' AND id = 2;
And replace the relevant rows in T with whatever data V gives me. Unfortunately I cannot do a simple UPDATE: asking V, e.g. for WHERE category = 'A', might give me a totally different set of rows than before. Therefore I need to do this sequence:
DELETE FROM T WHERE <condition>;
INSERT INTO T SELECT * FROM V WHERE <condition>;
<condition> is either category = ? or category = ? AND id = ?.
How do I do this so that the following conditions hold?

- Reads from rows not satisfying <condition> should be unaffected.
- The change should be atomic, meaning reads from rows satisfying <condition> should see either the old row set or the new row set, not a mix.
Note: unlike this question, I don't want to replace the whole table at once – only the rows affected.
Added details

- There are more reads than writes, on the order of 10-100 times more. After each write there will be a read of the adjacent categories. The application is looking at a set of categories, ids and data, and updates the data for one or more categories at a time. Right after that it will re-fetch those categories and display them, and it must see the fresh data. All the ids are always fetched with "their" category.
- Each category will have something like 1-10 ids; there will be tens of thousands of categories.
More details after first answer

- Transactions can run concurrently. There can definitely be a case when two transactions both start with DELETE FROM T WHERE category = 'A';.
- There is a table categories where it's possible to lock rows FOR UPDATE. There is also a table where ids can be locked FOR UPDATE.
- RETURNING does not make much sense here, as I need to fetch more than just the changed rows; thus everything is simpler with a separate SELECT.
Best Answer
Concurrent reads are not a problem. Writers don't block readers and vice versa in the default READ COMMITTED isolation level. Enclose DELETE and INSERT in a single transaction to make the operation atomic (all applied or nothing).
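A minimal sketch of such a transaction (assuming the column lists of T and V match):

BEGIN;
DELETE FROM T WHERE category = 'A';
INSERT INTO T SELECT * FROM V WHERE category = 'A';
COMMIT;  -- both changes become visible together at commit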
If there can be multiple transactions trying to write at the same time, that's a game changer. A single transaction protects you from inconsistent updates, but it cannot protect you from race conditions between concurrent transactions: deadlocks.
Say, we have two transactions T1 and T2, and category 'A' has 10 IDs:
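One possible interleaving, as a sketch (assume each transaction deletes the ids with separate statements, in different order):

T1: DELETE FROM T WHERE category = 'A' AND id = 1;  -- locks row 1
T2: DELETE FROM T WHERE category = 'A' AND id = 2;  -- locks row 2
T1: DELETE FROM T WHERE category = 'A' AND id = 2;  -- blocks, waiting for T2
T2: DELETE FROM T WHERE category = 'A' AND id = 1;  -- blocks, waiting for T1: deadlock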
Postgres detects the deadlock after some time and kills one of the two transactions. (A deadlock error is reported.)
You could switch to SERIALIZABLE transaction isolation. But that's much more expensive, and you need to be prepared for serialization failures and retry in this case.

Or you can avoid the problem by always deleting rows in identical, deterministic order. Like:
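BEGIN;
-- a sketch: the locking SELECT fixes the lock order up front;
-- the DELETE then reuses the locks this transaction already holds
SELECT 1 FROM T WHERE category = 'A' ORDER BY id FOR UPDATE;
DELETE FROM T WHERE category = 'A';
INSERT INTO T SELECT * FROM V WHERE category = 'A';
COMMIT;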
But typically, there is a more convenient option. If you have a separate table holding unique categories named, say, cat, you can lock the single parent row in cat (assuming its key column is also named category) with:
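SELECT 1 FROM cat WHERE category = 'A' FOR UPDATE;  -- lock the parent row for this category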
and then (in the same transaction) write to the category 'A' rows in T at will (still encapsulated in a single transaction to avoid intermediary, inconsistent states becoming visible). Of course, all writing queries must follow the same protocol. Then concurrent transactions will wait for the lock on cat before they write to T, and everything is groovy ...

In Postgres 9.4 or later, consider FOR NO KEY UPDATE instead:
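SELECT 1 FROM cat WHERE category = 'A' FOR NO KEY UPDATE;  -- weaker lock: still mutually
-- exclusive among writers following this protocol, but does not block foreign-key checks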
Concerning the remark that RETURNING makes not much sense here: you are aware of the RETURNING clause, right? There is no need for a separate read if you just inserted all rows for a given category. Example:
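-- a sketch using the column names from the question
INSERT INTO T (category, id, data)
SELECT category, id, data
FROM   V
WHERE  category = 'A'
RETURNING category, id, data;  -- hands back the fresh rows, no separate SELECT needed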