In PostgreSQL 9.6 I have a table T like this:
category | id | data
---------+----+-------------------------
A        |  1 | foo
A        |  2 | bar
A        |  3 | baz
B        |  4 | eh
B        |  5 | whatcomesafterfoobarbaz
There is a view V giving me the data for T, so it has the columns category, id, data. T is essentially the materialized view for V, except that I need to refresh it with more granularity than "refresh everything".
So I will select from V, for example:

SELECT * FROM V WHERE category = 'A';

or

SELECT * FROM V WHERE category = 'A' AND id = 2;
And replace the relevant rows in T with whatever data V gives me. Unfortunately I cannot do a simple UPDATE: asking V, e.g. for WHERE category = 'A', might give me a totally different set of rows than before. Therefore I need to do this sequence:
DELETE FROM T WHERE <condition>;
INSERT INTO T SELECT * FROM V WHERE <condition>;
<condition> is either category = ? or category = ? AND id = ?.
How do I do this so that the following conditions hold?

- Reads from rows not satisfying <condition> should be unaffected.
- The change should be atomic, meaning reads from rows satisfying <condition> should see either the old row set or the new row set, not a mix.
Note: unlike this question, I don't want to replace the whole table at once – only the rows affected.
Added details

- There are more reads than writes, on the order of 10-100 times more. After each write there will be a read of the adjacent categories. The application is looking at a set of categories, ids and data, and updates the data for one or more categories at a time. Right after that it will re-fetch those categories and display them, and it must see the fresh data. All the ids are always fetched with "their" category.
- Each category will have something like 1-10 ids; there will be tens of thousands of categories.
More details after first answer

- Transactions can run concurrently. There can definitely be a case when two transactions both start with DELETE FROM T WHERE category = 'A';.
- There is a table categories where it's possible to lock rows FOR UPDATE. There is also a table where ids can be locked FOR UPDATE.
- RETURNING does not make much sense here, as I need to fetch more than just the changed rows; thus everything is simpler with a separate SELECT.
Best Answer
Concurrent reads are not a problem. Writers don't block readers and vice versa in the default READ COMMITTED isolation level. Enclose DELETE and INSERT in a single transaction to make the operation atomic (all applied or nothing).
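A minimal sketch of such a transaction (assuming the column lists of T and V match):

BEGIN;
DELETE FROM T WHERE category = 'A';
INSERT INTO T SELECT * FROM V WHERE category = 'A';
COMMIT;  -- both changes become visible together at commit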
If there can be multiple transactions trying to write at the same time, that's a game changer. A single transaction protects you from inconsistent updates, but it cannot protect you from race conditions between concurrent transactions: deadlocks.
Say, we have two transactions T1 and T2, and category 'A' has 10 IDs:
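One possible interleaving, as a sketch (assume each transaction deletes the ids with separate statements, in different order):

T1: DELETE FROM T WHERE category = 'A' AND id = 1;  -- locks row 1
T2: DELETE FROM T WHERE category = 'A' AND id = 2;  -- locks row 2
T1: DELETE FROM T WHERE category = 'A' AND id = 2;  -- blocks, waiting for T2
T2: DELETE FROM T WHERE category = 'A' AND id = 1;  -- blocks, waiting for T1: deadlock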
Postgres detects the deadlock after some time and kills one of the two transactions. (A deadlock error is reported.)
You could switch to SERIALIZABLE transaction isolation. But that's much more expensive, and you need to be prepared for serialization failures and retry in this case.

Or you can avoid the problem by always deleting rows in identical, deterministic order. Like:
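BEGIN;
-- a sketch: the locking SELECT fixes the lock order up front;
-- the DELETE then reuses the locks this transaction already holds
SELECT 1 FROM T WHERE category = 'A' ORDER BY id FOR UPDATE;
DELETE FROM T WHERE category = 'A';
INSERT INTO T SELECT * FROM V WHERE category = 'A';
COMMIT;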
But typically, there is a more convenient option. If you have a separate table holding unique categories named, say, cat, you can lock the single parent row in cat (assuming its key column is also named category) with:
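SELECT 1 FROM cat WHERE category = 'A' FOR UPDATE;  -- lock the parent row for this category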
and then (in the same transaction) write to the category 'A' rows in T at will (still encapsulated in a single transaction to avoid intermediary, inconsistent states becoming visible). Of course, all writing queries must follow the same protocol. Then concurrent transactions will wait for the lock on cat before they write to T, and everything is groovy ...

In Postgres 9.4 or later, consider FOR NO KEY UPDATE instead:
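SELECT 1 FROM cat WHERE category = 'A' FOR NO KEY UPDATE;  -- weaker lock: still mutually
-- exclusive among writers following this protocol, but does not block foreign-key checks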
Concerning the remark that RETURNING makes not much sense here: you are aware of the RETURNING clause, right? There is no need for a separate read if you just inserted all rows for a given category. Example:
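-- a sketch using the column names from the question
INSERT INTO T (category, id, data)
SELECT category, id, data
FROM   V
WHERE  category = 'A'
RETURNING category, id, data;  -- hands back the fresh rows, no separate SELECT needed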