Specific issues with the query
PostgreSQL defaults to READ COMMITTED isolation, and it looks like you're not using anything different. In READ COMMITTED each statement gets its own snapshot. It cannot see uncommitted changes from other transactions; it's as if they simply haven't happened.
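If you want to double-check which level you're actually running at, you can ask the server directly:

SHOW default_transaction_isolation;               -- session/server default
SELECT current_setting('transaction_isolation');  -- level of the current transaction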
Now, imagine you run this simultaneously in three sessions, on a setup with three entries in jobs, zero in jobs_running and zero in jobs_completed:
insert into piper.jobs_running
select x.fid from (
SELECT fid FROM piper.jobs
except
select fid from piper.jobs_running
except
select fid from piper.jobs_completed
) as x limit 1
returning(fid)
Each run's subquery will find all three jobs in jobs. Because their snapshots were all taken before any of them committed a change, or even created the uncommitted row, all of them will find zero rows in jobs_running and jobs_completed.
So they all claim a job. Probably the same job, because even though there's no ORDER BY, the scan order will be the same.
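Sketched as a timeline (the fid value is just for illustration):

-- Session 1: takes its snapshot, sees jobs_running and jobs_completed as empty, claims fid 1, commits
-- Session 2: took its snapshot before session 1 committed, also sees them as empty, claims fid 1, commits
-- Session 3: same again - claims fid 1, commits
-- Afterwards jobs_running holds fid 1 three times (or, if fid is unique there,
-- two sessions fail with duplicate-key errors) and the other two jobs are never claimed.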
Locking
Row locking crosses transaction boundaries, and lets you communicate between transactions to enforce ordering. So you might think this would solve your problem, but it won't.
If you take a FOR UPDATE lock on the jobs row, so the row is locked exclusively, the lock is held until the transaction commits or rolls back. So you'd think the next transaction would get a different row, or, if it tried to get the same row, that it would wait for the lock to be released, see that there was now an entry in jobs_running, and skip the row. You'd be wrong.
What'll happen is that you'll lock all rows. One transaction will successfully get the locks on all rows. The other transactions, which will do the same index or sequential scan, will generally try to lock the rows in the same order, get stuck on the first lock attempt, and wait for the first transaction to roll back or commit. If you're unlucky, they might start locking different sets of rows and deadlock against each other instead, causing a deadlock abort, but usually you're just getting no useful concurrency.
Worse, though: the first transaction picks one of the rows it locked, inserts a row into jobs_running and commits, releasing the locks. Another transaction is then able to continue, and locks all the rows... but it doesn't get a new snapshot of the database state (the snapshot was taken at statement start), so it can't see the row just inserted into jobs_running. Thus it grabs the same job, inserts a row for that job into jobs_running, and commits.
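To make that concrete, here's roughly what the locking attempt might look like. This is a sketch, not your query: FOR UPDATE isn't allowed with EXCEPT, so the anti-joins are rewritten with NOT EXISTS, and the claiming insert into jobs_running would follow in the same transaction.

select j.fid
from piper.jobs j
where not exists (select 1 from piper.jobs_running r where r.fid = j.fid)
  and not exists (select 1 from piper.jobs_completed c where c.fid = j.fid)
limit 1
for update;
-- locks the chosen jobs row, but the statement's snapshot is unchanged,
-- so a session that was blocked on this lock still can't see the jobs_running
-- row the winner inserted, and claims the same fid anyway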
Condition re-checks
PostgreSQL has a quirky feature, not included in most databases, where if a transaction blocks on a lock, it re-checks that the selected row still matches the query's WHERE conditions when it finally gets the lock, after the transaction holding it commits.
That's why the example in https://stackoverflow.com/questions/11532550/atomic-update-select-in-postgres works - it relies on the re-check of the WHERE clause qualifiers after the lock is acquired.
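The pattern that answer relies on looks roughly like this (a sketch using a made-up single-table queue with a claimed flag, not your schema). The key is that the condition is repeated in the UPDATE's own WHERE clause, so the re-check after the lock is acquired protects you:

update queue
set claimed = true
where claimed = false
  and id = (select id from queue where claimed = false limit 1)
returning id;
-- if another session claimed that id first, the re-check of "claimed = false"
-- fails once the lock is obtained, this statement updates zero rows instead of
-- double-claiming, and the application simply retries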
The use of locking forces everything to run serially, so in practice you might as well have a single connection doing the work.
Isolation, ACID, and reality
Transaction isolation in PostgreSQL is not the perfect ideal, where transactions run concurrently but their effects are the same as if they had been executed serially.
The only real-world database that could deliver perfect isolation would be one that exclusively locks every table a write transaction touches as soon as it first accesses it, so transactions could only run concurrently if they touched entirely different parts of the database. Nobody wants to use such a database in situations where concurrency is desirable or useful.
All real world implementations are compromises.
READ COMMITTED default
PostgreSQL's default is READ COMMITTED isolation, a well-defined isolation level that permits non-repeatable and phantom reads, as discussed in the PostgreSQL manual on transaction isolation.
SERIALIZABLE isolation
You can request stricter SERIALIZABLE isolation on a per-transaction basis, or as a per-user, per-database, or (not recommended) global default. This provides much stronger guarantees, though still not perfect, at the cost of forcibly rolling back transactions if they interact in ways that couldn't happen had they run serially.
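For example (the database and role names here are placeholders):

BEGIN ISOLATION LEVEL SERIALIZABLE;   -- per transaction
-- ... do the work ...
COMMIT;

ALTER DATABASE mydb SET default_transaction_isolation = 'serializable';      -- per database
ALTER ROLE queue_worker SET default_transaction_isolation = 'serializable';  -- per user
-- the global default would be default_transaction_isolation = 'serializable'
-- in postgresql.conf, which is the option that isn't recommended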
Because your concurrent queries will always try to grab the first job, no matter how many jobs there are, all but one will abort with serialization failures. So in practice you don't get any useful concurrency and might as well have a single connection handing out jobs to workers.
(Note that prior to PostgreSQL 9.1, SERIALIZABLE offered much weaker guarantees and would not detect quite a lot of cases where transactions were interdependent.)
Auto-re-run SERIALIZABLE
PostgreSQL does not automatically re-run SERIALIZABLE transactions that abort with serialization failures. There are cases where this would be quite useful, but other cases where doing so would be completely wrong and dangerous - especially where read/modify/write cycles via the application are involved. The application is expected to retry.
Don't write a DIY queuing system
It looks like what you're trying to do is write a queuing system. Consider not doing that. Writing a robust, reliable and correct queueing system is hard, and there are a few good ones already available that you could adopt. You have to handle things like crash-safety, what happens when someone takes a task then fails to complete it, race conditions around completion just as you give up on the task handler completing it, etc. There are lots of subtle concurrency problems. Don't try to DIY.
9.5 and SKIP LOCKED
PostgreSQL 9.5, still in development, adds a feature that makes queuing quite a bit easier.
It lets you say that when you SELECT ... FOR UPDATE, any row that's already locked should be skipped so the scan carries on to the next unlocked row. This is very useful when combined with a LIMIT, as it makes it possible to say "find me the first row someone else isn't already trying to claim". That makes it very simple to write queuing code that grabs jobs safely and concurrently.
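A claim query using it could look something like this (again a sketch; since FOR UPDATE isn't allowed with EXCEPT, the anti-joins are written with NOT EXISTS):

select j.fid
from piper.jobs j
where not exists (select 1 from piper.jobs_running r where r.fid = j.fid)
  and not exists (select 1 from piper.jobs_completed c where c.fid = j.fid)
limit 1
for update skip locked;
-- rows another session has already locked are skipped instead of waited on,
-- so each worker gets a different job; insert into jobs_running and commit
-- while still holding the lock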
Until that feature is available I strongly suggest that you stick to single connection queue management, or use a well-tested task queue system.
Best Answer
Serializable isolation says that the result of concurrent transactions has to be consistent with at least one serial order; it doesn't matter which one.
Here there are two possible serial (sequential) orders, T1 followed by T2 and T2 followed by T1.
The result of T1 followed by T2 happens to be exactly the result you got in the concurrent execution of T1 and T2.
This is why there is no reason for the engine to raise a serialization exception. This particular concurrent execution meets the condition of success of the serializable isolation level.
Here you are confused about how it works. The isolation level does not influence the order of execution. It merely detects serialization anomalies and raises an error when they happen (and in fact sometimes when they merely might happen, because false positives are possible).