My question: what is the very latest status on whether or not recent versions of Oracle actually make it possible to implement a queue in a straightforward way with SKIP LOCKED and limiting to a single row, without resorting to indirect solutions/fragile solutions?
There seems to be alot of history going back a long time showing SKIP LOCKED queueing being problematic with Oracle – I'm trying to determine if newest Oracle's have cleared these problems up.
I have implemented queue style functionality using Postgres with SKIP LOCKED, and I now wish to do the same with Oracle. I'm happy to use any Oracle version that makes it possible. So before I head down the path of trying to implement this for Oracle, I wanted to first ask if it is impossible to do so.
I've been reading alot of documentation on the web to try to determine if it can be done … older information seems to indicate that Oracle is not able to truly limit results returned to only one single row, which is a big problem when using "SKIP LOCKED" in a queue because for queue processing you want only one row.
Previous information indicating row limit in Oracle depends on fragile/indirect solutions:
https://stackoverflow.com/questions/16299663/select-for-update-skip-locked-with-row-limit
https://stackoverflow.com/questions/6117254/force-oracle-to-return-top-n-rows-with-skip-locked
https://stackoverflow.com/questions/50390146/query-limit-in-oracle-database
There appears to be a recent FETCH statement implemented in recent Oracle versions, but again it is not clear if this truly restricts access to a single row. Does FETCH make it possible to implement a queue with SKIP LOCKED in a direct and robust manner?
Please note I am aware Oracle has Advanced Queue functionality built in – I do not want to use that.
Here's what I wrote for Postgres – it's pretty straightforward – note I am aware it lacks needed transaction handling:
import psycopg2
import psycopg2.extras
import random
db_params = {
'database': 'jobs',
'user': 'jobsuser',
'password': 'superSecret',
'host': '127.0.0.1',
'port': '5432',
}
conn = psycopg2.connect(**db_params)
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
def do_some_work(job_data):
if random.choice([True, False]):
print('do_some_work FAILED')
raise Exception
else:
print('do_some_work SUCCESS')
def process_job():
sql = """DELETE FROM message_queue
WHERE id = (
SELECT id
FROM message_queue
WHERE status = 'new'
ORDER BY created ASC
FOR UPDATE SKIP LOCKED
LIMIT 1
)
RETURNING *;
"""
cur.execute(sql)
queue_item = cur.fetchone()
print('message_queue says to process job id: ', queue_item['target_id'])
sql = """SELECT * FROM jobs WHERE id =%s AND status='new_waiting' AND attempts <= 3 FOR UPDATE;"""
cur.execute(sql, (queue_item['target_id'],))
job_data = cur.fetchone()
if job_data:
try:
do_some_work(job_data)
sql = """UPDATE jobs SET status = 'complete' WHERE id =%s;"""
cur.execute(sql, (queue_item['target_id'],))
except Exception as e:
sql = """UPDATE jobs SET status = 'failed', attempts = attempts + 1 WHERE id =%s;"""
# if we want the job to run again, insert a new item to the message queue with this job id
cur.execute(sql, (queue_item['target_id'],))
else:
print('no job found, did not get job id: ', queue_item['target_id'])
conn.commit()
process_job()
cur.close()
conn.close()
Best Answer
The problem is that you rely on non-deterministic behaviour (which, by being non-deterministic, is bound to change at any time).
SQL is a declarative language. It does not define how you want to achieve something; it only defines what you want to achieve. Let's see what you want to achieve by rephrasing your SQL statement:
status = 'new'
created
.Given a lucky combination of indexes, table statistics, and possibly other things at this time a particular SQL engine might find a plan that only locks one row before it gets the one eligible to be returned. At some other time it might find it more advantageous (or possible) to lock more rows, or the entire table, without violating semantics of the query. Then you will suddenly discover that your application works somewhat differently from what you have come to expect. The same applies to moving your application to a different SQL engine.
In short, SQL is not the right tool for the job where you want to prescribe how you want it done. Also, your fixation on
skip locked
is misplaced.