PostgreSQL – Synchronizing Batch Delete with Advisory Lock

locking, postgresql

We are moving our application from a single instance to multiple instances (all connected to the same Postgres database) to increase the resiliency of one of our services. I am now facing a synchronization issue: multiple instances of the app all try to delete "old" data once a day (basically, "delete all rows that are 30 days old or older").

I am considering using an advisory lock in Postgres as a synchronisation token so that only one instance actually performs the deletion.

So this is how I envision it (assuming all instances run the exact same code): At, say, 4 AM each morning all 3 instances will try to acquire the advisory lock; only one will succeed initially, and it will perform the deletion of old data, while the other app instances will wait on the lock.

Once the first instance is done, the other instances will, in turn, acquire the lock but will find that there are no more eligible rows and return.
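For illustration, this is roughly what each instance would run; the table name, column, and lock key 42 are made up for the example:

```sql
BEGIN;

-- All instances block here; only one holds the lock at a time.
SELECT pg_advisory_xact_lock(42);

-- The lock holder deletes the old rows; later holders find nothing left.
DELETE FROM events
WHERE created_at < now() - interval '30 days';

COMMIT;  -- the transaction-level advisory lock is released automatically here
```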

Please let me know if you see any issue with the above approach.

Best Answer

Manual locking is probably unnecessary. When a second process reaches a row that is already in the process of being deleted, it will simply block until the deleting transaction commits. So this should already give you the behavior you want.
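In other words, every instance can just run the same plain delete and let row-level locking sort out the overlap (using the same made-up table and column as in the question):

```sql
-- Overlapping runs block on the row locks held by the first deleter,
-- then (under READ COMMITTED) skip the rows that were already deleted.
DELETE FROM events
WHERE created_at < now() - interval '30 days';
```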

It is possible (but I think unlikely) that the processes will choose different execution plans which delete rows in different orders and deadlock against each other. In that case, one will get an ERROR and the other will proceed. You will still get an acceptable outcome if you retry the ERRORed process, or, if you are sure this is the only source of deadlocks, you could just ignore deadlock errors and trust that the winner of the deadlock did the job. It is somewhat wasteful to have one process do a lot of work that just gets aborted and repeated, so it might be worthwhile to use advisory locking to avoid that. But I wouldn't bother until I had a demonstrated problem that needs to be solved.

Depending on the execution plan for the deletions, verifying that no rows need to be deleted might itself be a lot of work. If you want to reduce that work, you could use "try" advisory locks to skip it for all but the first session; but then, if the first session fails after it has warned off the others but before it finishes, the job simply won't get done. And if your app uses advisory locks for other purposes, make sure you don't get collisions in the lock keyspace.
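If you do end up wanting that, a "try" lock keeps the other sessions from doing any work at all. Roughly like this, again with a made-up lock key that must not collide with any other advisory-lock use in your app:

```sql
-- Returns true for the one session that gets the lock, false for the rest.
SELECT pg_try_advisory_lock(42) AS got_lock;

-- Run the delete only if got_lock was true (that check lives in app code).
DELETE FROM events
WHERE created_at < now() - interval '30 days';

-- Release the session-level lock once the job is finished.
SELECT pg_advisory_unlock(42);
```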