How to Utilize Parallelism for PostgreSQL DELETE Operations


I have a fairly niche use case of a (AWS Aurora) PostgreSQL database and we are running into performance issues that I have discovered a "clever hack" (not actually clever) to work around and we are wondering if there is a better way to do things.

We have a write master that ideally gets hourly updates, the updates are managed by a single batch job, nothing else writes to the DB. Fundamentally the DB is a single large (4 billion rows) table – let's call this table Target. On the scheduled update time some data is loaded into staging tables and then used to update, add or delete rows from the main table.

My problem comes with the deletes and PostgreSQL's lack of parallelism.

Both the Target and Staging tables have a column structure of a,b,c,d,e. The Target table has a unique index on (a,b,c,d). Looking at just the combination of (a,b) there are approx 17 million unique keys. In the case of the Staging_Delete table, the data in it will be a strict subset of data in the Target table. We, unfortunately, get cases where we have deletes of approx 17 million rows where we touch on almost every single possible (a,b) in the table. We know that there is no magic that can make this quick but we have been surprised at just how slow slow is due to the lack of parallelism.

If I do a SELECT on the Target table joined against Staging_Delete to select only those rows that matchup with Staging_Delete then Postgres will happily spin up multiple worker threads to perform that join.

However, if I do a DELETE USING with exactly the same conditions then I don't get the benefits of parallelism.

Obviously, I understand why in the general case parallelism and DELETE/UPDATE would be bad times but in my specific case, it would be perfectly safe to parallel read the data into the shared buffer before performing single threaded updating of the data.

To give an idea of the performance difference:

  1. DELETE 1 million rows takes approx 30 minutes.
  2. SELECT 1 million rows takes 6 minutes.

This leads us to the "clever hack" – Nothing else is using the database apart from the one batch job. If we first run the SELECT and then run the DELETE we get all the data pages loaded into the shared buffer with the benefits of parallelism and then running the DELETE on the same rows takes 1 minute for an overall runtime of 7 minutes. Vastly faster than the raw DELETE USING by itself.

Once again we understand that in the general case this wouldn't work – another query might come along and knock out our cached data but in our case there is only 1 query running, we are in complete control.

However, this feels very unsatisfying – are there any Postgres settings we can play with that give us the benefits of the parallel data load without having to hack around the default behaviour?

Best Answer

Increasing "effective_io_concurrency" might work to get parallel IO without full parallelization. It is only going to be helpful if the join uses a bitmap scan, which we don't know if it does or not based on this info. We would need to see an EXPLAIN or an EXPLAIN (ANALYZE, BUFFERS) to know that.

But in general, I see nothing wrong with your current approach. It isn't elegant, but it probably doesn't need to be.

But it does seem like you are deleting a pretty major chunk of the table. 17 million rows randomly scattered from among 4 billion is probably going to mean something like 1/3 of the pages are going to get hit (depending on the size of the rows of course). If that many pages can stay cached between the select and the delete, then why don't they just stay cached all the time? Maybe you could use pg_prewarm to get the whole table cached. Or the reverse, maybe you could use partitioning to pre-organize the data that will get deleted together so it is stored together--concentrated into fewer table pages.