PostgreSQL partitions consuming huge amounts of memory

partitioning, postgresql, postgresql-13

I have a decent-sized (~10M row) table that has recently started slowing down quite a bit. As a solution, we decided to partition it into 1,000 tables, partitioned by a hash of the client ID, so that each partition usually holds only a few clients. We created the partitioned table in production alongside the existing table, populated all the data, and kept the two in sync with a trigger function. Manually running queries against the partitioned table showed huge promise, with most seeing 10-100x+ speedups. Thinking all was working great, we swapped the two table names in production, and it immediately crashed our DB:
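For context, here is a minimal sketch of the kind of DDL this setup implies, using hypothetical table and column names (events_partitioned, client_id); the real schema, column list, and sync trigger are of course different:

    -- Parent table, hash-partitioned on the client ID (illustrative names only).
    CREATE TABLE events_partitioned (
        id         bigint      NOT NULL,
        client_id  bigint      NOT NULL,
        payload    jsonb,
        created_at timestamptz NOT NULL DEFAULT now(),
        PRIMARY KEY (client_id, id)   -- the partition key must be part of the PK
    ) PARTITION BY HASH (client_id);

    -- One of the 1,000 hash partitions; the others differ only in REMAINDER.
    CREATE TABLE events_partitioned_p0
        PARTITION OF events_partitioned
        FOR VALUES WITH (MODULUS 1000, REMAINDER 0);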

  1. Freeable memory instantly dropped from 6 GB to 0
  2. Swap usage jumped from 0 to 3 GB
  3. The DB slowed to such a crawl that we had to shut the web app down completely (so there were no connections to the DB) and revert the table name swap.

I've gone through the queries that use this table, and all of them have an explicit WHERE client_id = <client_id>, so partition pruning should limit each of them to a single partition rather than touching the whole hierarchy. None of them select across multiple partitions.
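One way to check that, again with the hypothetical names from the sketch above, is to look at the plan and confirm that only a single partition survives pruning:

    -- With a literal client_id, plan-time pruning should leave exactly one partition.
    EXPLAIN (COSTS OFF)
    SELECT * FROM events_partitioned WHERE client_id = 42;

Note that with prepared statements a generic plan still has to cover every partition and can only prune them at execution time ("Subplans Removed" in the plan), so a pruned plan for a literal value doesn't by itself rule out per-session overhead from all 1,000 partitions.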

What could possibly be using that much memory? Do Postgres partitions have that much memory overhead? From what I've read, Postgres should be able to handle tens of thousands of partitions, so my 1,000 shouldn't be an issue.

This is all on an RDS db.m5.xlarge instance with 4 vCPUs and 16 GB of RAM.

Update:
We upgraded to an instance with 32 GB of RAM, scaled the number of partitions down from 1,000 to 250, and tried again. This time we again saw an immediate drop in memory, but only about 3-4 GB before it leveled off, which suggests the memory usage scales with the number of partitions. Note that we had already been running plenty of INSERT/UPDATE/DELETE commands against the partitioned table through the trigger before swapping the table names, with no noticeable impact; it was only once the names were swapped and SELECTs started hitting it that we saw the huge drop in memory. That doesn't seem consistent with the documentation, which states that the memory hit is particularly bad for UPDATE and DELETE commands.
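One rough way to gauge that per-session footprint is to count how many relations (partitions plus their indexes) a session can end up caching metadata for, again using the hypothetical parent name from the sketch above:

    -- Each backend that touches these relations caches metadata for every one
    -- of them, so partitions plus their indexes gives a feel for the relcache growth.
    SELECT count(DISTINCT i.inhrelid) AS partitions,
           count(x.indexrelid)        AS partition_indexes
    FROM pg_inherits i
    LEFT JOIN pg_index x ON x.indrelid = i.inhrelid
    WHERE i.inhparent = 'events_partitioned'::regclass;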

Best Answer

Yes, more partitions mean more overhead in memory.

The relevant text, taken from the docs:

It is also important to consider the overhead of partitioning during query planning and execution. The query planner is generally able to handle partition hierarchies with up to a few hundred partitions. Planning times become longer and memory consumption becomes higher as more partitions are added. This is particularly true for the UPDATE and DELETE commands. Another reason to be concerned about having a large number of partitions is that the server's memory consumption may grow significantly over a period of time, especially if many sessions touch large numbers of partitions. That's because each partition requires its metadata to be loaded into the local memory of each session that touches it.

With data warehouse type workloads, it can make sense to use a larger number of partitions than with an OLTP type workload. Generally, in data warehouses, query planning time is less of a concern as the majority of processing time is spent during query execution. With either of these two types of workload, it is important to make the right decisions early, as re-partitioning large quantities of data can be painfully slow. Simulations of the intended workload are often beneficial for optimizing the partitioning strategy. Never assume that more partitions are better than fewer partitions and vice-versa.

Reference: 5.11. Table Partitioning, 5.11.6. Best Practices for Declarative Partitioning (PostgreSQL 14 documentation)
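If you can test on PostgreSQL 14 or later (the version of the docs quoted above; the instance in the question is on 13, which lacks this view), the per-session metadata cache the docs describe can be observed directly: run the application's SELECTs in a session and then inspect pg_backend_memory_contexts, where the relation and catalog caches live under CacheMemoryContext:

    -- PostgreSQL 14+ only: memory contexts of the current backend.
    -- CacheMemoryContext holds relcache/catcache entries; each "index info"
    -- context belongs to one cached index.
    SELECT name, count(*) AS contexts, sum(used_bytes) AS used_bytes
    FROM pg_backend_memory_contexts
    WHERE name IN ('CacheMemoryContext', 'index info')
    GROUP BY name
    ORDER BY sum(used_bytes) DESC;

Comparing those numbers before and after a session has touched all of the partitions should make the per-partition metadata cost visible.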