Postgresql – Improving accurate count mechanism in PostgreSQL

countpostgresqlpostgresql-performance

I have a problem I have been facing for some time and was ignoring for a while. The problem of counting rows. I have read that it is slow as the PostgreSQL has to validate each row thus have to do sequential scan. I am currently using PostgreSQL 13.

Potential solutions I came across:

  1. Using Estimate.

    Although the estimation was closer when querying a single table, the result was far off when using joins. Also I want exact count as I need to show the exact number to the user like how many transactions he did.

  2. Using an external table to maintain the count.

    The idea here was to use a rule or after insert trigger to update the count using count = count + 1 The problem here is the user can add multiple filters (like date, userid etc), which will change the count but the external table is only storing the count.

How would I solve this? I am trying to reduce the time to get the count to less than 3-4 seconds. Currently, there are almost 1 billion records in a table, although this table is split into 12 partitions for each month. Is this even possible?

Eleven partitions won't have inserts but could have delete operations. The user can apply filters which will change the count for any partition.

I am using a b-tree index on the date column. In explain analyze, the index is being used but since the records are huge, the b-tree index cannot fit in RAM as it's size is also big.

The tables are partitioned by each month and each partition has an index on the date, which means there is one index for each month.

Best Answer

With a large table, you won't be able to avoid I/O, and the query will never be fast. Very fast aggregates over huge amounts of data are not PostgreSQL's strong point. You want a column store for that (PostgreSQL has a column store extension).

If you VACUUM the table, PostgreSQL can avoid looking up the row by consulting the visibility map.

If 11 partitions are static, with only one active, it might be possible to store the total for the static partitions and add to that the count of rows for the active partition. If the user can apply filters, there is no way to get out of actually counting the rows that match. You might need a data warehouse of some sort that summarises the data.