PostgreSQL – Postgres index for aggregate functions

Tags: aggregate, index, postgresql, postgresql-9.6

Query:

SELECT
  "places_place"."id"
FROM "places_place"
  LEFT OUTER JOIN "units_unit" ON ("places_place"."id" = "units_unit"."place_id")
GROUP BY "places_place"."id"
HAVING SUM("units_unit"."quantity") >= 123

Index attempt:

CREATE INDEX units_quantity_sum ON units_unit (SUM("units_unit"."quantity"));
-- ERROR:  aggregate functions are not allowed in index expressions

Essentially, I want it to index the result of SUM without storing the result in a separate column of the table. How can I create an index to do this (or is there a better way to optimize this query)?

EXPLAIN ANALYZE of query with 10,000 rows in places_place and 25,000 in units_unit:

HashAggregate  (cost=2057.31..2157.33 rows=10002 width=4) (actual time=38.121..41.174 rows=7727 loops=1)
  Group Key: places_place.id
  Filter: (sum(units_unit.quantity) >= 5)
  Rows Removed by Filter: 2275
  ->  Hash Right Join  (cost=594.04..1932.22 rows=25018 width=6) (actual time=6.383..28.578 rows=26727 loops=1)
        Hash Cond: (units_unit.place_id = places_place.id)
        ->  Seq Scan on units_unit  (cost=0.00..994.18 rows=25018 width=6) (actual time=0.003..7.279 rows=25018 loops=1)
        ->  Hash  (cost=469.02..469.02 rows=10002 width=4) (actual time=6.311..6.311 rows=10002 loops=1)
              Buckets: 16384  Batches: 1  Memory Usage: 480kB
              ->  Seq Scan on places_place  (cost=0.00..469.02 rows=10002 width=4) (actual time=0.007..3.560 rows=10002 loops=1)
Planning time: 0.584 ms
Execution time: 42.643 ms

Best Answer

You have two easy options:

  1. You can use a MATERIALIZED VIEW
  2. You can also use a TRIGGER that maintains the sums in another table.

Both of these will allow you to cache the result of SUM(). I would go with the MATERIALIZED VIEW unless you need the cached sums to be up to date all the time.
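A minimal sketch of the materialized-view approach, using your schema (the view and index names here are illustrative, not from your question):

CREATE MATERIALIZED VIEW place_quantity_sums AS
SELECT "places_place"."id" AS place_id,
       COALESCE(SUM("units_unit"."quantity"), 0) AS total_quantity
FROM "places_place"
  LEFT OUTER JOIN "units_unit" ON ("places_place"."id" = "units_unit"."place_id")
GROUP BY "places_place"."id";

-- Now the cached sum is a plain column, so it can be indexed.
CREATE INDEX place_quantity_sums_total_idx
  ON place_quantity_sums (total_quantity);

-- The original query becomes an index-friendly range filter.
SELECT place_id FROM place_quantity_sums WHERE total_quantity >= 123;

-- Re-run after the underlying tables change.
REFRESH MATERIALIZED VIEW place_quantity_sums;

Note that a plain REFRESH locks out readers while it runs; REFRESH MATERIALIZED VIEW CONCURRENTLY avoids that, but requires a unique index on the view (e.g. on place_id).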

Before you go down that route, though, note that PostgreSQL 9.6 supports parallel sequential scans and parallel aggregation, which can speed up exactly this kind of query. In fact, it's an ideal use case. If you just need this query to be faster, try setting max_parallel_workers_per_gather.
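For example (in 9.6 this setting defaults to 0, so parallel query is effectively off until you raise it; 4 here is just an illustrative value):

SET max_parallel_workers_per_gather = 4;

You can set it per session as above, or globally in postgresql.conf. Run EXPLAIN ANALYZE again afterwards and look for a Gather node with Partial/Finalize aggregate steps to confirm the planner actually chose a parallel plan.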