PostgreSQL – Performance improvement needed for query with large IN

postgresql, postgresql-performance

I have a relatively simple table:

CREATE TABLE t_balances (
  f_index   BIGINT NOT NULL
 ,f_epoch             BIGINT NOT NULL
 ,f_balance           BIGINT NOT NULL
);
CREATE UNIQUE INDEX i_balances_1 ON t_balances(f_index, f_epoch);

This table has over 100,000 distinct f_index values, each with tens of thousands of f_epoch entries. (f_index, f_epoch) is unique (as per the index).

I am attempting to obtain balances for a subset of the indices over a given epoch range, for example "indices 1, 2 and 3 for epochs 100 to 200". However, I am struggling to find a suitable index that provides acceptable performance when requesting data for a significant subset of the indices. For example, fetching a small epoch range for 15,000 indices (out of a total of ~110,000) with the query:

EXPLAIN ANALYZE VERBOSE SELECT *
FROM t_balances
WHERE f_epoch >= 12041
  AND f_epoch < 12265
  AND f_index IN(1,2,3,...15000)
ORDER BY f_index, f_epoch;

...

 Index Scan using i_balances_1 on public.t_balances  (cost=0.58..3317610.88 rows=3414744 width=32) (actual time=1.287..2932.709 rows=3342304 loops=1)
   Output: f_index, f_epoch, f_balance
   Index Cond: ((t_balances.f_index = ANY ('{0,1,2,3...15000}'::bigint[])) AND (t_balances.f_epoch >= 12041) AND (t_balances.f_epoch < 12265))
 Planning Time: 20.451 ms
 JIT:
   Functions: 2
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.210 ms, Inlining 0.000 ms, Optimization 0.000 ms, Emission 0.000 ms, Total 0.210 ms
 Execution Time: 3020.610 ms

I need to speed this up by at least an order of magnitude, but I am struggling to find something that manages this whilst remaining usable in day-to-day operations. I have full control of the schema and can change the format of the table if required. The solution would ideally work with PostgreSQL 12, although if 13 is required then so be it.

The server on which the query is running is pretty fast (Ryzen 9 3900, 128GB memory, NVMe SSDs) so I don't think I'm in a situation where the query is as fast as it can be and constrained by the hardware.

For reference, I include some of my attempts to date:

Indexing

I tried various indices, to no effect. It may be that there is some other index type that would work better here, but from my reading of the index literature it doesn't appear so.

An index on t_balances(f_epoch) is ignored by the above query.
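
As an illustration of the sort of thing tried: the plain f_epoch index referred to above, plus one plausible variant that leads the composite index with the epoch range (index names here are only illustrative, and the second variant is an assumption rather than a confirmed attempt):

-- The plain f_epoch index referred to above (name is illustrative)
CREATE INDEX i_balances_epoch ON t_balances (f_epoch);

-- A plausible variant, leading with the epoch range instead
-- (an assumption, not necessarily one of the variants actually tried)
CREATE INDEX i_balances_epoch_index ON t_balances (f_epoch, f_index);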

Creating a view

I tried creating a view. The list of indices required for queries is relatively static, but mutable, so I could not hard-code the values into the view definition. I ended up with a simple lookup table:

CREATE TABLE t_groups (
  f_group BIGINT NOT NULL
 ,f_index BIGINT NOT NULL
);

and created the view with

CREATE VIEW v_balances_group1 AS
  SELECT *
  FROM t_balances
  WHERE f_index IN (SELECT f_index FROM t_groups WHERE f_group = 1)
  ORDER BY f_index;
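
The t_groups table is populated separately; a hypothetical snippet assigning the first 15,000 indices to group 1 could look like this (purely illustrative):

-- Hypothetical: put indices 1..15000 into group 1
INSERT INTO t_groups (f_group, f_index)
SELECT 1, g
FROM generate_series(1, 15000) AS g;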

This did not result in any speedup:

EXPLAIN ANALYZE VERBOSE SELECT *
FROM v_balances_group1
WHERE f_epoch >= 12041
  AND f_epoch < 12265
ORDER BY f_index, f_epoch;
...

 Sort  (cost=1500782.79..1509107.56 rows=3329907 width=32) (actual time=2778.815..3055.376 rows=3360224 loops=1)
   Output: t_balances.f_index, t_balances.f_epoch, t_balances.f_balance
   Sort Key: t_balances.f_index, t_balances.f_epoch
   Sort Method: external merge  Disk: 138128kB
   ->  Gather Merge  (cost=604302.14..1013405.00 rows=3329907 width=32) (actual time=1196.396..1601.870 rows=3360224 loops=1)
         Output: t_balances.f_index, t_balances.f_epoch, t_balances.f_balance
         Workers Planned: 7
         Workers Launched: 7
         ->  Sort  (cost=603302.02..604491.28 rows=475701 width=32) (actual time=1173.668..1253.437 rows=420028 loops=8)
               Output: t_balances.f_index, t_balances.f_epoch, t_balances.f_balance
               Sort Key: t_balances.f_index
               Sort Method: external merge  Disk: 19264kB
               Worker 0:  Sort Method: external merge  Disk: 17032kB
               Worker 1:  Sort Method: external merge  Disk: 16872kB
               Worker 2:  Sort Method: external merge  Disk: 17600kB
               Worker 3:  Sort Method: external merge  Disk: 17024kB
               Worker 4:  Sort Method: external merge  Disk: 16008kB
               Worker 5:  Sort Method: external merge  Disk: 17160kB
               Worker 6:  Sort Method: external merge  Disk: 17472kB
               Worker 0: actual time=1169.451..1251.784 rows=413340 loops=1
               Worker 1: actual time=1175.999..1259.257 rows=409437 loops=1
               Worker 2: actual time=1173.178..1263.187 rows=427156 loops=1
               Worker 3: actual time=1177.141..1259.815 rows=413216 loops=1
               Worker 4: actual time=1170.905..1250.253 rows=388519 loops=1
               Worker 5: actual time=1169.893..1253.631 rows=416591 loops=1
               Worker 6: actual time=1156.789..1245.918 rows=424138 loops=1
               ->  Hash Semi Join  (cost=457.60..551777.54 rows=475701 width=32) (actual time=88.605..1004.630 rows=420028 loops=8)
                     Output: t_balances.f_index, t_balances.f_epoch, t_balances.f_balance
                     Hash Cond: (t_balances.f_index = t_groups.f_index)
                     Worker 0: actual time=94.414..1007.555 rows=413340 loops=1
                     Worker 1: actual time=100.922..1009.828 rows=409437 loops=1
                     Worker 2: actual time=88.315..1005.567 rows=427156 loops=1
                     Worker 3: actual time=89.170..1008.302 rows=413216 loops=1
                     Worker 4: actual time=90.533..1003.417 rows=388519 loops=1
                     Worker 5: actual time=97.039..1005.639 rows=416591 loops=1
                     Worker 6: actual time=90.639..992.811 rows=424138 loops=1
                     ->  Parallel Index Scan using i_balances_2 on public.t_balances  (cost=0.58..536721.46 rows=3545479 width=32) (actual time=0.035..586.603 rows=3219174 loops=8)
                           Output: t_balances.f_index, t_balances.f_epoch, t_balances.f_balance
                           Index Cond: ((t_balances.f_epoch >= 12041) AND (t_balances.f_epoch < 12265))
                           Worker 0: actual time=0.034..591.033 rows=3165012 loops=1
                           Worker 1: actual time=0.041..586.845 rows=3131763 loops=1
                           Worker 2: actual time=0.034..588.722 rows=3286585 loops=1
                           Worker 3: actual time=0.030..594.133 rows=3166346 loops=1
                           Worker 4: actual time=0.028..591.512 rows=2984664 loops=1
                           Worker 5: actual time=0.046..582.785 rows=3191448 loops=1
                           Worker 6: actual time=0.040..578.070 rows=3246854 loops=1
                     ->  Hash  (cost=269.51..269.51 rows=15001 width=8) (actual time=88.455..88.456 rows=15001 loops=8)
                           Output: t_groups.f_index
                           Buckets: 16384  Batches: 1  Memory Usage: 714kB
                           Worker 0: actual time=94.251..94.252 rows=15001 loops=1
                           Worker 1: actual time=100.726..100.727 rows=15001 loops=1
                           Worker 2: actual time=88.158..88.159 rows=15001 loops=1
                           Worker 3: actual time=89.040..89.041 rows=15001 loops=1
                           Worker 4: actual time=90.416..90.417 rows=15001 loops=1
                           Worker 5: actual time=96.852..96.853 rows=15001 loops=1
                           Worker 6: actual time=90.440..90.442 rows=15001 loops=1
                           ->  Seq Scan on public.t_groups  (cost=0.00..269.51 rows=15001 width=8) (actual time=85.608..87.034 rows=15001 loops=8)
                                 Output: t_groups.f_index
                                 Filter: (t_groups.f_group = 1)
                                 Worker 0: actual time=91.469..92.877 rows=15001 loops=1
                                 Worker 1: actual time=97.515..99.137 rows=15001 loops=1
                                 Worker 2: actual time=85.393..86.765 rows=15001 loops=1
                                 Worker 3: actual time=86.219..87.637 rows=15001 loops=1
                                 Worker 4: actual time=87.570..88.993 rows=15001 loops=1
                                 Worker 5: actual time=93.978..95.456 rows=15001 loops=1
                                 Worker 6: actual time=87.405..88.968 rows=15001 loops=1
 Planning Time: 0.437 ms
 JIT:
   Functions: 112
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 11.216 ms, Inlining 258.319 ms, Optimization 282.158 ms, Emission 143.154 ms, Total 694.848 ms
 Execution Time: 3156.752 ms

Creating a materialized view

Creating a materialized view did help; however, refreshing the view takes too long for this to be a useful approach.

Mirroring a subset of the table with triggers

I started down this path and, although it works, it feels like a very manual and error-prone process, recreating something that should be part of the database.
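
For context, the trigger-based mirroring amounts to something like the sketch below (a minimal sketch only: the table, function and trigger names are assumptions, and updates, deletes and group-membership changes would each need their own handling):

-- Hypothetical mirror table holding only the rows for group 1
CREATE TABLE t_balances_group1 (LIKE t_balances INCLUDING ALL);

-- Copy newly inserted rows into the mirror when they belong to the group
CREATE OR REPLACE FUNCTION f_mirror_group1() RETURNS trigger AS $$
BEGIN
  IF EXISTS (SELECT 1 FROM t_groups
             WHERE f_group = 1 AND f_index = NEW.f_index) THEN
    INSERT INTO t_balances_group1 VALUES (NEW.*);
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_mirror_group1
  AFTER INSERT ON t_balances
  FOR EACH ROW EXECUTE FUNCTION f_mirror_group1();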

Best Answer

The simple answer is: give up on this approach. The chances of returning 3.34 million individual rows in PostgreSQL in far less than a second are very small. If you just wanted to do some simple aggregation that could be done in parallel and then return just the aggregated value, that might just barely be possible, but that isn't what you are doing (or at least, not what you showed).
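
For example, an aggregate-only variant (purely illustrative; the choice of aggregate is an assumption) would return one row per index rather than millions of individual rows:

-- Hypothetical: return a single aggregated value per index
SELECT f_index, avg(f_balance) AS avg_balance
FROM t_balances
WHERE f_epoch >= 12041
  AND f_epoch < 12265
  AND f_index IN (1, 2, 3 /* ... */)
GROUP BY f_index;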

Your biggest improvement here is probably going to come from an index covering all three columns, so that you can get an index-only scan:

create unique index on t_balances (f_index, f_epoch) include (f_balance);

If IO is not the bottleneck, that might get you a factor of 2 or 3 or so. Not a factor of 10.
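
One general PostgreSQL caveat (not specific to this schema): index-only scans only avoid heap fetches when the visibility map is up to date, so on a heavily written table it is worth making sure vacuum keeps up, for example:

-- Keep the visibility map current so index-only scans stay cheap
VACUUM (ANALYZE) t_balances;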

You said a materialized view helped, but you didn't show us its definition or the plan it produced, or quantify how much it helped. If we knew that, maybe we could suggest ways to get the same benefit without needing the MV.