I have a relatively simple table:
CREATE TABLE t_balances (
f_index BIGINT NOT NULL
,f_epoch BIGINT NOT NULL
,f_balance BIGINT NOT NULL
);
CREATE UNIQUE INDEX i_balances_1 ON t_balances(f_index, f_epoch);
This has over 100,000 individual f_index values, with each having tens of thousands of entries on f_epoch. (f_index, f_epoch) is unique (as per the index).
I am attempting to obtain balances for a subset of the indices over a given epoch range, for example "indices 1, 2 and 3 for epochs 100 to 200". However, I am struggling to find a suitable index that provides acceptable performance when requesting data for a significant subset of the indices. For example, if I attempt to fetch over a small epoch range for 15,000 indices (out of a total of ~110,000) with the query:
EXPLAIN ANALYZE VERBOSE SELECT *
FROM t_balances
WHERE f_epoch >= 12041
AND f_epoch < 12265
AND f_index IN(1,2,3,...15000)
ORDER BY f_index, f_epoch;
...
Index Scan using i_balances_1 on public.t_balances (cost=0.58..3317610.88 rows=3414744 width=32) (actual time=1.287..2932.709 rows=3342304 loops=1)
Output: f_index, f_epoch, f_balance
Index Cond: ((t_balances.f_index = ANY ('{0,1,2,3...15000}'::bigint[])) AND (t_balances.f_epoch >= 12041) AND (t_balances.f_epoch < 12265))
Planning Time: 20.451 ms
JIT:
Functions: 2
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.210 ms, Inlining 0.000 ms, Optimization 0.000 ms, Emission 0.000 ms, Total 0.210 ms
Execution Time: 3020.610 ms
I need to speed this up by at least one order of magnitude, but am struggling to find something that manages this whilst remaining usable in day-to-day operations. I have full control of the schema, and can change the format of the table if required. This would ideally work with PostgreSQL 12, although if 13 is required then so be it.
The server on which the query is running is pretty fast (Ryzen 9 3900, 128GB memory, NVMe SSDs), so I don't think I'm in a situation where the query is already as fast as it can be and constrained only by the hardware.
For reference, I include some of my attempts to date:
Indexing
I tried various indices, to no effect. It may be that there is some other index type that would work better here, but from my reading of the index literature it doesn't appear so.
Adding an index on t_balances(f_epoch) is ignored by the above query.
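For reference, that attempt was a plain single-column index (the name here is illustrative); the planner still chose i_balances_1 for the query above:

```sql
-- Single-column index on the epoch range predicate alone.
-- Not selective enough on its own, so the planner ignores it
-- in favour of i_balances_1 for the query above.
CREATE INDEX i_balances_epoch ON t_balances (f_epoch);
```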
Creating a view
I tried creating a view. The list of indices required for queries is relatively static, but mutable, so I could not hard-code the values into the view definition. I ended up with a simple lookup table:
CREATE TABLE t_groups (
f_group BIGINT NOT NULL
,f_index BIGINT NOT NULL
);
and created the view with
CREATE VIEW v_balances_group1
AS
SELECT *
FROM t_balances
WHERE f_index IN (SELECT f_index FROM t_groups WHERE f_group = 1) ORDER BY f_index;
This did not result in any speedup:
EXPLAIN ANALYZE VERBOSE SELECT *
FROM v_balances_group1
WHERE f_epoch >= 12041
AND f_epoch < 12265
ORDER BY f_index, f_epoch;
...
Sort (cost=1500782.79..1509107.56 rows=3329907 width=32) (actual time=2778.815..3055.376 rows=3360224 loops=1)
Output: t_balances.f_index, t_balances.f_epoch, t_balances.f_balance
Sort Key: t_balances.f_index, t_balances.f_epoch
Sort Method: external merge Disk: 138128kB
-> Gather Merge (cost=604302.14..1013405.00 rows=3329907 width=32) (actual time=1196.396..1601.870 rows=3360224 loops=1)
Output: t_balances.f_index, t_balances.f_epoch, t_balances.f_balance
Workers Planned: 7
Workers Launched: 7
-> Sort (cost=603302.02..604491.28 rows=475701 width=32) (actual time=1173.668..1253.437 rows=420028 loops=8)
Output: t_balances.f_index, t_balances.f_epoch, t_balances.f_balance
Sort Key: t_balances.f_index
Sort Method: external merge Disk: 19264kB
Worker 0: Sort Method: external merge Disk: 17032kB
Worker 1: Sort Method: external merge Disk: 16872kB
Worker 2: Sort Method: external merge Disk: 17600kB
Worker 3: Sort Method: external merge Disk: 17024kB
Worker 4: Sort Method: external merge Disk: 16008kB
Worker 5: Sort Method: external merge Disk: 17160kB
Worker 6: Sort Method: external merge Disk: 17472kB
Worker 0: actual time=1169.451..1251.784 rows=413340 loops=1
Worker 1: actual time=1175.999..1259.257 rows=409437 loops=1
Worker 2: actual time=1173.178..1263.187 rows=427156 loops=1
Worker 3: actual time=1177.141..1259.815 rows=413216 loops=1
Worker 4: actual time=1170.905..1250.253 rows=388519 loops=1
Worker 5: actual time=1169.893..1253.631 rows=416591 loops=1
Worker 6: actual time=1156.789..1245.918 rows=424138 loops=1
-> Hash Semi Join (cost=457.60..551777.54 rows=475701 width=32) (actual time=88.605..1004.630 rows=420028 loops=8)
Output: t_balances.f_index, t_balances.f_epoch, t_balances.f_balance
Hash Cond: (t_balances.f_index = t_groups.f_index)
Worker 0: actual time=94.414..1007.555 rows=413340 loops=1
Worker 1: actual time=100.922..1009.828 rows=409437 loops=1
Worker 2: actual time=88.315..1005.567 rows=427156 loops=1
Worker 3: actual time=89.170..1008.302 rows=413216 loops=1
Worker 4: actual time=90.533..1003.417 rows=388519 loops=1
Worker 5: actual time=97.039..1005.639 rows=416591 loops=1
Worker 6: actual time=90.639..992.811 rows=424138 loops=1
-> Parallel Index Scan using i_balances_2 on public.t_balances (cost=0.58..536721.46 rows=3545479 width=32) (actual time=0.035..586.603 rows=3219174 loops=8)
Output: t_balances.f_index, t_balances.f_epoch, t_balances.f_balance
Index Cond: ((t_balances.f_epoch >= 12041) AND (t_balances.f_epoch < 12265))
Worker 0: actual time=0.034..591.033 rows=3165012 loops=1
Worker 1: actual time=0.041..586.845 rows=3131763 loops=1
Worker 2: actual time=0.034..588.722 rows=3286585 loops=1
Worker 3: actual time=0.030..594.133 rows=3166346 loops=1
Worker 4: actual time=0.028..591.512 rows=2984664 loops=1
Worker 5: actual time=0.046..582.785 rows=3191448 loops=1
Worker 6: actual time=0.040..578.070 rows=3246854 loops=1
-> Hash (cost=269.51..269.51 rows=15001 width=8) (actual time=88.455..88.456 rows=15001 loops=8)
Output: t_groups.f_index
Buckets: 16384 Batches: 1 Memory Usage: 714kB
Worker 0: actual time=94.251..94.252 rows=15001 loops=1
Worker 1: actual time=100.726..100.727 rows=15001 loops=1
Worker 2: actual time=88.158..88.159 rows=15001 loops=1
Worker 3: actual time=89.040..89.041 rows=15001 loops=1
Worker 4: actual time=90.416..90.417 rows=15001 loops=1
Worker 5: actual time=96.852..96.853 rows=15001 loops=1
Worker 6: actual time=90.440..90.442 rows=15001 loops=1
-> Seq Scan on public.t_groups (cost=0.00..269.51 rows=15001 width=8) (actual time=85.608..87.034 rows=15001 loops=8)
Output: t_groups.f_index
Filter: (t_groups.f_group = 1)
Worker 0: actual time=91.469..92.877 rows=15001 loops=1
Worker 1: actual time=97.515..99.137 rows=15001 loops=1
Worker 2: actual time=85.393..86.765 rows=15001 loops=1
Worker 3: actual time=86.219..87.637 rows=15001 loops=1
Worker 4: actual time=87.570..88.993 rows=15001 loops=1
Worker 5: actual time=93.978..95.456 rows=15001 loops=1
Worker 6: actual time=87.405..88.968 rows=15001 loops=1
Planning Time: 0.437 ms
JIT:
Functions: 112
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 11.216 ms, Inlining 258.319 ms, Optimization 282.158 ms, Emission 143.154 ms, Total 694.848 ms
Execution Time: 3156.752 ms
Creating a materialized view
Creating a materialized view did help, however refreshing the view takes too long for this to be a useful approach.
Mirroring a subset of the table with triggers
I started down this path, and although it works it feels like a very manual and error-prone process, recreating something that should be part of the database.
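For illustration, the trigger-based mirroring looked roughly like the following sketch (table and function names are hypothetical; DELETE and t_groups membership changes would need their own handling, which is part of why it feels error-prone):

```sql
-- Side table holding only the rows for the indices of interest.
CREATE TABLE t_balances_group1 (LIKE t_balances);
CREATE UNIQUE INDEX i_balances_group1 ON t_balances_group1 (f_index, f_epoch);

-- Trigger function: copy a row into the side table whenever its
-- f_index belongs to group 1.
CREATE FUNCTION f_mirror_balances() RETURNS trigger AS $$
BEGIN
  IF EXISTS (SELECT 1 FROM t_groups
             WHERE f_group = 1 AND f_index = NEW.f_index) THEN
    INSERT INTO t_balances_group1 VALUES (NEW.*)
    ON CONFLICT (f_index, f_epoch)
    DO UPDATE SET f_balance = EXCLUDED.f_balance;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_mirror_balances
AFTER INSERT OR UPDATE ON t_balances
FOR EACH ROW EXECUTE FUNCTION f_mirror_balances();
```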
Best Answer
The simple answer is: give up on this approach. The chances of returning 3.34 million individual rows in PostgreSQL in far less than a second are very small. If you just wanted to do some simple aggregation which could be done in parallel and then return just the aggregated value, that might just barely be possible, but that isn't what you are doing (or at least, not what you showed).
Your biggest improvement here is probably going to be making an index covering all 3 columns so that you can get an index-only scan.
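Such a covering index might look like this (the INCLUDE clause exists since PostgreSQL 11, so it works on 12; the index name is illustrative):

```sql
-- Key columns match the WHERE and ORDER BY clauses; f_balance is
-- carried in the leaf pages so the heap need not be visited.
CREATE INDEX i_balances_covering
    ON t_balances (f_index, f_epoch) INCLUDE (f_balance);

-- Index-only scans can only skip heap fetches when the visibility
-- map is up to date, so vacuum after bulk loads.
VACUUM (ANALYZE) t_balances;
```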
If IO is not the bottleneck, that might get you a factor of 2 or 3 or so. Not a factor of 10.
You said a materialized view helped, but you didn't show us its definition, or the plan it produced, or quantify how much it helped. If we knew that, maybe we could suggest ways to get the same benefit without needing the MV.