I work with a table that contains about 1.5 billion records.
DB: Amazon RDS, PostgreSQL 12.4, 16 GB RAM, 4 vCPUs
Schema:
CREATE TABLE public.trip (
    id bigint NOT NULL,
    cell_to character varying NOT NULL,
    cell_from character varying NOT NULL,
    indicator character varying NOT NULL,
    time_id integer,
    weight double precision
);
CREATE INDEX ix_trip_cell_from ON public.trip USING btree (cell_from);
CREATE INDEX ix_trip_cell_to ON public.trip USING btree (cell_to);
CREATE INDEX ix_trip_indicator ON public.trip USING btree (indicator);
CREATE INDEX ix_trip_time_id ON public.trip USING btree (time_id);
I'm trying to pull all trips that happen within some cells (the output is around 7-12 million records):
EXPLAIN ANALYZE
SELECT
    cell_to,
    cell_from,
    time_id,
    weight AS trips
FROM trip
WHERE
    cell_to IN (VALUES ... 1k values)
    AND cell_from IN (VALUES ... 1k values (the same as above))
    AND time_id IN (VALUES ... 3 to 20 values)
    AND indicator = 'some string';
You can find the EXPLAIN ANALYZE result here: https://explain.depesz.com/s/RxH4.
What I've tried:
- Replaced the IN (VALUES ...) lists with INNER JOINs -> some improvement (a sketch of the rewrite follows this list)
- Changed the B-tree indexes to BRIN -> slightly better timing
- VACUUM, REINDEX, tuning work_mem -> no effect
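For reference, the JOIN rewrite looked roughly like this (a sketch; the value lists are elided exactly as in the query above, and the alias names are mine):

-- Each IN list becomes a joined VALUES list; the planner can then
-- treat the lists as small relations instead of large IN conditions.
SELECT t.cell_to, t.cell_from, t.time_id, t.weight AS trips
FROM trip AS t
INNER JOIN (VALUES ... 1k values) AS ct (cell) ON t.cell_to = ct.cell
INNER JOIN (VALUES ... 1k values) AS cf (cell) ON t.cell_from = cf.cell
INNER JOIN (VALUES ... 3 to 20 values) AS ti (id) ON t.time_id = ti.id
WHERE t.indicator = 'some string';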
The query still runs too long.
UPDATE:
Thanks to @NikitaSerbskiy and @Laurenz Albe, forcing PostgreSQL to use the indexes and adding a multicolumn index helped a lot.
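The post doesn't show the final index, but a plausible sketch covering the query's predicate columns (the name and column order here are assumptions, leading with the equality-filtered column) is:

-- Hypothetical multicolumn index: the equality column first, then the
-- IN-list columns; the exact columns/order actually used are not shown.
CREATE INDEX ix_trip_indicator_time_from_to
    ON public.trip (indicator, time_id, cell_from, cell_to);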
Best Answer
You might get bitmap index scans and better performance if you increase work_mem to something like 200MB or more, so that a bitmap for your table fits into it. Other than that, the only remedy I can see is using more parallel workers by raising max_parallel_workers_per_gather. But all these optimizations are questionable if you plan to run more than a single concurrent query on this tiny machine.
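As a minimal sketch, both settings can be tried at the session level before re-running the query; the 200MB value comes from the answer, while the worker count of 4 is an assumption matching the instance's 4 vCPUs:

-- Session-level experiment: 200MB per the answer above; 4 workers is
-- an assumption based on the instance's 4 vCPUs.
SET work_mem = '200MB';
SET max_parallel_workers_per_gather = 4;
-- then re-run the EXPLAIN ANALYZE query above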
UPDATE:
Experiments with enable_seqscan = off suggest that PostgreSQL overestimates the cost of an index scan. So if you lower random_page_cost to something closer to 1, PostgreSQL should choose the better plan automatically.
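A minimal sketch of that experiment; 1.1 is an assumed value for "closer to 1" (typical for SSD-backed storage), and note that on RDS a persistent change would go through the parameter group rather than a session setting:

-- Diagnostic only: disable sequential scans for this session to see
-- whether the planner then picks the faster index plan.
SET enable_seqscan = off;
-- re-run the EXPLAIN ANALYZE query above, then restore the default:
RESET enable_seqscan;

-- If the index plan wins, lower the estimated cost of random I/O;
-- 1.1 is an assumed value suitable for SSD storage.
SET random_page_cost = 1.1;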