I have a table with columns: id, antenna_id, latitude, longitude. There are two composite indexes on (antenna_id, latitude) and (antenna_id, longitude). When I do a max(latitude) for a specific antenna id(s), the speed is acceptable, but doing a min and max for both latitude and longitude at the same time is very slow.
Using PostgreSQL 12.3
Query
EXPLAIN (analyze, buffers, format text)
SELECT max(latitude)
FROM packets
WHERE antenna_id IN (1,2)
Finalize Aggregate (cost=443017.21..443017.22 rows=1 width=32) (actual time=4373.679..4373.679 rows=1 loops=1)
Buffers: shared hit=10812 read=16887
-> Gather (cost=443017.10..443017.21 rows=1 width=32) (actual time=4373.412..4389.032 rows=2 loops=1)
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=10812 read=16887
-> Partial Aggregate (cost=442017.10..442017.11 rows=1 width=32) (actual time=4313.576..4313.577 rows=1 loops=2)
Buffers: shared hit=10809 read=16887
-> Parallel Index Only Scan using idx_packets_antenna_id_latitude on packets (cost=0.57..433527.51 rows=3395835 width=7) (actual time=0.375..3435.488 rows=2201866 loops=2)
Index Cond: (antenna_id = ANY ('{1,2}'::integer[]))
Heap Fetches: 0
Buffers: shared hit=10809 read=16887
Planning Time: 5.992 ms
JIT:
Functions: 8
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 6.236 ms, Inlining 0.000 ms, Optimization 1.549 ms, Emission 32.058 ms, Total 39.842 ms
Execution Time: 4706.406 ms
The explain looks almost identical for max(longitude), min(latitude) and min(longitude) on their own. Speed is acceptable.
But when I combine the queries
SELECT max(latitude), max(longitude), min(latitude), min(longitude)
FROM packets
WHERE antenna_id IN (1,2)
Duration
[2021-03-06 09:28:30] 1 row retrieved starting from 1 in 5 m 35 s 907 ms (execution: 5 m 35 s 869 ms, fetching: 38 ms)
Explain
Finalize Aggregate (cost=3677020.18..3677020.19 rows=1 width=128)
-> Gather (cost=3677020.06..3677020.17 rows=1 width=128)
Workers Planned: 1
-> Partial Aggregate (cost=3676020.06..3676020.07 rows=1 width=128)
-> Parallel Seq Scan on packets (cost=0.00..3642080.76 rows=3393930 width=14)
Filter: (antenna_id = ANY ('{1,2}'::integer[]))
JIT:
Functions: 7
Options: Inlining true, Optimization true, Expressions true, Deforming true
EXPLAIN (analyze, buffers, format text)
SELECT max(latitude), max(longitude), min(latitude), min(longitude)
FROM packets
WHERE antenna_id IN (1,2)
has been running for 24h now, and isn't done yet
Indexes
create index idx_packets_antenna_id_time
on packets (antenna_id, time);
create index idx_packets_antenna_id_longitude
on packets (antenna_id, longitude);
create index idx_packets_device_id_time
on packets (device_id, time);
create index idx_packets_antenna_id_latitude
on packets (antenna_id, latitude);
Data stats
select count(*) from packets
136758098
select count(distinct (antenna_id)) from packets
17558
select antenna_id, count(*) as records
from packets
where antenna_id in (1,2)
group by antenna_id
order by records desc
1,4361049
2,42683
Question
Why does the second query which does the min and max on the latitude and longitude field not use the indexes? And how can I rewrite the query so that it is faster?
Best Answer
Let's create some test data. Looks like your query has about 1% of rows per antenna_id so let's replicate this.
This is really slow. Let's try for one antenna_id.
That's the correct plan, that uses the multicolumn index to compute the max and min. This requires only 1 index lookup per min() or max(), because
is equivalent to
...which is optimizable using an index which contains the rows in pre-sorted order.
The above optimization of max() and min() is basically syntactic sugar, it turns the query into an ORDER BY+LIMIT and puts that into an InitPlan in order to use the index.
But, apparently, it doesn't do this when querying several antenna_ids using "WHERE IN()". Adding a "GROUP BY aid" at the end of the first query doesn't help.
So... Let's query the antenna_ids one at a time then.
It makes a nested loop over the VALUES, and inside the nested loop is the above fast query. It returns the max() and min() for each antenna_id, so to get the global max() and min() you have to wrap that in a subquery and apply max() and min() over the result.
This shouldn't take more than a millisecond unless there is another problem.
Replacing the VALUES above with a generate_series(1,100) that gets the max for the 100 aid's in the table takes about 5ms. Doing it the old fashioned way:
takes about 100x longer.