I want to fetch up to 100 rows based on their id. The id is the primary key of the table. The query that I had written looks like this:

select * from table where id = any ($1);

where $1 is interpolated as an array of ids.
When using EXPLAIN ANALYZE I get the following plan:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..44.98 rows=17 width=553) (actual time=100.048..834.209 rows=17 loops=1)
-> Index Scan using instagram_id_index_1000 on profiles_1000 (cost=0.43..44.98 rows=17 width=553) (actual time=100.046..834.163 rows=17 loops=1)
Index Cond: (id = ANY ('{34491540,28977916,33241270,33609141,31043380,29364420,30247037,33311491,36267571,32886281,32366574,32569254,33038689,31089076,29416100,30455309,31570597}'::integer[]))
Planning time: 424.512 ms
Execution time: 834.280 ms
(5 rows)
When I actually execute it (with \timing) I get results in the 2-5 seconds range! I really cannot wrap my mind around such abysmal performance. The execution time provided by EXPLAIN ANALYZE was already high in the first place.
Some context:

1) the database is local, so there is no network latency
2) the table I'm querying is a materialized view
3) I also tried the where id in (...) variation and nothing changed
4) I tried programmatically looping over the ids and running a separate query for each, which produces better results (around 1.5 s)

Is there something that can be done here? I cannot believe that Postgres' performance in my case is this bad. The server also has 8 cores; is it possible to parallelize this query as much as possible?
Best Answer
First, the displayed query plan says:

Limit (cost=0.43..44.98 rows=17 width=553) (actual time=100.048..834.209 rows=17 loops=1)

Obviously you added a LIMIT clause to the query, which is not in the query you show. If you didn't do the same in your second test, you may already have your answer: the same query with LIMIT (and without ORDER BY) can stop as soon as enough rows are found and can be much faster. It can also lead to a different query plan to begin with.
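For illustration, the query actually measured presumably looked something like this (a sketch; the LIMIT value is inferred from your stated goal of fetching up to 100 rows, and the id list is shortened):

SELECT * FROM profiles_1000
WHERE  id = ANY ('{34491540,28977916,33241270}'::integer[])
LIMIT  100;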
In principle, EXPLAIN ANALYZE measures the time spent by the Postgres process. Measurement is taken on the server. It adds a bit of overhead as compared to just executing the SQL command alone. For very cheap commands, that overhead may be more than the query cost itself. The overhead also depends on the options used, to some degree. You can minimize it with EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF). (The TIMING option of EXPLAIN is in no way related to psql's \timing; it is used to toggle additional measurements per plan node.) The manual:

"In order to measure the run-time cost of each node in the execution plan, the current implementation of EXPLAIN ANALYZE adds profiling overhead to query execution. As a result, running EXPLAIN ANALYZE on a query can sometimes take significantly longer than executing the query normally."

About the TIMING option:

"Include actual startup time and time spent in each node in the output. The overhead of repeatedly reading the system clock can slow down the query significantly on some systems, so it may be useful to set this parameter to FALSE when only actual row counts, and not exact times, are needed."
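Applied to your query, the low-overhead variant would be, for instance:

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM profiles_1000
WHERE  id = ANY ('{34491540,28977916,33241270}'::integer[]);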
\timing in psql, on the other hand, measures the time between sending the command and receiving the results - including network latency. Measurement is taken on the client. That can be many times the execution time alone. Seems to be your case.

Due to the overhead mentioned above, the opposite might be observed in cases with low latency and minimal data volume sent over the wire (especially with psql executed on the same server): \timing can report faster times.
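In a psql session that looks like, for instance:

\timing on
SELECT * FROM profiles_1000 WHERE id = ANY ('{34491540,28977916}'::integer[]);
-- psql then appends a client-side measurement like: Time: ... ms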
There is a third good option: use the shell command time (also measuring on the client, obviously).
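A minimal sketch; the database name db and the file query.sql holding the query are placeholders:

time psql db -f query.sql -o /dev/null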
Asides:

You are aware that EXPLAIN ANALYZE does actually execute the given query, with all side effects - as opposed to just EXPLAIN.
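If those side effects matter, one standard way around them (a sketch, with a hypothetical table some_table; this trick is not from the original answer) is to roll the transaction back:

BEGIN;
EXPLAIN ANALYZE
DELETE FROM some_table WHERE id = 1;  -- the DELETE really runs here
ROLLBACK;  -- discard its effects, keep the measurement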
A loop can produce query plans customized to each actual input value, based on statistics for that value, while a large IN list or a large input array produces a plan based on generic estimates.
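To illustrate the difference (ids shortened):

-- one generic plan covering the whole array:
SELECT * FROM profiles_1000 WHERE id = ANY ('{34491540,28977916}'::integer[]);

-- vs. a separate statement per id, each planned for that literal value:
SELECT * FROM profiles_1000 WHERE id = 34491540;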
You can achieve a similar effect as with your loop with a LATERAL join from a (derived) table, or with many SELECT statements tied together with UNION ALL, or with a loop in a (plpgsql) server-side function - without adding many times the network latency and/or planning overhead.
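A sketch of the LATERAL variant, reusing the names from your query (treat it as one of several shapes to try, not a guaranteed win):

SELECT p.*
FROM   unnest($1::integer[]) AS t(id)
CROSS  JOIN LATERAL (
   SELECT *
   FROM   profiles_1000
   WHERE  id = t.id
   ) p;

Or many SELECT statements tied together:

SELECT * FROM profiles_1000 WHERE id = 34491540
UNION ALL
SELECT * FROM profiles_1000 WHERE id = 28977916;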