If you don't need the "optimal fill", you could use a window function and something like this:
select *
from (
   select id, name, price, category,
          sum(price) over (order by price, date_created) as so_far
   from my_table
   where price > 40
     and category = 'buy'
) t
where so_far <= 12;
If you are mixing "groups" of orders, you might need a partition by clause as well, as sketched below. Change the order by inside the window function to meet your ordering criteria.
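A minimal sketch of the partitioned variant; group_id is a hypothetical grouping column, not from the original question:

select *
from (
   select id, name, price, category, group_id,
          sum(price) over (partition by group_id  -- group_id is hypothetical
                           order by price, date_created) as so_far
   from my_table
   where price > 40
     and category = 'buy'
) t
where so_far <= 12;

With partition by, the running sum starts over for each group, so the so_far filter is applied per group rather than across all rows.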
First, the displayed query plan says:
Limit (cost=0.43..44.98 rows=17 width=553) ...
Obviously you added a LIMIT clause to the query, which is not in the query you show. If you didn't do the same in your second test, you may already have your answer: the same query with LIMIT (and without ORDER BY) can stop as soon as enough rows are found and can be much faster. It can also lead to a different query plan to begin with.
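A sketch of the shape in question; table, filter, and limit are placeholders. Without an ORDER BY, the executor can return the first 17 matching rows it finds and stop scanning:

SELECT *
FROM   my_table
WHERE  category = 'buy'
LIMIT  17;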
In principle, EXPLAIN ANALYZE measures the time spent by the Postgres process. Measurement is taken on the server. It adds a bit of overhead as compared to just executing the SQL command alone. For very cheap commands, that overhead may be more than the query cost. Overhead also depends on the used options to some degree. You can minimize overhead with EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF). (The TIMING option of EXPLAIN is in no way related to psql's \timing; it is used to toggle additional measurements per plan node.)
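For example, to get actual row counts with minimal measurement overhead (the query itself is just a placeholder):

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM my_table WHERE category = 'buy';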
The manual:

The ANALYZE option causes the statement to be actually executed, not only planned. Then actual run time statistics are added to the display, including the total elapsed time expended within each plan node (in milliseconds) and the total number of rows it actually returned.
About the TIMING option:

Include actual startup time and time spent in each node in the output. The overhead of repeatedly reading the system clock can slow down the query significantly on some systems, so it may be useful to set this parameter to FALSE when only actual row counts, and not exact times, are needed. Run time of the entire statement is always measured, even when node-level timing is turned off with this option. This parameter may only be used when ANALYZE is also enabled. It defaults to TRUE.
\timing in psql, on the other hand, measures the time between sending the command and receiving results, including network latency. Measurement is taken on the client. That can be many times the execution time alone. This seems to be your case.

Due to the overhead mentioned above, the opposite might be observed in cases with low latency and minimal data volume sent over the wire (especially with psql executed on the same server): \timing can report faster times.
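For reference, client-side timing in psql is toggled like this (the query is a placeholder):

\timing on
SELECT count(*) FROM my_table;
\timing off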
There is a third good option: use the shell command time (also measuring on the client, obviously). See:
Asides:
When I actually execute it (with \timing) ...

You are aware that EXPLAIN ANALYZE actually executes the given query, with all side effects, as opposed to plain EXPLAIN.
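If side effects are unwanted while profiling a data-modifying statement, one common pattern is to wrap the call in a transaction that is rolled back (the UPDATE is a placeholder):

BEGIN;
EXPLAIN ANALYZE
UPDATE my_table SET price = price * 1.1 WHERE category = 'buy';  -- actually runs
ROLLBACK;  -- discard the changes, keep the plan output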
I tried to programmatically loop over the ids and run a separate query for each and it produces better results (around 1.5s)

A loop can produce query plans customized to each actual input value, based on statistics for the value, while a large IN list or a large input array will produce a plan based on generic estimates. See:
You can achieve similar effects as with your loop with a LATERAL join from a (derived) table, with many SELECT statements tied together with UNION ALL, or with a loop in a (plpgsql) server-side function, without adding many times the network latency and/or planning overhead. Related:
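A minimal sketch of the LATERAL variant, assuming an integer id column; the array of target ids is a placeholder:

SELECT t.*
FROM   unnest('{1,2,3}'::int[]) AS ids(id)   -- hypothetical list of ids
CROSS  JOIN LATERAL (
   SELECT *
   FROM   my_table m
   WHERE  m.id = ids.id
   ) t;

Each row of the derived table drives its own index lookup, similar to one iteration of your loop, but in a single round trip with a single plan.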
Best Answer
ts_rank_cd yields a real. If you cast this to double precision, the result becomes:

That explains what you observe, because the real is cast to double precision if you compare it to a double precision number.

The solution is to explicitly cast 0.1 to real, so that a real comparison is used:
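A sketch of that fix; table, column, and query text are hypothetical placeholders:

SELECT *
FROM   docs                                             -- hypothetical table
WHERE  ts_rank_cd(tsv, to_tsquery('english', 'foo')) > real '0.1';

The literal syntax real '0.1' is equivalent to 0.1::real; either spelling keeps the comparison in real, so the ts_rank_cd result is not widened to double precision first.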