slava=# explain (analyze) SELECT * from pending_posts WHERE user_id <> 1456
slava-# ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on pending_posts (cost=0.00..17906.00 rows=999994 width=14) (actual time=0.021..231.803 rows=999994 loops=1)
Filter: (user_id <> 1456)
Rows Removed by Filter: 6
Planning time: 0.108 ms
Execution time: 289.845 ms
(5 rows)
slava=# explain (analyze) select * from pending_posts where user_id <> 1456 order_by user_id;
ERROR: syntax error at or near "order_by"
LINE 1: ...select * from pending_posts where user_id <> 1456 order_by u...
^
slava=# explain (analyze) select * from pending_posts where user_id <> 1456 order by user_id;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Index Scan using p_i on pending_posts (cost=0.42..50103.17 rows=999994 width=14) (actual time=0.033..899.520 rows=999994 loops=1)
Filter: (user_id <> 1456)
Rows Removed by Filter: 6
Planning time: 0.141 ms
Execution time: 953.982 ms
(5 rows)
Given that user_id
is integer indexed with btree, should't it be faster to sort and then select using tree search?
Best Answer
PostgreSQL will evaluate all (or a high number) of possible query plans, and make a good guess at which one will be the least costly. If the different cost parameters (that depend, for instance, on the speed of your disks; the amount of available memory, etc.) are well set, most of the times that guess will be accurate and the choice of plans will be optimum.
You're asking PostgreSQL two different queries. One doesn't need sorting, the other one does. The rows returned are the same, because the rest of the query is the same. The query with the
ORDER BY
takes longer. That is to be expected.PostgreSQL could have decided that performing a
Seq Scan
then aSort
would be quicker than performing anIndex Scan
. With the actual settings, it decided the opposite.Performing an
Index Scan
takes longer than performing aSeq Scan
just because, when scanning the index, PostgreSQL can just skip the entries with(user_id <> 1456)
(the info foruser_id
is in the index). For the non-skipped values, it gets the (internal) reference to the corresponding row (which is also part of the index), and it has to retrieve the actual row in the table. It won't need to sort them, because by following the index, the values are already in the proper order.With standard settings, the costs for for
seq_page_cost
andrandom_page_cost
(in arbitrary units) are 4 and 1. That takes into account that, with HDD, jumping from one page to another needs a significant mechanical movement, and needs more time than reading pages one after next. You can assume aSeq Scan
will read pages sequentially, and anIndex Scan
will mostly jump from one to another. [If you use SSD instead of HDD, the value forrandom_page_cost
should probably be put to just 1.05].If you would be using a covering index (one index that contains at least all the columns that are in your
SELECT
), then, PostgreSQL would be able to perform anIndex Only Scan
(meaning it doesn't have to retrieve the whole rows from the actual *table). That would speed up your query significantly. For a covering index to work, your table has to be properly maintained (i.e.: autovacuum must have had time to maintain it).Try this query and see what the execution plan is:
If you don't get an
Index Only Scan
, perform a:and retry it.
Note that indexes can help you in several areas:
SELECT
clause if they are covering indexes. If the indexes contain all the columns specified,Index Only Scans
are possible.WHERE
clause if they allow to filter out (or go directly to) specific) rows.ORDER BY
clause, if they provide already the rows in the sort order you specify.GROUP BY
, if the aggregation is done by sorting the rows according to the groups (other techniques are possible).References: