Postgresql – postgres Poor performance on ORDER BY “id” DESC LIMIT 1

performancepostgresql-9.3postgresql-performanceslow-log

I have table items with following schema (in postgres v9.3.5):

  Column   | Type   |                         Modifiers                  | Storage  
-----------+--------+----------------------------------------------------+----------
 id        | bigint | not null default nextval('items_id_seq'::regclass) | plain    
 data      | text   | not null                                           | extended 
 object_id | bigint | not null                                           | plain    
Indexes:
    "items_pkey" PRIMARY KEY, btree (id)
    "items_object_id_idx" btree (object_id)
Has OIDs: no

When I execute query it hangs for a very long time:

SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1;

After VACUUM ANALYZE the query execution improved a lot, but still not perfect.

# EXPLAIN ANALYZE SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1;
                                                                            QUERY PLAN                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..1269.14 rows=1 width=63) (actual time=873796.061..873796.061 rows=0 loops=1)
   ->  Index Scan Backward using items_pkey on items  (cost=0.44..1164670.11 rows=918 width=63) (actual time=873796.059..873796.059 rows=0 loops=1)
         Filter: (object_id = 123::bigint)
         Rows Removed by Filter: 27942522
 Total runtime: 873796.113 ms
(5 rows)

The strange thing is that when I execute

SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1;

it returns 0 rows and I can do it in my code to optimize my web application's performance, but why it can be done by Postgres itself? I came to Postgres from MySQL and never seen such strange things there.

=====

I have found that it uses different query plan, different index, but why?

# EXPLAIN ANALYZE SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1;
                                                                          QUERY PLAN                                    
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.34 rows=1 width=63) (actual time=0.014..0.014 rows=0 loops=1)
   ->  Index Scan using items_object_id_operation_idx on items  (cost=0.56..2579.16 rows=929 width=63) (actual time=0.013..0.013 rows=0 loops=1)
         Index Cond: (object_id = 123::bigint)
 Total runtime: 0.029 ms
(4 rows)

Best Answer

Trying to explain why there is difference in performance between the two queries.

This one: SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1 is satisfied by any one row with the matching object_id, so the index on object_id is a natural choice. The query requires minimal I/O: index scan to find the first matching value plus one heap read to fetch the entire row.

The alternative: SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1 requires all rows with the matching object_id be sorted by another column, id, then the row with the maximum value of id be returned. If you were to use the index on object_id you would need to perform the following operations: scan the index to find every matching object_id; for every match go fetch the actual row; then sort all fetched rows by id and return the one with the largest id.

The alternative chosen by the optimizer, presumably based on the object_id histogram, is: scan the index on id backwards, in its entirety; for each value go fetch the row and check if the value of object_id matches; return the first matching row, which will have the maximum possible id value. This alternative avoids sorting the rows, so I guess the optimizer prefers it to using the index on object_id.

The presence of an index on (object_id asc, id desc) allows for yet another alternative: scan this new index for the first entry matching the supplied object_id value, which by definition will have the highest id value; go fetch one matching row and return. Obviously, this is the most efficient approach.