PostgreSQL poor performing order_by ASC

optimizationperformancepostgresqlquery-performance

I have a situation where I prepare a list of buffers to process by a task queue. Currently my buffer table contains over 350.000 rows.

Every estimated 30 seconds the following query is ran to check what buffers to run:

SELECT "buffer_incomingbuffer"."id"
FROM "buffer_incomingbuffer"

WHERE (
  (
    "buffer_incomingbuffer"."run_date" <= '2019-02-28 08:20:19.243339'
    OR
    "buffer_incomingbuffer"."run_date" IS NULL
  )
  AND "buffer_incomingbuffer"."attempts" < 10
  AND "buffer_incomingbuffer"."completed_date" IS NULL
  AND "buffer_incomingbuffer"."locked_at" IS NULL
  AND "shop_id" = 'any-uuid-here'
)

ORDER BY "buffer_incomingbuffer"."created_date" ASC
LIMIT 10

With the following index:

buffer_incomingbuffer (run_date, attempts, completed_date, locked_at, shop_id, created_date)

This query takes over 500ms, whereas changing the ORDER BY into a DESC will take around 0.5ms.

Following the EXPLAIN ANALYZE execution of both queryies:

ORDER BY created_date ASC

Limit  (cost=0.42..794.50 rows=10 width=24) (actual time=541.306..541.553 rows=10 loops=1)
  ->  Index Scan using buffer_incomingbuffer_created_date_index on buffer_incomingbuffer  (cost=0.42..146110.50 rows=1840 width=24) (actual time=541.289..541.400 rows=10 loops=1)
        Filter: ((completed_date IS NULL) AND (locked_at IS NULL) AND ((run_date <= '2019-02-28 08:20:19.243339+00'::timestamp with time zone) OR (run_date IS NULL)) AND (attempts < 10) AND (shop_id = '43ca2e21-e1f5-4151-8fa3-c429c638d03d'::uuid))
        Rows Removed by Filter: 244674
Planning time: 0.168 ms
Execution time: 541.688 ms

ORDER BY created_date DESC

Limit  (cost=0.42..794.50 rows=10 width=24) (actual time=0.085..0.391 rows=10 loops=1)
  ->  Index Scan Backward using buffer_incomingbuffer_created_date_index on buffer_incomingbuffer  (cost=0.42..146110.50 rows=1840 width=24) (actual time=0.058..0.192 rows=10 loops=1)
        Filter: ((completed_date IS NULL) AND (locked_at IS NULL) AND ((run_date <= '2019-02-28 08:20:19.243339+00'::timestamp with time zone) OR (run_date IS NULL)) AND (attempts < 10) AND (shop_id = '43ca2e21-e1f5-4151-8fa3-c429c638d03d'::uuid))
Planning time: 0.206 ms
Execution time: 0.545 ms

To me it seems the biggest (for me unexplainable) difference between these two query plans is the "filter" part.

Reason of the order by is because we'd like to get the things that "came in" first, also "go out" first. The current execution time is a hiccup for the application and I am not sure how to tackle this issue. Like to hear if I am missing something!

Best Answer

The problem is that your buffers which have low created_date are selectively depleted in things which match your WHERE clause. So it needs to walk more and more of the index before it accumulates 10 rows which match the WHERE. Presumably the reason it is selectively depleted is that once you get the results of your query, you do something to the rows which cause them to no longer match the WHERE clause next time the query gets run. I'm guessing based on the column names that that thing is setting "completed_date" to something which no longer NULL? If so, you might want to create in index on (completed_date, created_date). Or maybe (shop_id,completed_date, created_date).

The reason that changing the sort order improves the query is that it starts using part of the index not selectively depleted. If you were to adopt that sort order in production, you would find that its performance quickly degrades as that part of the index would become depleted.

You showed us the definition of one index, but clearly that is not your only index, and it not clear what you believe the significance of that index is.