PostgreSQL Performance – How to Optimize Window Queries

Tags: index-tuning, performance, postgresql, query-performance, rank, window-functions

I have the following table with approximately 175k records:

    Column     |            Type             |              Modifiers
----------------+-----------------------------+-------------------------------------
 id             | uuid                        | not null default uuid_generate_v4()
 competition_id | uuid                        | not null
 user_id        | uuid                        | not null
 first_name     | character varying(255)      | not null
 last_name      | character varying(255)      | not null
 image          | character varying(255)      |
 country        | character varying(255)      |
 slug           | character varying(255)      | not null
 total_votes    | integer                     | not null default 0
 created_at     | timestamp without time zone |
 updated_at     | timestamp without time zone |
 featured_until | timestamp without time zone |
 image_src      | character varying(255)      |
 hidden         | boolean                     | not null default false
 photos_count   | integer                     | not null default 0
 photo_id       | uuid                        |
Indexes:
    "entries_pkey" PRIMARY KEY, btree (id)
    "index_entries_on_competition_id" btree (competition_id)
    "index_entries_on_featured_until" btree (featured_until)
    "index_entries_on_hidden" btree (hidden)
    "index_entries_on_photo_id" btree (photo_id)
    "index_entries_on_slug" btree (slug)
    "index_entries_on_total_votes" btree (total_votes)
    "index_entries_on_user_id" btree (user_id)

and I'm executing the following query to get the rank of the entry and the slug of the next and previous entry:

WITH entry_with_global_rank AS ( 
  SELECT id
       , rank() OVER w AS global_rank
       , LAG(slug) OVER w AS previous_slug
       , LEAD(slug) OVER w AS next_slug
  FROM entries 
  WHERE competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b' 
  WINDOW w AS (PARTITION BY competition_id ORDER BY total_votes DESC) 
) 
SELECT * 
FROM entry_with_global_rank 
WHERE id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0' 
LIMIT 1;

Here are the results from EXPLAIN:

                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=516228.88..516233.37 rows=1 width=88)
   CTE entry_with_global_rank
     ->  WindowAgg  (cost=510596.59..516228.88 rows=250324 width=52)
           ->  Sort  (cost=510596.59..511222.40 rows=250324 width=52)
                 Sort Key: entries.total_votes
                 ->  Seq Scan on entries  (cost=0.00..488150.74 rows=250324 width=52)
                       Filter: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
   ->  CTE Scan on entry_with_global_rank  (cost=0.00..5632.29 rows=1252 width=88)
         Filter: (id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
(9 rows)

This query takes ~1400 ms. Is there any way to speed it up?

Edit:

Here are the results from EXPLAIN ANALYZE:

                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=516228.88..516233.37 rows=1 width=88) (actual time=1232.824..1232.824 rows=1 loops=1)
   CTE entry_with_global_rank
     ->  WindowAgg  (cost=510596.59..516228.88 rows=250324 width=52) (actual time=1202.101..1226.846 rows=8727 loops=1)
           ->  Sort  (cost=510596.59..511222.40 rows=250324 width=52) (actual time=1202.069..1213.992 rows=8728 loops=1)
                 Sort Key: entries.total_votes
                 Sort Method: quicksort  Memory: 8128kB
                 ->  Seq Scan on entries  (cost=0.00..488150.74 rows=250324 width=52) (actual time=89.970..1174.083 rows=50335 loops=1)
                       Filter: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
                       Rows Removed by Filter: 125477
   ->  CTE Scan on entry_with_global_rank  (cost=0.00..5632.29 rows=1252 width=88) (actual time=1232.822..1232.822 rows=1 loops=1)
         Filter: (id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
         Rows Removed by Filter: 8726
 Total runtime: 1234.424 ms
(13 rows)

Edit 2:

I ran VACUUM ANALYZE on the database and the query time has improved, although I'm sure there must still be some way to improve the performance:

                                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=475372.26..475376.76 rows=1 width=88) (actual time=138.388..138.388 rows=1 loops=1)
   CTE entry_with_global_rank
     ->  WindowAgg  (cost=470662.23..475372.26 rows=209335 width=35) (actual time=125.489..132.214 rows=4178 loops=1)
           ->  Sort  (cost=470662.23..471185.56 rows=209335 width=35) (actual time=125.462..126.724 rows=4179 loops=1)
                 Sort Key: entries.total_votes
                 Sort Method: quicksort  Memory: 5510kB
                 ->  Bitmap Heap Scan on entries  (cost=71390.90..452161.77 rows=209335 width=35) (actual time=29.381..87.130 rows=50390 loops=1)
                       Recheck Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
                       ->  Bitmap Index Scan on index_entries_on_competition_id  (cost=0.00..71338.56 rows=209335 width=0) (actual time=23.593..23.593 rows=51257 loops=1)
                             Index Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
   ->  CTE Scan on entry_with_global_rank  (cost=0.00..4710.04 rows=1047 width=88) (actual time=138.387..138.387 rows=1 loops=1)
         Filter: (id = '9470ec4f-fed1-4f95-bbed-1e3dbba5f53b'::uuid)
         Rows Removed by Filter: 4177
 Total runtime: 138.588 ms
(14 rows)

Edit 3:

As requested, the final query plan with the covering index in place, right after a VACUUM ANALYZE:

                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..6771.99 rows=1 width=88) (actual time=46.765..46.765 rows=1 loops=1)
   ->  Subquery Scan on entry_with_global_rank  (cost=0.42..6771.99 rows=1 width=88) (actual time=46.763..46.763 rows=1 loops=1)
         Filter: (entry_with_global_rank.id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
         Rows Removed by Filter: 9128
         ->  WindowAgg  (cost=0.42..5635.06 rows=90955 width=35) (actual time=0.090..40.002 rows=9129 loops=1)
               ->  Index Only Scan using entries_extra_special_idx on entries  (cost=0.42..3815.96 rows=90955 width=35) (actual time=0.071..10.973 rows=9130 loops=1)
                     Index Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
                     Heap Fetches: 166
 Total runtime: 46.867 ms
(9 rows)

Best Answer

The CTE is not needed here and acts as an optimization barrier. A plain subquery generally performs better:

SELECT * 
FROM  (
   SELECT id
         ,rank()     OVER w AS global_rank
         ,lag(slug)  OVER w AS previous_slug
         ,lead(slug) OVER w AS next_slug 
   FROM   entries 
   WHERE  competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b' 
   WINDOW w AS (ORDER BY total_votes DESC) 
   ) entry_with_global_rank 
WHERE  id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0' 
LIMIT  1;

As @Daniel commented, I removed the PARTITION BY clause from the window definition, since you are limiting to a single competition_id anyway.

Table layout

You could optimize your table layout to slightly reduce on-disk storage, which makes everything a bit faster:

     Column     |            Type             |              Modifiers
----------------+-----------------------------+-------------------------------------
 id             | uuid                        | not null default uuid_generate_v4()
 competition_id | uuid                        | not null
 user_id        | uuid                        | not null
 total_votes    | integer                     | not null default 0
 photos_count   | integer                     | not null default 0
 hidden         | boolean                     | not null default false
 slug           | character varying(255)      | not null
 first_name     | character varying(255)      | not null
 last_name      | character varying(255)      | not null
 image          | character varying(255)      |
 country        | character varying(255)      |
 image_src      | character varying(255)      |
 photo_id       | uuid                        |
 created_at     | timestamp without time zone |
 updated_at     | timestamp without time zone |
 featured_until | timestamp without time zone |
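
Expressed as DDL, the same reordered layout might look like the sketch below (assuming the defaults from the original definition and the uuid-ossp extension for uuid_generate_v4()): fixed-width NOT NULL columns come first, variable-length and nullable columns last, to reduce alignment padding.

-- Sketch of the reordered table layout
CREATE TABLE entries (
    id              uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    competition_id  uuid NOT NULL,
    user_id         uuid NOT NULL,
    total_votes     integer NOT NULL DEFAULT 0,
    photos_count    integer NOT NULL DEFAULT 0,
    hidden          boolean NOT NULL DEFAULT false,
    slug            character varying(255) NOT NULL,
    first_name      character varying(255) NOT NULL,
    last_name       character varying(255) NOT NULL,
    image           character varying(255),
    country         character varying(255),
    image_src       character varying(255),
    photo_id        uuid,
    created_at      timestamp without time zone,
    updated_at      timestamp without time zone,
    featured_until  timestamp without time zone
);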


Also, do you actually need all those uuid columns? Wouldn't int or bigint work for you? That would make the table and indexes a bit smaller and everything faster.
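
The size difference is easy to check (a small demonstration, assuming the uuid-ossp extension is installed, as the default in your table definition suggests):

-- uuid values take 16 bytes each, bigint only 8
SELECT pg_column_size(uuid_generate_v4()) AS uuid_bytes    -- 16
     , pg_column_size(1::bigint)          AS bigint_bytes; -- 8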

And I would just use text for the character data, but that is not going to help the performance of this query.

Aside: character varying(255) is almost always pointless in Postgres. Some other RDBMSs profit from restricting the length; for Postgres it's all the same (unless you actually need to enforce an unlikely maximum length of 255 characters).
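
If you did decide to drop the length modifier, the change is a plain ALTER TABLE; going from varchar(n) to text does not force a table rewrite in reasonably recent Postgres versions (a sketch for a single column, to be repeated per column):

-- Sketch: widen one column from varchar(255) to text
ALTER TABLE entries ALTER COLUMN slug TYPE text;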

Special index

Finally, you could build a highly specialized index (only if index maintenance is worth the special casing):

CREATE INDEX entries_special_idx ON entries (competition_id, total_votes DESC, id, slug);

Adding (id, slug) to the index only makes sense if you can get index-only scans out of it. (Disabled autovacuum or lots of concurrent writes would negate that effort.) Otherwise, remove the last two columns.
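
To verify you actually get index-only scans, watch for an "Index Only Scan" node and a low Heap Fetches count, and keep the visibility map current (a sketch using the index suggested above):

-- Keep the visibility map current so index-only scans can skip the heap
VACUUM ANALYZE entries;

-- The plan should show "Index Only Scan using entries_special_idx"
-- with few "Heap Fetches"
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, slug, rank() OVER (ORDER BY total_votes DESC) AS global_rank
FROM   entries
WHERE  competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b';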

While you're at it, audit your indexes. Are they all in use? There might be some dead freight here.
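
To find candidates, the statistics collector is a good starting point (a sketch; idx_scan only counts index scans since the statistics were last reset, so interpret it with care):

-- Indexes on entries, sorted by how rarely they are used
SELECT indexrelname AS index_name
     , idx_scan     AS scans_since_stats_reset
     , pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  relname = 'entries'
ORDER  BY idx_scan, pg_relation_size(indexrelid) DESC;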