How to Optimize a Keyset Pagination Query with CTEs on a Big Table

Tags: cte, paging, postgresql

I tried to research this topic as much as I could before coming here to bother you, but here I am anyway.

We want to implement keyset pagination on this table:

create table api.subscription (
    subscription_id uuid primary key,
    token_id uuid not null,
    product_id uuid not null references api.product(product_id) deferrable,
    spid bigint null,
    attributes_snapshot jsonb not null,
    created_at timestamp not null,
    refreshed_at timestamp,
    enriched_at timestamp null,
    valid_until timestamp not null,
    is_cancelled boolean not null,
    has_been_expired boolean not null,
    has_quality_data boolean not null
);

For that we use this query to prepare the pagination metadata:

with book as (
    select created_at, subscription_id
    from api.subscription
    where token_id = $1
    and refreshed_at >= $2
    and valid_until >= now()
    and not is_cancelled
    and not has_been_expired
),
total as (
    select count(*) as total from book
),
page as (
    select * from book
    where (case when $4 is null then true else (created_at > $4 or (created_at = $4 and subscription_id > $5)) end)
    order by created_at asc, subscription_id asc
    limit $3
),
last_row as (
    select last_value(created_at) over() as last_seen_created_at,
    last_value(subscription_id) over() as last_seen_subscription_id
    from page
),
ids as (
    select array_agg(subscription_id) ids from page
)
select * from last_row, total, ids
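
An aside that is not in the original post: in PostgreSQL, the CASE-based keyset predicate above can also be written as a row constructor comparison, which the planner can match directly against a btree index on (created_at, subscription_id). A minimal sketch of the page query with the same parameters:

-- Sketch only (not the original query): row-value comparison form of
-- the keyset predicate; semantically equivalent to the CASE version.
select created_at, subscription_id
from api.subscription
where token_id = $1
and ($4 is null or (created_at, subscription_id) > ($4, $5))
order by created_at asc, subscription_id asc
limit $3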

This allows us to get the total number of items the user is currently paginating over, the last seen keys (for the next page), and the IDs in the current page (to fetch the full rows later with an IN clause or = ANY, but that's another story).
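
For illustration, that later fetch (assumed here, not shown in the post) could look like this, passing the array produced by the ids CTE as a parameter:

-- Assumed follow-up query: $1 is the uuid[] collected by the ids CTE.
select *
from api.subscription
where subscription_id = any($1::uuid[])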

The problem is the time it takes when the table contains 19 million rows:

 Nested Loop  (cost=50000.14..50000.22 rows=1 width=64) (actual time=143.677..143.677 rows=0 loops=1)
   Output: last_row.last_seen_created_at, last_row.last_seen_subscription_id, total.total, ids.ids
   Buffers: shared hit=395 read=24611
   CTE book
     ->  Seq Scan on api.subscription  (cost=0.00..50000.00 rows=1 width=24) (actual time=143.646..143.646 rows=0 loops=1)
           Output: subscription.created_at, subscription.subscription_id
           Filter: ((NOT subscription.is_cancelled) AND (NOT subscription.has_been_expired) AND (subscription.token_id = '73759739-7af5-4d28-91bc-897c959bddcd'::uuid) AND (subscription.valid_until >= now()) AND (subscription.refreshed_at >= (now() - '4 days'::interval)))
           Rows Removed by Filter: 1000000
           Buffers: shared hit=389 read=24611
   CTE total
     ->  Aggregate  (cost=0.02..0.03 rows=1 width=8) (never executed)
           Output: count(*)
           ->  CTE Scan on book  (cost=0.00..0.02 rows=1 width=0) (never executed)
                 Output: book.created_at, book.subscription_id
   CTE page
     ->  Limit  (cost=0.03..0.04 rows=1 width=24) (actual time=143.674..143.674 rows=0 loops=1)
           Output: book_1.created_at, book_1.subscription_id
           Buffers: shared hit=395 read=24611
           ->  Sort  (cost=0.03..0.04 rows=1 width=24) (actual time=143.673..143.673 rows=0 loops=1)
                 Output: book_1.created_at, book_1.subscription_id
                 Sort Key: book_1.created_at, book_1.subscription_id
                 Sort Method: quicksort  Memory: 25kB
                 Buffers: shared hit=395 read=24611
                 ->  CTE Scan on book book_1  (cost=0.00..0.02 rows=1 width=24) (actual time=143.646..143.646 rows=0 loops=1)
                       Output: book_1.created_at, book_1.subscription_id
                       Buffers: shared hit=389 read=24611
   CTE last_row
     ->  WindowAgg  (cost=0.00..0.03 rows=1 width=24) (actual time=143.675..143.675 rows=0 loops=1)
           Output: last_value(page.created_at) OVER (?), last_value(page.subscription_id) OVER (?)
           Buffers: shared hit=395 read=24611
           ->  CTE Scan on page  (cost=0.00..0.02 rows=1 width=24) (actual time=143.674..143.674 rows=0 loops=1)
                 Output: page.created_at, page.subscription_id
                 Buffers: shared hit=395 read=24611
   CTE ids
     ->  Aggregate  (cost=0.02..0.03 rows=1 width=32) (never executed)
           Output: array_agg(page_1.subscription_id)
           ->  CTE Scan on page page_1  (cost=0.00..0.02 rows=1 width=16) (never executed)
                 Output: page_1.created_at, page_1.subscription_id
   ->  Nested Loop  (cost=0.00..0.05 rows=1 width=32) (actual time=143.677..143.677 rows=0 loops=1)
         Output: last_row.last_seen_created_at, last_row.last_seen_subscription_id, total.total
         Buffers: shared hit=395 read=24611
         ->  CTE Scan on last_row  (cost=0.00..0.02 rows=1 width=24) (actual time=143.676..143.676 rows=0 loops=1)
               Output: last_row.last_seen_created_at, last_row.last_seen_subscription_id
               Buffers: shared hit=395 read=24611
         ->  CTE Scan on total  (cost=0.00..0.02 rows=1 width=8) (never executed)
               Output: total.total
   ->  CTE Scan on ids  (cost=0.00..0.02 rows=1 width=32) (never executed)
         Output: ids.ids
 Planning time: 0.580 ms
 Execution time: 143.786 ms

Note: the plan above is from an example with only 1M rows; with 19M rows the query takes more than 20 seconds.

We tried a lot of different index setups, without luck:

Indexes:
    "subscription_pkey" PRIMARY KEY, btree (subscription_id)
    "has_been_expired_idx" btree (has_been_expired)
    "is_cancelled_idx" btree (is_cancelled)
    "pagination_idx" btree (token_id, refreshed_at, valid_until, is_cancelled, has_been_expired)
    "refreshed_at_idx" btree (refreshed_at)
    "token_and_created_at_idx" btree (token_id, created_at)
    "token_and_refreshed_at_idx" btree (token_id, refreshed_at)
    "token_id_idx" btree (token_id)
    "valid_until_idx" btree (valid_until)
Foreign-key constraints:
    "subscription_product_id_fkey" FOREIGN KEY (product_id) REFERENCES api.product(product_id) DEFERRABLE

So my questions are:

  • is there a way to improve this query by finding the right indexes?
  • is this a good approach in the first place?

Any hints would be welcome 🙂 Hope my question makes sense.

Thanks for taking time to read!

Best Answer

Without total, it can be rewritten as:

with page as (
    select created_at, subscription_id
    from api.subscription
    where token_id = $1
    and refreshed_at >= $2
    and valid_until >= now()
    and not is_cancelled
    and not has_been_expired
    and (case when $4 is null then true else (created_at > $4 or (created_at = $4 and subscription_id > $5)) end)
    order by created_at asc, subscription_id asc
    limit $3
),
last_row as (
    select last_value(created_at) over() as last_seen_created_at,
    last_value(subscription_id) over() as last_seen_subscription_id
    from page
),
ids as (
    select array_agg(subscription_id) ids from page
)
select * from last_row, ids

A compound index on (is_cancelled, has_been_expired, token_id) plus one of refreshed_at, valid_until, or created_at will help.
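
For concreteness, one possible form of that index (a sketch; the index name and the choice of created_at as the trailing column are illustrative):

-- Sketch of the suggested compound index; the trailing column could
-- instead be refreshed_at or valid_until, depending on which range
-- predicate is most selective for your data.
create index subscription_keyset_idx
    on api.subscription (is_cancelled, has_been_expired, token_id, created_at);

Since the query always filters both booleans to false, a partial index on (token_id, created_at) with "where not is_cancelled and not has_been_expired" is a common alternative that keeps the index smaller.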