PostgreSQL – Efficient pagination for big tables

Tags: index, paging, performance, postgresql, postgresql-10, query-performance

Using PostgreSQL 10.5. I'm trying to create a pagination system where the user can go back and forth between pages of results.

To avoid using OFFSET, I pass the id of the last row on the previous page in a parameter called p (prevId). I then select the first three rows whose id is higher than the number passed in the p parameter (as described in this article).

For example, if the id of the last row on the previous page was 5, I'd select the first 3 rows with an id higher than 5:

SELECT 
  id, 
  firstname, 
  lastname 
FROM 
  people 
WHERE 
  firstname = 'John'
  AND id > 5 
ORDER BY 
  id ASC 
LIMIT 
  3;

This works great, and the timing isn't bad either:

Limit  (cost=0.00..3.37 rows=3 width=17) (actual time=0.046..0.117 rows=3 loops=1)
   ->  Seq Scan on people  (cost=0.00..4494.15 rows=4000 width=17) (actual time=0.044..0.114 rows=3 loops=1)
         Filter: ((id > 5) AND (firstname = 'John'::text))
         Rows Removed by Filter: 384
 Planning time: 0.148 ms
 Execution time: 0.147 ms

If, on the other hand, the user wants to return to the previous page, things look a bit different:

First, I'd pass the id of the first row with a minus sign in front of it, to indicate that I should select the rows with an id less than the (positive) p parameter. That is, if the id of the first row is 6, the p parameter would be -6. The query then looks like the following:

SELECT 
  * 
FROM 
  (
    SELECT 
      id, 
      firstname, 
      lastname 
    FROM 
      people 
    WHERE 
      firstname = 'John' 
      AND id < 6 
    ORDER BY 
      id DESC 
    LIMIT 
      3
  ) as d 
ORDER BY 
  id ASC;

In the query above, I first select the last 3 rows with an id less than 6, then reverse them so they are presented in the same order as the results of the first query described in the beginning.

This works as it should, but since the database goes through almost all my rows, performance suffers:

Sort  (cost=4252.75..4252.76 rows=1 width=17) (actual time=194.464..194.464 rows=0 loops=1)
   Sort Key: people.id
   Sort Method: quicksort  Memory: 25kB
   ->  Limit  (cost=4252.73..4252.73 rows=1 width=17) (actual time=194.460..194.460 rows=0 loops=1)
         ->  Sort  (cost=4252.73..4252.73 rows=1 width=17) (actual time=194.459..194.459 rows=0 loops=1)
               Sort Key: people.id DESC
               Sort Method: quicksort  Memory: 25kB
               ->  Gather  (cost=1000.00..4252.72 rows=1 width=17) (actual time=194.448..212.010 rows=0 loops=1)
                     Workers Planned: 1
                     Workers Launched: 1
                     ->  Parallel Seq Scan on people  (cost=0.00..3252.62 rows=1 width=17) (actual time=18.132..18.132 rows=0 loops=2)
                           Filter: ((id < 13) AND (firstname = 'John'::text))
                           Rows Removed by Filter: 100505
Planning time: 0.116 ms
Execution time: 212.057 ms

With this being said, I appreciate that you have taken the time to read this far. My question is: how can I make the pagination more efficient?

Best Answer

The key to performance is a matching multicolumn index of the form:

CREATE UNIQUE INDEX ON people (firstname, id);

UNIQUE, because the sort order is ambiguous without it: rows with equal values in the sort columns ("peers") can be returned in arbitrary order, so pages may overlap or skip rows.

A UNIQUE or PRIMARY KEY constraint serves as well.
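
A minimal sketch of the constraint variant (the constraint name is made up for illustration); Postgres backs such a constraint with the same multicolumn unique index:

ALTER TABLE people
  ADD CONSTRAINT people_firstname_id_uni UNIQUE (firstname, id);  -- hypothetical name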

As long as the first column is checked for equality, like in your example (or sorted in the same direction as the query), this index is good for paging both up and down, though it's slightly better for paging up.

With the index in place (and after running ANALYZE on the table), you won't see sequential scans any more (unless your table is small). The database is not "going through almost all your rows" any more.
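
To verify, run the query from the question with EXPLAIN (ANALYZE) and check that the plan now shows an Index Scan (or Index Only Scan) instead of a Seq Scan. The exact plan depends on your data and settings:

EXPLAIN (ANALYZE)
SELECT id, firstname, lastname
FROM   people
WHERE  firstname = 'John'
AND    id > 5
ORDER  BY id
LIMIT  3;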

Read the fine presentation by Markus Winand you are linking to.

If you want paging across more than one firstname, use ROW values. Example for paging down:

SELECT *
FROM  (
   SELECT id, firstname, lastname 
   FROM   people
   WHERE  (firstname, id) < ('John', 6)  -- ROW values
   ORDER  BY firstname DESC, id DESC 
   LIMIT  3
   ) d 
ORDER BY firstname, id;
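
The matching query for paging up mirrors it with > and ascending sort order; since ascending is already the presentation order, no outer reversal is needed. A sketch with the same example values:

SELECT id, firstname, lastname
FROM   people
WHERE  (firstname, id) > ('John', 5)  -- ROW values
ORDER  BY firstname, id
LIMIT  3;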


If the SELECT list adds only lastname, like in your example, you might append that column to the index to get index-only scans out of it:

CREATE UNIQUE INDEX ON people (firstname, id, lastname);

The index columns must be in this order.

The upcoming Postgres 11 allows indexes to INCLUDE non-key columns, which results in a smaller index, better performance, and is applicable in more situations (for one, the included column is not part of the UNIQUE constraint, which is enforced on the key columns only). Like:

CREATE UNIQUE INDEX ON people (firstname, id) INCLUDE (lastname);