Using PostgreSQL 10.5. I'm trying to create a pagination system where the user can go back and forth between pages of results.
In an attempt to not use OFFSET, I pass the id of the last row in the previous page in a parameter called p (prevId). I then select the first three rows whose id is higher than the number passed in the p parameter (as described in this article).
For example, if the id of the last row in the previous page was 5, I'd select the first 3 rows with an id higher than 5:
SELECT id, firstname, lastname
FROM   people
WHERE  firstname = 'John'
AND    id > 5
ORDER  BY id ASC
LIMIT  3;
This works great and the timing isn't very bad either:
Limit (cost=0.00..3.37 rows=3 width=17) (actual time=0.046..0.117 rows=3 loops=1)
-> Seq Scan on people (cost=0.00..4494.15 rows=4000 width=17) (actual time=0.044..0.114 rows=3 loops=1)
Filter: ((id > 5) AND (firstname = 'John'::text))
Rows Removed by Filter: 384
Planning time: 0.148 ms
Execution time: 0.147 ms
If, on the other hand, the user would like to return to the previous page, things look a bit different: I'd pass the id of the first row with a minus sign in front of it, to indicate that I should select the rows with an id less than the (positive) p parameter. That is, if the id of the first row is 6, the p parameter would be -6. My query would then look like the following:
SELECT *
FROM  (
    SELECT id, firstname, lastname
    FROM   people
    WHERE  firstname = 'John'
    AND    id < 6
    ORDER  BY id DESC
    LIMIT  3
) AS d
ORDER BY id ASC;
In the query above, I first select the last 3 rows with an id less than 6, then reverse them so they are presented the same way as in the first query. This works as it should, but since the database goes through almost all of my rows, performance suffers:
Sort (cost=4252.75..4252.76 rows=1 width=17) (actual time=194.464..194.464 rows=0 loops=1)
Sort Key: people.id
Sort Method: quicksort Memory: 25kB
-> Limit (cost=4252.73..4252.73 rows=1 width=17) (actual time=194.460..194.460 rows=0 loops=1)
-> Sort (cost=4252.73..4252.73 rows=1 width=17) (actual time=194.459..194.459 rows=0 loops=1)
Sort Key: people.id DESC
Sort Method: quicksort Memory: 25kB
-> Gather (cost=1000.00..4252.72 rows=1 width=17) (actual time=194.448..212.010 rows=0 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Parallel Seq Scan on people (cost=0.00..3252.62 rows=1 width=17) (actual time=18.132..18.132 rows=0 loops=2)
Filter: ((id < 13) AND (firstname = 'John'::text))
Rows Removed by Filter: 100505
Planning time: 0.116 ms
Execution time: 212.057 ms
With that said, I appreciate that you have taken the time to read this far. My question is: how can I make the pagination more efficient?
Best Answer
The key to performance is a matching multicolumn index on (firstname, id). Make sure id is UNIQUE, since sort order is ambiguous without it and you can get arbitrary results from peers; a UNIQUE or PRIMARY KEY constraint serves as well. While the first column is checked for equality as in your example (or sorted in the same direction as the query), this index is good for paging up and down, though it's a bit better for paging up.
With the index in place (and after running ANALYZE on the table), you won't see sequential scans any more (unless your table is small); the database is no longer "going through almost all your rows". Read the fine presentation by Markus Winand you are linking to.
If you want paging across more than one firstname, use ROW values.
If the SELECT list only adds lastname, as in your example, you might try adding that column to the index to get index-only scans out of it, with the index expressions in this order.
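A sketch of such a covering index (the index name is an assumption):

```sql
-- Appending lastname lets the index cover the whole SELECT list,
-- enabling index-only scans
CREATE INDEX people_firstname_id_lastname_idx
    ON people (firstname, id, lastname);
```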
The upcoming Postgres 11 allows indexes to INCLUDE columns, which results in a smaller index, better performance, and applicability in more situations.
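A sketch of the Postgres 11 variant (the index name is an assumption):

```sql
-- lastname is stored as payload only, not as a key column,
-- keeping the index keys smaller while still allowing index-only scans
CREATE INDEX people_firstname_id_covering_idx
    ON people (firstname, id) INCLUDE (lastname);
```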