PostgreSQL Pagination – How to Do Pagination with UUID v4 and Created Time on Concurrent Inserted Data

pagingpostgresql

Context:

Out of curiosity, I'm doing load testing for my application. And then the result there's a lot of concurrent inserts happened.

After doing the load testing on create-endpoint, I'm trying to do load testing on the Fetch endpoint, including testing the pagination. For the pagination, I'm combining two columns, id (PK with UUID v4) and created_time. Also, I've added an index for faster sorting.
I'm following these solutions from here.

Problem:

Since the data was inserted concurrently, there are a few rows that have the same created_time, in my case up to 100(rows) in the same timestamp.

This is my table schema, an example

BEGIN;

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

DROP TABLE IF EXISTS "payment_with_uuid";

CREATE TABLE "payment_with_uuid" (
 id VARCHAR(255) PRIMARY KEY NOT NULL DEFAULT (uuid_generate_v4()),
 amount integer NULL,
 name varchar(255) default NULL,
 created_time TIMESTAMPTZ NOT NULL DEFAULT (now() AT TIME ZONE 'utc')
);

CREATE INDEX idx_payment_pagination ON payment_with_uuid (created_time, id);

COMMIT;

This is my query,

SELECT  * from payment_with_uuid ORDER BY  created_time DESC, id DESC LIMIT 10;

It will return 10 rows of payment, assume the data will look like this, and assume the timestamp is same until the 100th row

+-------------------------------------+--------+------------+---------------------+
| id                                  | amount | name       | created_time        |
+-------------------------------------+--------+------------+---------------------+
| ffffa567-e95a-4c8b-826c-e2be6acaeb6d| 32003  | Allistair  | 2020-05-24 21:27:10 | 
| ffff2dd6-3872-4acc-afec-7a568935f729| 32003  | James      | 2020-05-24 21:27:10 | 
| fffe3477-1710-45c4-b554-b539a9ee8fa7| 32003  | Kane       | 2020-05-24 21:27:10 | 

And for fetching the next page, this is my query looks like,

SELECT * FROM payment_with_uuid 
WHERE 
created_time <= '2020-05-24 21:27:10' :: timestamp
AND 
id <'fffe3477-1710-45c4-b554-b539a9ee8fa7' 
ORDER BY created_time DESC, id DESC LIMIT 10;

And because of that, the pagination messed up, like some records that exist on the 1st page, may exist on 2nd, or 3rd, or any pages. And sometimes the records are missing.

Questions and Notes:

  • Is there any way to do this in a more elegant way?

  • I know using auto-increment will solve this, but choosing auto-increment id is not an option for us, because we're trying to make everything is consistent across microservice, many services already using UUID as the PK.

  • Using offset and limit will also solve this, but it's not a good practice as far as I know as this article explained https://use-the-index-luke.com/no-offset

  • I'm using Postgres 11.4

Best Answer

SELECT * FROM payment_with_uuid 
WHERE 
created_time <= '2020-05-24 21:27:10' :: timestamp
AND 
id <'fffe3477-1710-45c4-b554-b539a9ee8fa7' 
ORDER BY created_time DESC, id DESC LIMIT 10;

This is wrong, but it shouldn't lead to the problem you indicate of the same row showing up on page 1, 2, etc. Rather it would result in most rows failing to show up at all, because the two filters are implemented independently. You need to implement the id filter only within ties of the created_time filter. Elegance I guess is a matter of opinion, but it seems to me that the most elegant solution is the tuple comparator similar to what you had attempted to include in your original question.

SELECT * FROM payment_with_uuid 
WHERE 
(created_time,id) < ('2020-05-24 21:27:10' :: timestamp, 'fffe3477-1710-45c4-b554-b539a9ee8fa7') 
ORDER BY created_time DESC, id DESC LIMIT 10;

Now the timestamp should really be exact, it doesn't look like yours is. How is it getting rounded to the nearest second? In my hands it looks more like 2020-05-25 09:16:29.380925-04

If for some reason you don't want to use the tuple comparator, then you need to include the timestamp twice, once for less than and once for equal to:

WHERE 
created_time < '2020-05-24 21:27:10' :: timestamp
OR  
(
    created_time = '2020-05-24 21:27:10' :: timestamp 
    AND 
    id <'fffe3477-1710-45c4-b554-b539a9ee8fa7' 
)

In addition to not being very elegant, this will probably not use the index very effectively. You could use boolean reasoning to re-write it to avoid that top-level OR, so that it can use the index, but then it will get even harder to read and understand.