Postgresql – keyset pagination on extended query with full text search in postgresql

full-text-searchpostgresql

I have a database with 4 tables. These 4 tables hold users, their relations with each other (users following other users), the posts they make and the likes they receive. I have simplified the tables for better understanding.

// table: users
+-------------------------------------------------+
| user_id | username | timestamp_joined | tokens  |
+-------------------------------------------------+

// table: followers
+-------------------------------------------------+
| follower_id | user_id | following_user_id       |
+-------------------------------------------------+

// table: posts
+-------------------------------------------------+
| post_id | user_id | timestamp_posted            |
+-------------------------------------------------+

// table: post_likes
+-------------------------------------------------+
| post_like_id | post_id | user_id                |
+-------------------------------------------------+

Now, I want to do a full text search combined with a keyset pagination. I've managed to create a query for the full text search, but I fail at modifying that query for keyset pagination.

What I want

I want to search for users, but I don't want to rank them on user_id. I want to rank them on 3 things, listed below, in that order.

1) First, I want to rank them on ts_rank(tokens, plainto_tsquery('search query')). The better there is a match between tokens (a column containing a to_tsvector object) and the search query, the higher the user should appear in the search results.

2) Secondly, I want to rank them on popularity. When 2 users have the same level of match on a search query, I want the most popular user to appear first. I have created a small equation to calculate the popularity. I know it's not a good equation, but for now (and for testing), it works. I will think about a better equation when it all works.

// equation parameters
d = # days user exists
f = # followers for a user
p = # posts made by a user in the last 5 days
l = total # likes on the posts made by the user in the last 5 days

// equation
(0.25 * d * (0.25 * f)) + (0.001 * d) + SQRT((f/d)) + ((0.1 * p) * (l/p))

3) When users still have the same level of match on a search query AND the same popularity, I want the oldest users to appear first (ORDER BY user_id ASC).

First attempt of creating a query (no keyset pagination yet)

The query to do all this is very extended, and I'm not sure about the performance when the number of rows increase.

SELECT
    user_id,
    username,
    ts_rank(tokens, plainto_tsquery('search query')) AS search_rank,
    (
        (0.25 * (SELECT EXTRACT('day' FROM date_trunc('day', NOW() - timestamp_joined::date))) * (0.25 * ((SELECT COUNT(follower_id) FROM followers WHERE following_user_id = user_id)))
        +
        (0.001 * (SELECT EXTRACT('day' FROM date_trunc('day', NOW() - timestamp_joined::date))))
        +
        (SQRT(((SELECT COUNT(follower_id) FROM followers WHERE following_user_id = user_id)/(SELECT EXTRACT('day' FROM date_trunc('day', NOW() - timestamp_joined::date)))))
        +
        ((0.1 * (SELECT COUNT(post_id) FROM posts WHERE user_id = user_id AND timestamp_posted > NOW() - INTERVAL '5 DAY')) * ((SELECT COUNT(post_like_id) FROM post_likes WHERE post_id IN (SELECT post_id FROM posts WHERE user_id = user_id))/(SELECT COUNT(post_id) FROM posts WHERE user_id = user_id AND timestamp_posted > NOW() - INTERVAL '5 DAY')))
    ) AS popularity_rank
FROM
    users
ORDER BY
    search_rank DESC,
    popularity_rank DESC,
    user_id ASC

As you can see, this is a complicated query (especially with the equation), with 9 SELECT statements in it. But the query works.

My questions

1) First of all, I want to know something about the performance of this query. I'm a beginner, and I really don't know how the PostgreSQL works in the back. 9 SELECT-statements in 1 query, to me, it seems a lot. Will the performance drop when the database hits, let's say, +1,000,000 records in each table? Will adding indexes to the tables help?

2) Secondly, I want to know how I can add a keyset pagination to this query.

I know it's a lot, but I wanted to make one question to get a better overview.

Thanks in advance!

Best Answer

First, some subqueries are unnnecessary. Instead of

(SELECT EXTRACT('day' FROM date_trunc('day', NOW() - timestamp_joined::date)))

you might as well write

EXTRACT('day' FROM date_trunc('day', NOW() - timestamp_joined::date)

Second, rather than having subselects in the SELECT list, you should OUTER JOIN the table users with the other tables.

Finally, if you want the query to be fast, any you want to use keyset pagination, you'll have to create an index that supports the ORDER BY, that is and index that covers the complete ORDER BY clause. But since your ORDER BY clause depends on parameters (searchquery) and the content of other tables, that is not possible.

So efficient keyset pagination is not possible unless you use a different query. An alternative would be to use a WITH HOLD cursor that materializes the whole query result on the server and fetch it page by page.