I have 3 tables and a materialized view:
resource_categories contains all category names and metadata
create table if not exists resource_categories (
category_id INT,
title VARCHAR(255),
content TEXT,
icon VARCHAR(50)
);
resources contains all resources under each category
create table if not exists resources (
resource_id INT,
title VARCHAR(255),
content TEXT,
link VARCHAR(1000),
category_id INT REFERENCES resource_categories(resource_id),
icon VARCHAR(255),
created_at DATE,
updated_at DATE
);
resource_votes contains user liking a resource or disliking it
create table if not exists resource_votes (
resource_id INT REFERENCES resources(resource_id),
user_id INT,
vote BOOLEAN
);
resource_votes_aggregate a materialized view with likes per resource_id
CREATE materialized view resource_votes_aggregate AS
SELECT
resource_id,
COUNT(
CASE
WHEN
vote = TRUE
THEN
1
END
) AS likes
FROM
resource_votes
GROUP BY
resource_id;
- I want to execute the following queries
- Find all resources sorted in descending order of likes (find most liked resources)
- Find all resources sorted in descending alphabetical order of their titles
- I want to use SEEK / KEYSET pagination for efficiency
Query to find resources in descending order of their likes page 1
SELECT
r.resource_id,
title,
COALESCE(likes, 0) AS likes
FROM
resources r
LEFT JOIN
resource_votes_aggregate a
ON r.resource_id = a.resource_id
ORDER BY
likes DESC,
resource_id DESC LIMIT 5;
Execution Plan
QUERY PLAN
Limit (cost=74.50..74.52 rows=5 width=157) (actual time=0.058..0.060 rows=5 loops=1)
-> Sort (cost=74.50..76.80 rows=918 width=157) (actual time=0.058..0.058 rows=5 loops=1)
Sort Key: (COALESCE(a.likes, '0'::bigint)) DESC, r.resource_id DESC
Sort Method: top-N heapsort Memory: 25kB
-> Hash Right Join (cost=12.03..59.25 rows=918 width=157) (actual time=0.032..0.046 rows=50 loops=1)
Hash Cond: (a.resource_id = r.resource_id)
-> Seq Scan on resource_votes_aggregate a (cost=0.00..30.40 rows=2040 width=12) (actual time=0.001..0.004 rows=38 loops=1)
-> Hash (cost=10.90..10.90 rows=90 width=149) (actual time=0.021..0.021 rows=50 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 11kB
-> Seq Scan on resources r (cost=0.00..10.90 rows=90 width=149) (actual time=0.003..0.010 rows=50 loops=1)
Planning Time: 0.050 ms
Execution Time: 0.075 ms
This and all the other queries generate a sequential scan which beats the entire purpose of seek pagination
- How do I fix this. Help is super appreciated
Best Answer
There may be more potential for improvement. But the obvious big issue is this:
The manual:
So even if there is an applicable index including
resource_votes_aggregate.likes
, it cannot be used as the column is hidden behind the expressionCOALESCE(likes, 0) AS likes
- reflected in the query plan as:Fix with:
This refers to the input column now and qualifies for index usage. It's substantially cheaper to evaluate in any case, even without index, so you can't go wrong with this.
Since
likes
obviously can beNULL
, be sure to addNULLS LAST
. See:Table-qualification for all input columns (like I added) is a good way to avoid confusion - not only for the problem at hand.
MATERIALIZED VIEW
The MV in your fiddle would be more efficient like this:
The aggregate
FILTER
is typically faster.count()
returnsbigint
(8 bytes). Cast tointeger
(4 bytes).This is the all-important index (missing in your fiddle):
With two
integer
columns, index tuples fit the minimum size of 8 bytes. (Withbigint
forlikes
, it would be 16 bytes (12 + 4 padding). See:Obviously, you also need an index on
resources(resources_id)
. If that's the PK, then it's there.If you use this query a lot, and preconditions for index-only scans are given, a multicolumn index on
(resources_id) INCLUDE (title)
may pay. See:Optimized query
If Postgres is not smart enough to give you the best query plan, this equivalent query will:
The second
SELECT
will typically be never executed, if there are at least 5 resources with likes. See:Aside
You mentioned pagination. Be sure to avoid
LIMIT
/OFFSET
for big tables. See: