So I have a large query which I have been trying to optimize as over time it has gotten fairly slow (~116k records taking a few seconds to get a result). The context is a social network feed where the user is "authorized" to see the posts. Each post has an associated tag (which can be a whole location, a classroom or a student. The slow part in question looks like this:
WHERE (
"post"."deletedAt" IS NULL AND (
EXISTS (
SELECT 1
FROM "tags"
WHERE "tags"."postId" = "post"."id" AND (
"tags"."classroomId" IN (classroom ids) OR
"tags"."locationId" IN (location ids) OR
"tags"."studentId" IN (student ids)
)
)
)
)
When I run EXPLAIN ANALYZE
on the query I get this…
-> HashAggregate (cost=4636.49..4637.35 rows=86 width=4) (actual time=23.245..23.269 rows=47 loops=1)"
Group Key: tags_1."postId"
-> Seq Scan on tags tags_1 (cost=0.00..4636.22 rows=107 width=4) (actual time=1.147..23.214 rows=47 loops=1)
Filter: (("classroomId" = 11) OR ("locationId" = 27) OR ("studentId" = 29))
Rows Removed by Filter: 136955
It would be ideal to be able to somehow index these fields so that it would be faster to get all posts for specific users and places. Does the community here have any recommendations for how I would be able to speed this section of query up?
I managed, so far, to make the query about 50% faster by indexing on the comments and likes but this is still a large bottleneck for me. I have tried a few indices to help this but it hasn't really helped. Maybe indices aren't the problem, is there a more efficient way to do multiple IN queries that I'm not aware of?
I appreciate any feedback anyone can give! This is Postgres 9.5.2 by the way.
Best Answer
Possible optimisations
There are a few tricks you can employ:
tags
to look for specific classrooms, students or locations.VACUUM ANALYZE
) so thtat PostgreSQL tries to use index only scans.UNION
instead.Practical approach
This is how I have tried to mimick your scenario on DBFiddle (I have slightly changed your column naming and followed a more "PostgreSQL-standardish" approach):
First, I create a simplified version of your tables, and fill it with an amount of data similar to the one you state
At this point, I execute a query taht tries to mimick the provided
WHERE
condition, and have PostgreSQL explain it:To use the tricks explained before:
and make statistics up-to-date for PostgreSQL:
And now, we
-- An alternate version of the query, that focus first on tags (using index-only scans) -- It avoids ORing conditions (which tend to be difficult to optimize) -- and uses, instead, UNIONs
The end result is a 3x speed increase when doing
SELECT
. Obviously, this will need updating depending a lot on your specific case (I've made a very simplified version to show the tricks). Obviously, this comes at the cost of maintaining a certain number of indexes, and increasingINSERT
andUPDATE
times. You hae to analyze, for your specific case, if this is an overall improvement or an overall worsening..dbfiddle here