I'm trying to optimize a query on a large database that essentially fetches 50 images for a site and the URL's for the pages they appear on. The current query works okay for smaller sites but larger ones, such as the one I'm testing with that has 50,000 images, the query effectively times out.
I've narrowed down the slowness to the second CTE (y) because it's fetching all matching records in that table that consists of over 150M rows. What I'd like to do is use the first CTE (x) as an input to the second so that only records tied to resources fetched in x are fetched in y, since x is where the filtering occurs and results are limited to 50 per query. Is this possible?
WITH x AS
(
SELECT vi.resourceid
FROM vw_image vi
WHERE vi.siteid = 2294
AND vi.childtype = 'i'
ORDER BY vi.id
OFFSET 1 ROWS
FETCH NEXT 50 ROWS ONLY
),
y AS (
SELECT rop.resourceid as resourceid, r.url as url
FROM resourceonpage rop
INNER JOIN resource r ON rop.pageid = r.id
WHERE rop.siteid = 2294
)
SELECT i.*, string_agg(y.url, ',') as urllist
FROM x
INNER JOIN vw_image i ON x.resourceid = i.resourceid
LEFT JOIN y ON y.resourceid = x.resourceid;
Best Answer
Looking into my crystal ball, your query might work faster by orders of magnitude like this:
You don't need CTEs here, those will only slow you down.
I use a
LATERAL
join to use indexes onresourceonpage
andresource
and aggregate in the subquery right away. AboutLEFT JOIN LATERAL ... ON true
:Looking even deeper into my crystal ball, the best indexes might be:
Appending
url
is only useful if you get index-only scans out of it.An alternative approach would be to use parentheses in the
FROM
list like demonstrated here:But, according to my crystal ball (and an educated guess) the
LATERAL
version will be faster. There is not enough information in the question to give you more than educated guesses.