Using Postgres 9.4, I'm trying to order a query by an indexed datetime column.
explain
SELECT g.*, array_agg(gp.platform_id) as platform_list,
array_agg(gm.metacritic_id) as metacritic_list,
array_agg(gr.review_id) as review_list,
array_agg(gn.genre_id) as genre_list,
array_agg(gp.id) as release_list
FROM gb_game g
LEFT JOIN gb_release gp ON g.id = gp.game_id
LEFT JOIN gb_game_metacritic gm ON g.id = gm.game_id
LEFT JOIN gb_game_reviews gr ON g.id = gr.game_id
LEFT JOIN gb_game_genre gn ON g.id = gn.game_id
GROUP BY g.id
ORDER BY g.release_date
DESC limit 25 offset 0
Limit (cost=66391.46..66391.52 rows=25 width=294)
-> Sort (cost=66391.46..66511.58 rows=48051 width=294)
Sort Key: g.release_date
-> GroupAggregate (cost=1.56..65035.49 rows=48051 width=294)
Group Key: g.id
-> Merge Left Join (cost=1.56..53579.05 rows=691686 width=294)
Merge Cond: (g.id = gn.game_id)
-> Merge Left Join (cost=1.14..16374.78 rows=63464 width=290)
Merge Cond: (g.id = gp.game_id)
-> Merge Left Join (cost=0.85..10856.60 rows=48051 width=282)
Merge Cond: (g.id = gr.game_id)
-> Merge Left Join (cost=0.58..10676.85 rows=48051 width=278)
Merge Cond: (g.id = gm.game_id)
-> Index Scan using platforms_game_pkey on gb_game g (cost=0.29..8984.87 rows=48051 width=274)
-> Index Scan using gb_game_metacritic_6072f8b3 on gb_game_metacritic gm (cost=0.29..1264.75 rows=24569 width=8)
-> Index Scan using gb_game_reviews_6072f8b3 on gb_game_reviews gr (cost=0.28..51.04 rows=686 width=8)
-> Index Scan using platforms_release_6072f8b3 on gb_release gp (cost=0.29..4604.75 rows=63464 width=12)
-> Materialize (cost=0.42..27979.58 rows=523702 width=8)
-> Index Scan using gb_game_genre_6072f8b3 on gb_game_genre gn (cost=0.42..26670.33 rows=523702 width=8)
The release_date
column doesn't have many repeats other than null.
Everything is fast until the final sort.
Is there a special kind of index I need to use to speed up this query?
Best Answer
Better query
Seems like a perfect occasion for a couple of
LATERAL
joins, which should be much faster here:Indexes
You need an index on each
game_id
column in the child tables.Multicolumn indexes might be even better if (and only if) you can get index-only scans out of them:
The order of columns matters.
And finally, you need an index on
gb_game (release_date)
.There is no data type called
in Postgres. Assuming you meandatetime
timestamp
(some ORMs map this way).Either way, if you really mean to
ORDER BY g.release_date DESC
, a plain index on(release_date DESC)
or just(release_date)
does the jobIf the column can be NULL, I suspect you actually want to sort NULL values last. Adapt your query like I did to
ORDER BY g.release_date DESC NULLS LAST
. A plain index on(release_date)
still works, but is not the optimum. Use a matching sort order in your index:Or (almost equivalent):
More:
Explain query
You join to multiple tables that can have
n
related rows for a single row ingb_game
, resulting rows in the intermediary derived table are multiplied with every join to multiple rows. That's not only a major performance issue, it's also incorrect. Or at least I assume as much while detailed information is missing. Related:I avoid the problem a priori by aggregating each child table separately in
LATERAL
subqueries. That seems particularly efficient, since you only pick 25 rows total.While being at it I use a faster
ARRAY
constructor for single column aggregation (while I still usearray_agg()
for the case where two columns from the same table are aggregated separately.Note that the order of elements in each array is arbitrary. You can add an
ORDER BY
clause in each lateral subquery to determine the order.The last three lateral subqueries could be replaced with a correlated subqueries: