PostgreSQL – Optimizing Inner Group By Query Sequential Scan

index-tuningperformancepostgresqlpostgresql-9.5query-performance

I have inner-query in left-join that is using sequential scan instead of index scan, and I have no idea on how to improve it.

I have two tables:

  • pages – columns: id
  • shares – each page has many shares, columns: page_id, view_counter

shares have an index on page_id and shares is a large table.

The query is:

SELECT id, 
       shared.views     views_count, 
       is_shared, 
       shared.views > 0 AS tracked 
FROM   pages 
       LEFT JOIN (SELECT shares.page_id, 
                         Sum(shares.view_counter) 
                  FROM   shares 
                  GROUP  BY page_id) AS shared 
              ON shared.page_id = pages.id
WHERE page_id IN (1, 2, 3) 

The query plan for this query is:

Merge Right Join  (cost=813559.28..895407.83 rows=4 width=21)
  Merge Cond: (shares.page_id = pages.id)
  ->  GroupAggregate  (cost=813558.85..867374.64 rows=2239970 width=8)
        Group Key: shares.page_id
        ->  Sort  (cost=813558.85..824030.88 rows=4188812 width=8)
              Sort Key: shares.page_id
              ->  Seq Scan on shares  (cost=0.00..238288.12 rows=4188812 width=8)
  ->  Index Scan Backward using index_pages_on_id_desc on pages  (cost=0.43..33.53 rows=4 width=13)
        Index Cond: (id = ANY ('{1,2,3}'::integer[]))

How can I re-write the query so I won't have the sequential scan on shares?

Best Answer

I think this is a good candidate for using LATERAL join:

SELECT id, 
       shared.views     AS views_count, 
       is_shared, 
       shared.views > 0 AS tracked 
FROM   pages 
       LEFT JOIN LATERAL
                 (SELECT sum(shares.view_counter) AS views
                  FROM   shares 
                  WHERE  shares.page_id = pages.id
                 ) AS shared 
              ON TRUE
WHERE pages.id IN (1, 2, 3) ;