Postgresql – Slow query when grouping and ordering

optimizationpostgresqlpostgresql-performancequery-performance

I have a query that is finding the latest documents for a user that are associated with another user having a high contact score.

The documents table has 1.2M records and the document_service_contacts has 11M records.

It seems that no matter how I rework the query (for example, using a window function instead of group by or a subquery instead of joins & group by) it still has a sequential scan on document_service_contacts in the query plan. I don't understand why that is since there is an index on "document_service_contacts"."document_id".

explain (analyze, buffers)
 SELECT "documents".*
 FROM "documents"
  LEFT JOIN "document_service_contacts" ON "document_service_contacts"."document_id" = documents.id
  LEFT JOIN "service_contacts" ON "service_contacts"."id" = "document_service_contacts"."shared_service_contact_id"
  LEFT JOIN "contacts" ON "contacts"."id" = "service_contacts"."contact_id"

  WHERE "documents"."user_id" = 18
    AND (documents.snoozed_at IS NULL
         OR documents.snoozed_at < '2020-06-09 11:07:32.392427')
    AND (documents.dismissed_at IS NULL)
    AND (documents.rejected_at IS NULL)
    AND (documents.deleted_at IS NULL)
    AND (documents.last_seen_at IS NULL)
    AND (documents.is_shared='f' OR contacts.loopscore > 1)
    group by documents.id
   order by documents.last_modified_at desc limit 50;

                                                                                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=393521.80..393521.83 rows=50 width=828) (actual time=25601.359..25601.377 rows=50 loops=1)
   Buffers: shared hit=190684 read=24850 dirtied=130
   I/O Timings: read=16209.429
   ->  Sort  (cost=393521.80..393570.51 rows=97418 width=828) (actual time=25601.358..25601.367 rows=50 loops=1)
         Sort Key: documents.last_modified_at DESC
         Sort Method: top-N heapsort  Memory: 65kB
         Buffers: shared hit=190684 read=24850 dirtied=130
         I/O Timings: read=16209.429
         ->  HashAggregate  (cost=392582.32..392874.57 rows=97418 width=828) (actual time=25440.537..25529.152 rows=85215 loops=1)
               Group Key: documents.id
               Buffers: shared hit=190684 read=24850 dirtied=130
               I/O Timings: read=16209.429
               ->  Hash Left Join  (cost=241333.32..392276.08 rows=612482 width=828) (actual time=17904.675..24945.977 rows=569118 loops=1)
                     Hash Cond: (service_contacts.contact_id = contacts.id)
                     Filter: ((NOT documents.is_shared) OR (contacts.loopscore > 1))
                     Rows Removed by Filter: 59236
                     Buffers: shared hit=190684 read=24850 dirtied=130
                     I/O Timings: read=16209.429
                     ->  Hash Left Join  (cost=174602.54..321877.70 rows=862964 width=832) (actual time=17305.645..23849.246 rows=628354 loops=1)
                           Hash Cond: (document_service_contacts.shared_service_contact_id = service_contacts.id)
                           Buffers: shared hit=128109 read=21939 dirtied=13
                           I/O Timings: read=16005.282
                           ->  Hash Right Join  (cost=153482.18..297713.50 rows=862964 width=832) (actual time=17065.730..23210.477 rows=628354 loops=1)
                                 Hash Cond: (document_service_contacts.document_id = documents.id)
                                 Buffers: shared hit=108303 read=21939 dirtied=2
                                 I/O Timings: read=16005.282
                                 ->  Seq Scan on document_service_contacts  (cost=0.00..133441.74 rows=10934246 width=8) (actual time=0.009..2434.490 rows=10959356 loops=1)
                                       Buffers: shared hit=100640
                                 ->  Hash  (cost=153141.22..153141.22 rows=97418 width=828) (actual time=17065.441..17065.441 rows=98431 loops=1)
                                       Buckets: 131072  Batches: 1  Memory Usage: 52563kB
                                       Buffers: shared hit=7663 read=21939 dirtied=2
                                       I/O Timings: read=16005.282
                                       ->  Bitmap Heap Scan on documents  (cost=4873.10..153141.22 rows=97418 width=828) (actual time=554.876..16906.607 rows=98431 loops=1)
                                             Recheck Cond: (user_id = 18)
                                             Filter: ((dismissed_at IS NULL) AND (rejected_at IS NULL) AND (deleted_at IS NULL) AND (last_seen_at IS NULL) AND ((snoozed_at IS NULL) OR (snoozed_at < '2020-06-09 11:07:32.392427'::timestamp without time zone)))
                                             Rows Removed by Filter: 69
                                             Heap Blocks: exact=28943
                                             Buffers: shared hit=7663 read=21939 dirtied=2
                                             I/O Timings: read=16005.282
                                             ->  Bitmap Index Scan on index_documents_on_user_id  (cost=0.00..4868.23 rows=97431 width=0) (actual time=542.115..542.115 rows=98507 loops=1)
                                                   Index Cond: (user_id = 18)
                                                   Buffers: shared hit=1 read=658
                                                   I/O Timings: read=502.284
                           ->  Hash  (cost=20412.62..20412.62 rows=202208 width=8) (actual time=238.077..238.077 rows=202572 loops=1)
                                 Buckets: 262144  Batches: 1  Memory Usage: 9961kB
                                 Buffers: shared hit=19806 dirtied=11
                                 ->  Seq Scan on service_contacts  (cost=0.00..20412.62 rows=202208 width=8) (actual time=0.012..135.222 rows=202572 loops=1)
                                       Buffers: shared hit=19806 dirtied=11
                     ->  Hash  (cost=66060.51..66060.51 rows=191505 width=8) (actual time=597.524..597.524 rows=187801 loops=1)
                           Buckets: 262144  Batches: 1  Memory Usage: 9384kB
                           Buffers: shared hit=62575 read=2911 dirtied=117
                           I/O Timings: read=204.147
                           ->  Seq Scan on contacts  (cost=0.00..66060.51 rows=191505 width=8) (actual time=0.011..504.039 rows=187801 loops=1)
                                 Buffers: shared hit=62575 read=2911 dirtied=117
                                 I/O Timings: read=204.147
 Planning time: 1.174 ms
 Execution time: 25602.126 ms
(57 rows)

Indexes:

 Table "public.documents"

Indexes:
    "documents_pkey" PRIMARY KEY, btree (id)
    "index_documents_on_connected_service_id_and_external_id" UNIQUE, btree (connected_service_id, external_id)
    "index_documents_on_document_folder_id" btree (document_folder_id)
    "index_documents_on_external_id" btree (external_id)
    "index_documents_on_last_modified_at" btree (last_modified_at)
    "index_documents_on_service_contact_array" gin (service_contact_array)
    "index_documents_on_user_id" btree (user_id)
    "index_documents_on_user_id_and_visibility" btree (user_id, deleted_at, snoozed_at, dismissed_at)

Table "public.document_service_contacts"

Indexes:
    "document_service_contacts_pkey" PRIMARY KEY, btree (id)
    "index_document_service_contacts_on_connected_service_id" btree (connected_service_id)
    "index_document_service_contacts_on_document_id" btree (document_id)
    "index_document_service_contacts_on_shared_service_contact_id" btree (shared_service_contact_id)

Table "public.service_contacts"

Indexes:
    "service_contacts_pkey" PRIMARY KEY, btree (id)
    "service_contacts_external_id_unique" UNIQUE, btree (external_id, connected_service_id)
    "index_service_contacts_on_connected_service_id" btree (connected_service_id)
    "index_service_contacts_on_contact_id" btree (contact_id)
    "index_service_contacts_on_email_address" btree (email_address)
    "index_service_contacts_on_provider_id" btree (provider_id)
    "index_service_contacts_on_user_id" btree (user_id)

Table "public.contacts"

Indexes:
    "contacts_pkey" PRIMARY KEY, btree (id)
    "contacts_index_for_find_or_create_service_contact" btree (user_id, email_address, last_name, first_name)
    "index_contacts_on_contacted_at" btree (contacted_at)
    "index_contacts_on_email_address_and_user_id" btree (email_address, user_id)
    "index_contacts_on_user_id" btree (user_id)
    "index_contacts_on_user_id_and_email_address" btree (user_id, email_address)

Best Answer

Hitting an index tens of thousands of times is not free. At some point it is faster to read the entire table, and clearly that planner thinks it has reached that point.

Note that the time taken to seq scan document_service_contacts is less than 10% of the total execution time (thought much of this low time is probably that it is already in the cache). Even if that were reduced to zero, it still wouldn't really move the needle.

Your slowest step is reading 21939 pages (out of how many total?) of the documents table in order to obtain about 8% of its rows. It might actually be faster to stop using that index and read the whole table instead.