PostgreSQL Sorting an output of a fulltext search query takes very long time

optimizationpostgresqlpostgresql-performancequery-performance

I have a table with a few millions of rows where I perform a SELECT with a complex fulltext search query in the WHERE clause (over a fulltext search index column) and with ORDER BY id DESC at the end. There are more parts in the query but I think these are the relevant ones.

The final output is ~10k rows.

The query is very slow (~1.5 minutes), however if I drop either the fulltext search or the ORDER BY it resolves in seconds.

Why is that? Is there any way to improve the performance?

EXPLAIN ANALYZE output of the full query is this:

EXPLAIN ANALYZE SELECT 
* FROM "my_company" WHERE ("my_company"."company_type" = 2 AND "my_company"."id" IN (SELECT U0."company_id" FROM "my_companysource" U0 WHERE U0."source_id" IN (1,2,3,4,5,6)) AND NOT ("my_company"."id" IN (SELECT U0."id" FROM "my_company" U0 INNER JOIN "campaigns_processedurl" U1 ON (U0."id" = U1."company_id") INNER JOIN "campaigns_urlcampaign" U2 ON (U1."campaign_id" = U2."id") WHERE U2."project_id" IN (999))) AND ( ("my_company"."fulltext_search_index") @@ (to_tsquery('pg_catalog.english',
'(Trading)|(Bank)|(Banking)|(Finance)|(Financing)|(Transaction)|(Compliance)|(Insurance)|(Currencies)|(Currency)|(Credit <-> Cards)|(Financial <-> Services)|(Invoicing)|(Loans)|(Leasing)|(Cryptocurrency)|(Risk <-> Management)|(Stock <-> Exchanges)|(Asset <-> Management)|(Fintech)|(Incubators)|(Bitcoin)|(Lending)|(Point <-> Of <-> Sale)|(Cyber <-> Security)|(Trading <-> Platform)|(Smart <-> Contracts)|(Peer <-> To <-> Peer)|(Debit <-> Cards)|(Venture <-> Capital)|(Mortgages)|(P2P)|(Wealth <-> Management)|(Crowdfunding)|(Financial <-> Planning)|(Stock <-> Market)|(Life <-> Insurance)|(Mobile <-> Payments)|(Health <-> Insurance)|(Remittance)|(Real <-> Estate <-> Investment)|(Business <-> Financing)|(Investing <-> Services)|(Financial <-> Technology)|(Pos <-> Solution)|(Transaction <-> Processing)|(Private <-> Banking)|(Credit <-> Scoring)|(Hedge <-> Funds)|(Personal <-> Finance)|(Retail <-> Banking)|(Claim <-> Management)|(Fraud <-> Detection)|(Insurtech)|(Virtual <-> Currency)|(Angel <-> Investment)|(Crowd <-> Funding)|(Core <-> Banking)|(Digital <-> Bank)|(Fin <-> Tech)|(Consumer <-> Financing)|(Impact <-> Investing)|(P2P <-> Lending)|(Debt <-> Collections)|(Banking <-> Platform)|(Auto <-> Insurance)|(Property <-> Insurance)|(Commercial <-> Insurance)|(Prediction <-> Markets)|(Creditworthiness)|(Regtech)|(Financial <-> Exchanges)|(Consumer <-> Lending)|(Credit <-> Bureau)|(Banking <-> Api)|(Authentication <-> Technology)|(Micro <-> Lending)|(Microloans)|(Regulatory <-> Tech)|(Peer <-> To <-> Peer <-> Banking)|(Commercial <-> Lending)|(Financial <-> Management <-> Application)|(Microinvesting)|(P2P <-> Insurance)|(Reg <-> Tech)|(Bank <-> Clearance)|(Crowd <-> Financing)|(Crowdfinncing)|(Investment <-> Management <-> System)|(P2P <-> Banking)'
))) AND "my_company"."url_status" IN (1)) ORDER BY "my_company"."id" DESC;

QUERY PLAN
----------

 Sort  (cost=1246558.99..1246591.25 rows=12902 width=1600) (actual time=93420.927..93422.435 rows=10934 loops=1)
   Sort Key: my_company.id DESC
   Sort Method: quicksort  Memory: 21040kB
   ->  Hash Semi Join  (cost=511240.57..1245678.09 rows=12902 width=1600) (actual time=5997.812..93374.174 rows=10934 loops=1)
         Hash Cond: (my_company.id = u0.company_id)
         ->  Bitmap Heap Scan on my_company  (cost=499793.38..1233166.59 rows=197175 width=1600) (actual time=5397.255..92718.751 rows=106119 loops=1)
               Recheck Cond: ((fulltext_search_index @@ '
''trade'' | ''bank'' | ''bank'' | ''financ'' | ''financ'' | ''transact'' | ''complianc'' | ''insur'' | ''currenc'' | ''currenc'' | ''credit'' <-> ''card'' | ''financi'' <-> ''servic'' | ''invoic'' | ''loan'' | ''leas'' | ''cryptocurr'' | ''risk'' <-> ''manag'' | ''stock'' <-> ''exchang'' | ''asset'' <-> ''manag'' | ''fintech'' | ''incub'' | ''bitcoin'' | ''lend'' | ''point'' <2> ''sale'' | ''cyber'' <-> ''secur'' | ''trade'' <-> ''platform'' | ''smart'' <-> ''contract'' | ''peer'' <2> ''peer'' | ''debit'' <-> ''card'' | ''ventur'' <-> ''capit'' | ''mortgag'' | ''p2p'' | ''wealth'' <-> ''manag'' | ''crowdfund'' | ''financi'' <-> ''plan'' | ''stock'' <-> ''market'' | ''life'' <-> ''insur'' | ''mobil'' <-> ''payment'' | ''health'' <-> ''insur'' | ''remitt'' | ''real'' <-> ''estat'' <-> ''invest'' | ''busi'' <-> ''financ'' | ''invest'' <-> ''servic'' | ''financi'' <-> ''technolog'' | ''pos'' <-> ''solut'' | ''transact'' <-> ''process'' | ''privat'' <-> ''bank'' | ''credit'' <-> ''score'' | ''hedg'' <-> ''fund'' | ''person'' <-> ''financ'' | ''retail'' <-> ''bank'' | ''claim'' <-> ''manag'' | ''fraud'' <-> ''detect'' | ''insurtech'' | ''virtual'' <-> ''currenc'' | ''angel'' <-> ''invest'' | ''crowd'' <-> ''fund'' | ''core'' <-> ''bank'' | ''digit'' <-> ''bank'' | ''fin'' <-> ''tech'' | ''consum'' <-> ''financ'' | ''impact'' <-> ''invest'' | ''p2p'' <-> ''lend'' | ''debt'' <-> ''collect'' | ''bank'' <-> ''platform'' | ''auto'' <-> ''insur'' | ''properti'' <-> ''insur'' | ''commerci'' <-> ''insur'' | ''predict'' <-> ''market'' | ''creditworthi'' | ''regtech'' | ''financi'' <-> ''exchang'' | ''consum'' <-> ''lend'' | ''credit'' <-> ''bureau'' | ''bank'' <-> ''api'' | ''authent'' <-> ''technolog'' | ''micro'' <-> ''lend'' | ''microloan'' | ''regulatori'' <-> ''tech'' | ''peer'' <2> ''peer'' <-> ''bank'' | ''commerci'' <-> ''lend'' | ''financi'' <-> ''manag'' <-> ''applic'' | ''microinvest'' | ''p2p'' <-> ''insur'' | ''reg'' <-> ''tech'' | ''bank'' <-> ''clearanc'' | ''crowd'' <-> ''financ'' | ''crowdfinnc'' | ''invest'' <-> ''manag'' <-> ''system'' | ''p2p'' <-> ''bank''
'::tsquery) AND (company_type = 2) AND (url_status = 1))
               Rows Removed by Index Recheck: 54965
               Filter: (NOT (hashed SubPlan 1))
               Rows Removed by Filter: 57142
               Heap Blocks: exact=197083
               ->  BitmapAnd  (cost=94060.43..94060.43 rows=394349 width=0) (actual time=3605.515..3605.515 rows=0 loops=1)
                     ->  Bitmap Index Scan on my_company_main_fulltext  (cost=0.00..6414.92 rows=620827 width=0) (actual time=2419.704..2419.704 rows=352274 loops=1)
                           Index Cond: (fulltext_search_index @@ '
''trade'' | ''bank'' | ''bank'' | ''financ'' | ''financ'' | ''transact'' | ''complianc'' | ''insur'' | ''currenc'' | ''currenc'' | ''credit'' <-> ''card'' | ''financi'' <-> ''servic'' | ''invoic'' | ''loan'' | ''leas'' | ''cryptocurr'' | ''risk'' <-> ''manag'' | ''stock'' <-> ''exchang'' | ''asset'' <-> ''manag'' | ''fintech'' | ''incub'' | ''bitcoin'' | ''lend'' | ''point'' <2> ''sale'' | ''cyber'' <-> ''secur'' | ''trade'' <-> ''platform'' | ''smart'' <-> ''contract'' | ''peer'' <2> ''peer'' | ''debit'' <-> ''card'' | ''ventur'' <-> ''capit'' | ''mortgag'' | ''p2p'' | ''wealth'' <-> ''manag'' | ''crowdfund'' | ''financi'' <-> ''plan'' | ''stock'' <-> ''market'' | ''life'' <-> ''insur'' | ''mobil'' <-> ''payment'' | ''health'' <-> ''insur'' | ''remitt'' | ''real'' <-> ''estat'' <-> ''invest'' | ''busi'' <-> ''financ'' | ''invest'' <-> ''servic'' | ''financi'' <-> ''technolog'' | ''pos'' <-> ''solut'' | ''transact'' <-> ''process'' | ''privat'' <-> ''bank'' | ''credit'' <-> ''score'' | ''hedg'' <-> ''fund'' | ''person'' <-> ''financ'' | ''retail'' <-> ''bank'' | ''claim'' <-> ''manag'' | ''fraud'' <-> ''detect'' | ''insurtech'' | ''virtual'' <-> ''currenc'' | ''angel'' <-> ''invest'' | ''crowd'' <-> ''fund'' | ''core'' <-> ''bank'' | ''digit'' <-> ''bank'' | ''fin'' <-> ''tech'' | ''consum'' <-> ''financ'' | ''impact'' <-> ''invest'' | ''p2p'' <-> ''lend'' | ''debt'' <-> ''collect'' | ''bank'' <-> ''platform'' | ''auto'' <-> ''insur'' | ''properti'' <-> ''insur'' | ''commerci'' <-> ''insur'' | ''predict'' <-> ''market'' | ''creditworthi'' | ''regtech'' | ''financi'' <-> ''exchang'' | ''consum'' <-> ''lend'' | ''credit'' <-> ''bureau'' | ''bank'' <-> ''api'' | ''authent'' <-> ''technolog'' | ''micro'' <-> ''lend'' | ''microloan'' | ''regulatori'' <-> ''tech'' | ''peer'' <2> ''peer'' <-> ''bank'' | ''commerci'' <-> ''lend'' | ''financi'' <-> ''manag'' <-> ''applic'' | ''microinvest'' | ''p2p'' <-> ''insur'' | ''reg'' <-> ''tech'' | ''bank'' <-> ''clearanc'' | ''crowd'' <-> ''financ'' | ''crowdfinnc'' | ''invest'' <-> ''manag'' <-> ''system'' | ''p2p'' <-> ''bank''
'::tsquery)
                     ->  Bitmap Index Scan on my_company_url_status_2704ae13bae74399_uniq  (cost=0.00..87546.67 rows=2432299 width=0) (actual time=1162.482..1162.482 rows=2584111 loops=1)
                           Index Cond: (url_status = 1)
               SubPlan 1
                 ->  Gather  (cost=2143.66..404980.05 rows=301157 width=4) (actual time=36.819..1671.069 rows=265977 loops=1)
                       Workers Planned: 2
                       Workers Launched: 2
                       ->  Nested Loop  (cost=1143.66..373864.35 rows=125482 width=4) (actual time=137.208..1647.572 rows=88659 loops=3)
                             ->  Hash Join  (cost=1143.23..315793.12 rows=125482 width=4) (actual time=137.126..1189.980 rows=88950 loops=3)
                                   Hash Cond: (u1.campaign_id = u2.id)
                                   ->  Parallel Seq Scan on campaigns_processedurl u1  (cost=0.00..302007.65 rows=4815065 width=8) (actual time=0.022..635.291 rows=3857165 loops=3)
                                   ->  Hash  (cost=1136.76..1136.76 rows=518 width=4) (actual time=135.196..135.196 rows=518 loops=3)
                                         Buckets: 1024  Batches: 1  Memory Usage: 27kB
                                         ->  Bitmap Heap Scan on campaigns_urlcampaign u2  (cost=16.30..1136.76 rows=518 width=4) (actual time=133.979..135.121 rows=518 loops=3)
                                               Recheck Cond: (project_id = 999)
                                               Heap Blocks: exact=94
                                               ->  Bitmap Index Scan on campaigns_urlcampaign_b098ad43  (cost=0.00..16.17 rows=518 width=0) (actual time=0.100..0.100 rows=518 loops=3)
                                                     Index Cond: (project_id = 999)
                             ->  Index Only Scan using my_company_pkey on my_company u0_1  (cost=0.43..0.46 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=266849)
                                   Index Cond: (id = u1.company_id)
                                   Heap Fetches: 65356
         ->  Hash  (cost=8315.25..8315.25 rows=250555 width=4) (actual time=597.008..597.009 rows=277360 loops=1)
               Buckets: 524288 (originally 262144)  Batches: 1 (originally 1)  Memory Usage: 13847kB
               ->  Index Only Scan using my_companysource_source_id_a8900f16_uniq on my_companysource u0  (cost=0.44..8315.25 rows=250555 width=4) (actual time=417.555..543.621 rows=277360 loops=1)
                     Index Cond: (source_id = ANY ('{1,2,3,4,5,6}'::integer[]))
                     Heap Fetches: 208358
 Planning Time: 6.181 ms
 JIT:
   Functions: 71
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 6.576 ms, Inlining 153.968 ms, Optimization 404.078 ms, Emission 260.011 ms, Total 824.634 ms
 Execution Time: 93478.951 ms
(45 rows)

Execution plan without ORDER BY

I had to omit the fulltext query since the question would be over the character limit.

EXPLAIN ANALYZE SELECT * FROM "my_company" WHERE ("my_company"."company_type" = 2 AND "my_company"."id" IN (SELECT U0."company_id" FROM "my_companysource" U0 WHERE U0."source_id" IN (1,2,3,4,5,6)) AND NOT ("my_company"."id" IN (SELECT U0."id" FROM "my_company" U0 INNER JOIN "campaigns_processedurl" U1 ON (U0."id" = U1."company_id") INNER JOIN "campaigns_urlcampaign" U2 ON (U1."campaign_id" = U2."id") WHERE U2."project_id" IN (999))) AND ( ("my_company"."fulltext_search_index") @@ (to_tsquery('pg_catalog.english',
*same query as in the execution plan above*))) AND "my_company"."url_status" IN (1));

QUERY PLAN
----------

 Hash Semi Join  (cost=512357.67..1237114.98 rows=12571 width=1599) (actual time=4974.659..9603.716 rows=10881 loops=1)
   Hash Cond: (my_company.id = u0.company_id)
   ->  Bitmap Heap Scan on my_company  (cost=500699.64..1224420.02 rows=192089 width=1599) (actual time=4366.382..8953.154 rows=106143 loops=1)
         Recheck Cond: ((fulltext_search_index @@ '
*same query as in the execution plan above*
'::tsquery) AND (company_type = 2) AND (url_status = 1))
         Rows Removed by Index Recheck: 54962
         Filter: (NOT (hashed SubPlan 1))
         Rows Removed by Filter: 57166
         Heap Blocks: exact=197126
         ->  BitmapAnd  (cost=94756.73..94756.73 rows=384177 width=0) (actual time=2651.329..2651.329 rows=0 loops=1)
               ->  Bitmap Index Scan on my_company_main_fulltext  (cost=0.00..7117.74 rows=604909 width=0) (actual time=2261.803..2261.803 rows=343137 loops=1)
                     Index Cond: (fulltext_search_index @@ '
*same query as in the execution plan above*
'::tsquery)
               ->  Bitmap Index Scan on my_company_url_status_2704ae13bae74399_uniq  (cost=0.00..87542.70 rows=2433369 width=0) (actual time=367.116..367.116 rows=2545163 loops=1)
                     Index Cond: (url_status = 1)
         SubPlan 1
           ->  Gather  (cost=2143.66..405190.13 rows=301113 width=4) (actual time=36.725..1599.373 rows=265970 loops=1)
                 Workers Planned: 2
                 Workers Launched: 2
                 ->  Nested Loop  (cost=1143.66..374078.83 rows=125464 width=4) (actual time=124.494..1575.190 rows=88657 loops=3)
                       ->  Hash Join  (cost=1143.23..315802.59 rows=125464 width=4) (actual time=124.411..1161.406 rows=88950 loops=3)
                             Hash Cond: (u1.campaign_id = u2.id)
                             ->  Parallel Seq Scan on campaigns_processedurl u1  (cost=0.00..302015.15 rows=4815815 width=8) (actual time=0.020..583.296 rows=3869911 loops=3)
                             ->  Hash  (cost=1136.76..1136.76 rows=518 width=4) (actual time=122.546..122.546 rows=518 loops=3)
                                   Buckets: 1024  Batches: 1  Memory Usage: 27kB
                                   ->  Bitmap Heap Scan on campaigns_urlcampaign u2  (cost=16.30..1136.76 rows=518 width=4) (actual time=121.306..122.471 rows=518 loops=3)
                                         Recheck Cond: (project_id = 999)
                                         Heap Blocks: exact=94
                                         ->  Bitmap Index Scan on campaigns_urlcampaign_b098ad43  (cost=0.00..16.17 rows=518 width=0) (actual time=0.101..0.101 rows=518 loops=3)
                                               Index Cond: (project_id = 999)
                       ->  Index Only Scan using my_company_pkey on my_company u0_1  (cost=0.43..0.46 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=266849)
                             Index Cond: (id = u1.company_id)
                             Heap Fetches: 42965
   ->  Hash  (cost=8523.76..8523.76 rows=250742 width=4) (actual time=606.830..606.830 rows=277120 loops=1)
         Buckets: 524288 (originally 262144)  Batches: 1 (originally 1)  Memory Usage: 13839kB
         ->  Index Only Scan using my_companysource_source_id_a8900f16_uniq on my_companysource u0  (cost=0.44..8523.76 rows=250742 width=4) (actual time=418.030..549.896 rows=277120 loops=1)
               Index Cond: (source_id = ANY ('{1,2,3,4,5,6}'::integer[]))
               Heap Fetches: 210515
 Planning Time: 6.179 ms
 JIT:
   Functions: 71
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 6.517 ms, Inlining 135.258 ms, Optimization 384.718 ms, Emission 260.459 ms, Total 786.952 ms
 Execution Time: 9658.889 ms
(42 rows)

Index definition:

CREATE INDEX my_company_main_fulltext 
ON public.my_company USING gin (fulltext_search_index) 
WHERE (company_type = 2)

Best Answer

The time is spent in the "bitmap heap scan".

But the same bitmap heap scan is fast in one execution and slow in the other, so the difference is very likey due to caching after the first execution.

To make sure, use EXPLAIN (ANALYZE, BUFFERS) This will show you the number of blocks hit in the cache vs. blocks read from disk/kernel cache.