Postgresql – This query is very slow , how can we make it fast

performancepostgresqlquery-performance

WITH queued_logins AS
(
  SELECT a.id AS account_id,
         apps.id AS app_id,
         apps.name AS app_name,
         COUNT(l.id) AS queued_count
  FROM logins l
    LEFT JOIN throttled_requests tr ON tr.login_id = l.id
    LEFT JOIN queued_messages qm
           ON qm.message ilike E'%login_id%22%0C' || l.id || '%'
          AND qm.message ilike E'%command%22%0Eprovision%'
    INNER JOIN users u ON u.id = l.user_id
    INNER JOIN accounts a ON a.id = l.account_id
    INNER JOIN apps ON apps.id = l.app_id
  WHERE l.provisioning_state IN ('provisioning', 'modifying', 'deleting')
    AND (tr.id IS NOT NULL OR qm.id IS NOT NULL)
  GROUP BY 1,
           2
),
stuck_logins AS
(
  SELECT a.id AS account_id,
         apps.id AS app_id,
         apps.name AS app_name,
         COUNT(l.id) AS stuck_count
  FROM logins l
    LEFT JOIN throttled_requests tr ON tr.login_id = l.id
    LEFT JOIN queued_messages qm
           ON qm.message ilike E'%login_id%22%0C' || l.id || '%'
          AND qm.message ilike E'%command%22%0Eprovision%'
    INNER JOIN users u ON u.id = l.user_id
    INNER JOIN accounts a ON a.id = l.account_id
    INNER JOIN apps ON apps.id = l.app_id
  WHERE l.provisioning_state IN ('provisioning', 'modifying', 'deleting')
    AND tr.id IS NULL
    AND qm.id IS NULL
  GROUP BY 1,
           2
)
SELECT a.name AS account_name,
       COALESCE(q.app_id,s.app_id) AS app_id,
       COALESCE(q.app_name,s.app_name) AS app_name,
       COALESCE(q.queued_count,0) AS queued_count,
       COALESCE(s.stuck_count,0) AS stuck_count
FROM accounts a
  LEFT JOIN queued_logins q ON q.account_id = a.id
  LEFT JOIN stuck_logins s ON s.account_id = a.id
WHERE q.queued_count IS NOT NULL
   OR s.stuck_count IS NOT NULL
ORDER BY 1, 3

Explain PLAN

Sort  (cost=1416168.51..1416444.41 rows=110360 width=549) (actual time=53552.665..53552.722 rows=426 loops=1)
   Sort Key: a.name, (COALESCE(q.app_name, s.app_name))
   Sort Method: quicksort  Memory: 69kB
   CTE queued_logins
     ->  GroupAggregate  (cost=852887.19..854681.85 rows=89733 width=29) (actual time=51290.425..51290.425 rows=0 loops=1)
           Group Key: a_1.id, apps.id
           ->  Sort  (cost=852887.19..853111.53 rows=89733 width=25) (actual time=51290.423..51290.423 rows=0 loops=1)
                 Sort Key: a_1.id, apps.id
                 Sort Method: quicksort  Memory: 25kB
                 ->  Nested Loop Left Join  (cost=312661.81..845505.15 rows=89733 width=25) (actual time=51290.407..51290.407 rows=0 loops=1)
                       Join Filter: (qm.message ~~* (('%login_id%22%0C'::text || (l.id)::text) || '%'::text))
                       Filter: ((tr.id IS NOT NULL) OR (qm.id IS NOT NULL))
                       Rows Removed by Filter: 101466
                       ->  Gather  (cost=312661.81..843260.82 rows=89733 width=29) (actual time=4061.263..51222.045 rows=101466 loops=1)
                             Workers Planned: 2
                             Workers Launched: 2
                             ->  Nested Loop  (cost=311661.81..833287.52 rows=37389 width=29) (actual time=4049.833..51178.207 rows=33822 loops=3)
                                   ->  Hash Left Join  (cost=311661.37..570662.36 rows=37389 width=33) (actual time=4046.690..12939.959 rows=33823 loops=3)
                                         Hash Cond: (l.id = tr.login_id)
                                         ->  Hash Join  (cost=311632.02..570492.79 rows=37389 width=29) (actual time=4046.648..12915.500 rows=33823 loops=3)
                                               Hash Cond: (l.app_id = apps.id)
                                               ->  Hash Join  (cost=289878.94..548266.43 rows=37488 width=16) (actual time=3075.951..11916.082 rows=33823 loops=3)
                                                     Hash Cond: (l.account_id = a_1.id)
                                                     ->  Parallel Bitmap Heap Scan on logins l  (cost=267457.77..525370.72 rows=37587 width=16) (actual time=2886.931..11681.832 rows=33823 loops=3)
                                                           Recheck Cond: (provisioning_state = ANY ('{provisioning,modifying,deleting}'::text[]))
                                                           Heap Blocks: exact=17337
                                                           ->  Bitmap Index Scan on index_logins_on_account_id_and_provisioning_state  (cost=0.00..267435.22 rows=90208 width=0) (actual time=2882.926..2882.926 rows=101468 loops=1)
                                                                 Index Cond: (provisioning_state = ANY ('{provisioning,modifying,deleting}'::text[]))
                                                     ->  Hash  (cost=21041.63..21041.63 rows=110363 width=4) (actual time=188.378..188.378 rows=110363 loops=3)
                                                           Buckets: 131072  Batches: 1  Memory Usage: 4904kB
                                                           ->  Seq Scan on accounts a_1  (cost=0.00..21041.63 rows=110363 width=4) (actual time=0.023..123.462 rows=110363 loops=3)
                                               ->  Hash  (cost=15783.59..15783.59 rows=477559 width=17) (actual time=967.180..967.180 rows=477559 loops=3)
                                                     Buckets: 524288  Batches: 1  Memory Usage: 27141kB
                                                     ->  Seq Scan on apps  (cost=0.00..15783.59 rows=477559 width=17) (actual time=0.319..710.916 rows=477559 loops=3)
                                         ->  Hash  (cost=18.60..18.60 rows=860 width=8) (actual time=0.013..0.013 rows=0 loops=3)
                                               Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                               ->  Seq Scan on throttled_requests tr  (cost=0.00..18.60 rows=860 width=8) (actual time=0.013..0.013 rows=0 loops=3)
                                   ->  Index Only Scan using users_pkey on users u  (cost=0.44..7.02 rows=1 width=4) (actual time=1.129..1.129 rows=1 loops=101468)
                                         Index Cond: (id = l.user_id)
                                         Heap Fetches: 34022
                       ->  Materialize  (cost=0.00..1.00 rows=1 width=36) (actual time=0.000..0.000 rows=0 loops=101466)
                             ->  Seq Scan on queued_messages qm  (cost=0.00..1.00 rows=1 width=36) (actual time=0.011..0.011 rows=0 loops=1)
                                   Filter: (message ~~* '%command%22%0Eprovision%'::text)
   CTE stuck_logins
     ->  GroupAggregate  (cost=526480.36..526480.38 rows=1 width=29) (actual time=1943.682..1983.118 rows=426 loops=1)
           Group Key: a_2.id, apps_1.id
           ->  Sort  (cost=526480.36..526480.36 rows=1 width=25) (actual time=1943.619..1959.165 rows=101466 loops=1)
                 Sort Key: a_2.id, apps_1.id
                 Sort Method: quicksort  Memory: 10415kB
                 ->  Nested Loop  (cost=267488.40..526480.35 rows=1 width=25) (actual time=372.826..1901.909 rows=101466 loops=1)
                       ->  Nested Loop  (cost=267487.98..526475.17 rows=1 width=12) (actual time=371.479..1444.521 rows=101466 loops=1)
                             ->  Nested Loop  (cost=267487.56..526469.95 rows=1 width=12) (actual time=370.870..1205.355 rows=101466 loops=1)
                                   ->  Nested Loop Left Join  (cost=267487.12..526462.93 rows=1 width=16) (actual time=370.840..720.037 rows=101468 loops=1)
                                         Join Filter: (qm_1.message ~~* (('%login_id%22%0C'::text || (l_1.id)::text) || '%'::text))
                                         Filter: (qm_1.id IS NULL)
                                         ->  Hash Left Join  (cost=267487.12..526461.91 rows=1 width=16) (actual time=370.832..552.520 rows=101468 loops=1)
                                               Hash Cond: (l_1.id = tr_1.login_id)
                                               Filter: (tr_1.id IS NULL)
                                               ->  Bitmap Heap Scan on logins l_1  (cost=267457.77..526094.27 rows=90208 width=16) (actual time=370.798..506.565 rows=101468 loops=1)
                                                     Recheck Cond: (provisioning_state = ANY ('{provisioning,modifying,deleting}'::text[]))
                                                     Heap Blocks: exact=51506
                                                     ->  Bitmap Index Scan on index_logins_on_account_id_and_provisioning_state  (cost=0.00..267435.22 rows=90208 width=0) (actual time=361.154..361.154 rows=101468 loops=1)
                                                           Index Cond: (provisioning_state = ANY ('{provisioning,modifying,deleting}'::text[]))
                                               ->  Hash  (cost=18.60..18.60 rows=860 width=8) (actual time=0.007..0.007 rows=0 loops=1)
                                                     Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                                     ->  Seq Scan on throttled_requests tr_1  (cost=0.00..18.60 rows=860 width=8) (actual time=0.007..0.007 rows=0 loops=1)
                                         ->  Seq Scan on queued_messages qm_1  (cost=0.00..1.00 rows=1 width=36) (actual time=0.001..0.001 rows=0 loops=101468)
                                               Filter: (message ~~* '%command%22%0Eprovision%'::text)
                                   ->  Index Only Scan using users_pkey on users u_1  (cost=0.44..7.02 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=101468)
                                         Index Cond: (id = l_1.user_id)
                                         Heap Fetches: 101466
                             ->  Index Only Scan using accounts_pkey on accounts a_2  (cost=0.42..5.21 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=101466)
                                   Index Cond: (id = l_1.account_id)
                                   Heap Fetches: 101466
                       ->  Index Scan using services_pkey on apps apps_1  (cost=0.42..5.18 rows=1 width=17) (actual time=0.004..0.004 rows=1 loops=101466)
                             Index Cond: (id = l_1.app_id)
   ->  Hash Left Join  (cost=22421.20..25762.60 rows=110360 width=549) (actual time=53494.239..53548.903 rows=426 loops=1)
         Hash Cond: (a.id = s.account_id)
         Filter: ((q.queued_count IS NOT NULL) OR (s.stuck_count IS NOT NULL))
         Rows Removed by Filter: 110108
         ->  Hash Right Join  (cost=22421.17..25348.70 rows=110363 width=545) (actual time=51510.610..51543.604 rows=110363 loops=1)
               Hash Cond: (q.account_id = a.id)
               ->  CTE Scan on queued_logins q  (cost=0.00..1794.66 rows=89733 width=532) (actual time=51290.428..51290.428 rows=0 loops=1)
               ->  Hash  (cost=21041.63..21041.63 rows=110363 width=17) (actual time=219.703..219.703 rows=110363 loops=1)
                     Buckets: 131072  Batches: 1  Memory Usage: 6520kB
                     ->  Seq Scan on accounts a  (cost=0.00..21041.63 rows=110363 width=17) (actual time=0.011..184.185 rows=110363 loops=1)
         ->  Hash  (cost=0.02..0.02 rows=1 width=532) (actual time=1983.547..1983.547 rows=426 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 35kB
               ->  CTE Scan on stuck_logins s  (cost=0.00..0.02 rows=1 width=532) (actual time=1943.686..1983.377 rows=426 loops=1)
 Planning time: 71.125 ms
 Execution time: 53557.095 ms

Best Answer

Replace queued_logins and stuck_logins by one query, since they are almost equal. Replace the count by two counts:

sum(case when tr.id is null and qm.id is null then 1 else 0 end)
    stuck_count,
sum(case when tr.id is not null or qm.id is not null then 1 else 0 end)
    queued_count

This should make your query at least twice as fast, if not much more.

The result should be somewhat like

WITH
  queued_logins AS (
      SELECT a.id AS account_id, apps.id AS app_id, apps.name AS app_name,
        sum(case when tr.id is null and qm.id is null then 1 else 0 end)
               stuck_count,
        sum(case when tr.id is not null or qm.id is not null then 1 else 0 end)
               queued_count
        FROM logins l
        LEFT JOIN throttled_requests tr ON tr.login_id = l.id
        LEFT JOIN queued_messages qm
          ON qm.message ILIKE E'%login_id%22%0C' || l.id || '%'
          AND qm.message ILIKE E'%command%22%0Eprovision%'
        INNER JOIN users u ON u.id = l.user_id
        INNER JOIN accounts a ON a.id = l.account_id
        INNER JOIN apps ON apps.id = l.app_id
        WHERE l.provisioning_state IN ('provisioning', 'modifying', 'deleting')
        GROUP BY 1, 2)
 SELECT a.name AS account_name,
        q.app_id AS app_id,
        q.app_name AS app_name,
        q.queued_count AS queued_count,
        q.stuck_count AS stuck_count
   FROM accounts a
   LEFT JOIN queued_logins q ON q.account_id = a.id
   WHERE q.queued_count IS NOT NULL
      OR q.stuck_count IS NOT NULL
   ORDER BY 1, 3;