Postgresql – Creating materialized view is very slow

materialized-viewpostgresqlpostgresql-10

I have quite a complex query in PostgreSQL 10.4 on Amazon RDS (3 joins, quite a few conditions). It takes from 1.5 to 6 seconds to perform a select. However, when I want to create a materialized view from it to speed it up, it takes forever. By forever I mean more that 30 minutes (usually after this client drops connection or something).

Now, I understand that SELECTs make extensive use of parallelization to speed things up and when there is a write (as in creating a materialized view) it cannot use it, but should the results be that different (couple of seconds vs >30 minutes)?

Here's the query in question:

SELECT 
  [...]
FROM 
  "profiles" 
  LEFT OUTER JOIN (
    SELECT DISTINCT "member_id" AS "cpn_member_id" 
    FROM  "coupon_collections" 
    WHERE  (("seen" >= '2017-11-02 14:15:43.111597+0000') AND ("seen" <= '2018-11-02 14:15:43.111652+0000')) 
    GROUP BY "member_id"
  ) AS "ex1" ON ("ex1"."cpn_member_id" = "profiles"."member_id") 
  LEFT OUTER JOIN "mobile_data" ON (
    "mobile_data"."member_id" = "profiles"."member_id"
  ) 
  LEFT OUTER JOIN "consents" AS "consents_sms_marketing" 
    ON (("consents_sms_marketing"."member_id" = "profiles"."member_id") 
      AND ("consents_sms_marketing"."name" = 'sms_marketing')) 
WHERE 
  (("community_id" = 123) AND (("push_enabled" IS FALSE) OR ("push_token" IS NULL)) 
    AND ("profiles"."optin_date" :: date >= '2012-01-01T00:00:00+01:00' :: date) 
    AND ("profiles"."optin_date" :: date <= '2018-09-30T00:00:00+02:00' :: date) 
    AND ("ex1"."cpn_member_id" IS NULL)
    AND ("profiles"."msisdn" IS NOT NULL) 
    AND ("data" ->> 'user_status') = 'verified') 
    AND ("consents_sms_marketing"."value" IS TRUE) 
  )

Explain for SELECT:

QUERY PLAN                                                                                                                                                                                              |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Nested Loop  (cost=295510.43..418688.80 rows=1 width=574)                                                                                                                                               |
  ->  Nested Loop Left Join  (cost=295510.00..418680.35 rows=1 width=574)                                                                                                                               |
        Filter: ((mobile_data.push_enabled IS FALSE) OR (mobile_data.push_token IS NULL))                                                                                                               |
        ->  Merge Anti Join  (cost=295509.57..418671.90 rows=1 width=390)                                                                                                                               |
              Merge Cond: (profiles.member_id = coupon_collections.member_id)                                                                                                                           |
              ->  Sort  (cost=108286.51..108286.51 rows=1 width=390)                                                                                                                                    |
                    Sort Key: profiles.member_id                                                                                                                                                        |
                    ->  Bitmap Heap Scan on profiles  (cost=2119.99..108286.50 rows=1 width=390)                                                                                                        |
                          Recheck Cond: ((community_id = 123) AND (msisdn IS NOT NULL))                                                                                                                |
                          Filter: (((optin_date)::date >= '2012-01-01'::date) AND ((optin_date)::date <= '2018-09-30'::date) AND ((data ->> 'user_status'::text) = 'verified'::text))         |
                          ->  Bitmap Index Scan on profiles_community_id_msisdn_index  (cost=0.00..2119.99 rows=52756 width=0)                                                                          |
                                Index Cond: ((community_id = 123) AND (msisdn IS NOT NULL))                                                                                                            |
              ->  Unique  (cost=187223.06..304500.59 rows=470783 width=4)                                                                                                                               |
                    ->  Group  (cost=187223.06..303323.63 rows=470783 width=4)                                                                                                                          |
                          Group Key: coupon_collections.member_id                                                                                                                                       |
                          ->  Gather Merge  (cost=187223.06..300969.72 rows=941566 width=4)                                                                                                             |
                                Workers Planned: 2                                                                                                                                                      |
                                ->  Group  (cost=186223.04..191289.61 rows=470783 width=4)                                                                                                              |
                                      Group Key: coupon_collections.member_id                                                                                                                           |
                                      ->  Sort  (cost=186223.04..188756.33 rows=1013315 width=4)                                                                                                        |
                                            Sort Key: coupon_collections.member_id                                                                                                                      |
                                            ->  Parallel Seq Scan on coupon_collections  (cost=0.00..71285.07 rows=1013315 width=4)                                                                     |
                                                  Filter: ((seen >= '2017-11-02 14:15:43.111597'::timestamp without time zone) AND (seen <= '2018-11-02 14:15:43.111652'::timestamp without time zone)) |
        ->  Index Scan using mobile_data_member_id_index on mobile_data  (cost=0.42..8.44 rows=1 width=184)                                                                                             |
              Index Cond: (member_id = profiles.member_id)                                                                                                                                              |
  ->  Index Scan using consents_member_id_name_index on consents consents_sms_marketing  (cost=0.43..8.45 rows=1 width=4)                                                                               |
        Index Cond: ((member_id = profiles.member_id) AND (name = 'sms_marketing'::text))                                                                                                               |
        Filter: (value IS TRUE)                                                                                                                                                                         |

Explain for create materialized view:

QUERY PLAN                                                                                                                                                                                  |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Nested Loop  (cost=440022.05..570133.95 rows=1 width=574)                                                                                                                                   |
  ->  Nested Loop Left Join  (cost=440021.62..570125.50 rows=1 width=574)                                                                                                                   |
        Filter: ((mobile_data.push_enabled IS FALSE) OR (mobile_data.push_token IS NULL))                                                                                                   |
        ->  Nested Loop Anti Join  (cost=440021.19..570117.05 rows=1 width=390)                                                                                                             |
              Join Filter: (coupon_collections.member_id = profiles.member_id)                                                                                                              |
              ->  Bitmap Heap Scan on profiles  (cost=2119.99..108286.50 rows=1 width=390)                                                                                                  |
                    Recheck Cond: ((community_id = 123) AND (msisdn IS NOT NULL))                                                                                                          |
                    Filter: (((optin_date)::date >= '2012-01-01'::date) AND ((optin_date)::date <= '2018-09-30'::date) AND ((data ->> 'user_status'::text) = 'verified'::text))   |
                    ->  Bitmap Index Scan on profiles_community_id_msisdn_index  (cost=0.00..2119.99 rows=52756 width=0)                                                                    |
                          Index Cond: ((community_id = 123) AND (msisdn IS NOT NULL))                                                                                                      |
              ->  Unique  (cost=437901.20..451237.94 rows=470783 width=4)                                                                                                                   |
                    ->  Group  (cost=437901.20..450060.98 rows=470783 width=4)                                                                                                              |
                          Group Key: coupon_collections.member_id                                                                                                                           |
                          ->  Sort  (cost=437901.20..443981.09 rows=2431955 width=4)                                                                                                        |
                                Sort Key: coupon_collections.member_id                                                                                                                      |
                                ->  Seq Scan on coupon_collections  (cost=0.00..113447.57 rows=2431955 width=4)                                                                             |
                                      Filter: ((seen >= '2017-11-02 14:15:43.111597'::timestamp without time zone) AND (seen <= '2018-11-02 14:15:43.111652'::timestamp without time zone)) |
        ->  Index Scan using mobile_data_member_id_index on mobile_data  (cost=0.42..8.44 rows=1 width=184)                                                                                 |
              Index Cond: (member_id = profiles.member_id)                                                                                                                                  |
  ->  Index Scan using consents_member_id_name_index on consents consents_sms_marketing  (cost=0.43..8.45 rows=1 width=4)                                                                   |
        Index Cond: ((member_id = profiles.member_id) AND (name = 'sms_marketing'::text))                                                                                                   |
        Filter: (value IS TRUE)                                                                                                                                                             |

EDIT: Rubber duck for the win. After Writing this question I finally notices that with creating a view planner decides to use nested loop instead of hash/merge join. As both sides are relatively big, this is horribly wrong. I forced the planner to change its mind by issuing set enable_nestloop = false and now it runs in 10 seconds, which is totally acceptable (plan below).

The question remains – how to optimize the structure to let planner choose efficient join method on it's own, without resorting to such dirty tricks?

Promised explain analyze:

QUERY PLAN                                                                                                                                                                                                  |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Hash Join  (cost=603547.70..699558.89 rows=1 width=573) (actual time=7922.084..9402.481 rows=17036 loops=1)                                                                                                 |
  Hash Cond: (consents_sms_marketing.member_id = profiles.member_id)                                                                                                                                        |
  ->  Seq Scan on consents consents_sms_marketing  (cost=0.00..90423.46 rows=1490059 width=4) (actual time=0.010..857.382 rows=1658797 loops=1)                                                             |
        Filter: ((value IS TRUE) AND (name = 'sms_marketing'::text))                                                                                                                                        |
        Rows Removed by Filter: 2420637                                                                                                                                                                     |
  ->  Hash  (cost=603547.68..603547.68 rows=1 width=573) (actual time=7921.077..7921.077 rows=18170 loops=1)                                                                                                |
        Buckets: 16384 (originally 1024)  Batches: 2 (originally 1)  Memory Usage: 3969kB                                                                                                                   |
        ->  Hash Right Join  (cost=565515.53..603547.68 rows=1 width=573) (actual time=7239.595..7904.829 rows=18170 loops=1)                                                                               |
              Hash Cond: (mobile_data.member_id = profiles.member_id)                                                                                                                                       |
              Filter: ((mobile_data.push_enabled IS FALSE) OR (mobile_data.push_token IS NULL))                                                                                                             |
              Rows Removed by Filter: 2750                                                                                                                                                                  |
              ->  Seq Scan on mobile_data  (cost=0.00..35173.38 rows=762338 width=183) (actual time=0.009..211.746 rows=762358 loops=1)                                                                     |
              ->  Hash  (cost=565515.51..565515.51 rows=1 width=390) (actual time=7235.692..7235.692 rows=20920 loops=1)                                                                                    |
                    Buckets: 16384 (originally 1024)  Batches: 4 (originally 1)  Memory Usage: 3969kB                                                                                                       |
                    ->  Merge Anti Join  (cost=546291.23..565515.51 rows=1 width=390) (actual time=4172.760..7203.434 rows=20920 loops=1)                                                                   |
                          Merge Cond: (profiles.member_id = coupon_collections.member_id)                                                                                                                   |
                          ->  Sort  (cost=108286.51..108286.51 rows=1 width=390) (actual time=226.020..270.103 rows=59822 loops=1)                                                                          |
                                Sort Key: profiles.member_id                                                                                                                                                |
                                Sort Method: external merge  Disk: 23384kB                                                                                                                                  |
                                ->  Bitmap Heap Scan on profiles  (cost=2119.99..108286.50 rows=1 width=390) (actual time=26.365..135.034 rows=59822 loops=1)                                               |
                                      Recheck Cond: ((community_id = 123) AND (msisdn IS NOT NULL))                                                                                                        |
                                      Filter: (((optin_date)::date >= '2012-01-01'::date) AND ((optin_date)::date <= '2018-09-30'::date) AND ((data ->> 'user_status'::text) = 'verified'::text)) |
                                      Rows Removed by Filter: 2375                                                                                                                                          |
                                      Heap Blocks: exact=23559                                                                                                                                              |
                                      ->  Bitmap Index Scan on profiles_community_id_msisdn_index  (cost=0.00..2119.99 rows=52756 width=0) (actual time=17.690..17.690 rows=62248 loops=1)                  |
                                            Index Cond: ((community_id = 123) AND (msisdn IS NOT NULL))                                                                                                    |
                          ->  Unique  (cost=438004.72..451344.19 rows=470784 width=4) (actual time=3946.732..6637.208 rows=743564 loops=1)                                                                  |
                                ->  Group  (cost=438004.72..450167.23 rows=470784 width=4) (actual time=3946.729..6134.888 rows=743564 loops=1)                                                             |
                                      Group Key: coupon_collections.member_id                                                                                                                               |
                                      ->  Sort  (cost=438004.72..444085.97 rows=2432502 width=4) (actual time=3946.725..5048.156 rows=2367134 loops=1)                                                      |
                                            Sort Key: coupon_collections.member_id                                                                                                                          |
                                            Sort Method: external merge  Disk: 32760kB                                                                                                                      |
                                            ->  Seq Scan on coupon_collections  (cost=0.00..113475.11 rows=2432502 width=4) (actual time=0.014..1660.434 rows=2384812 loops=1)                              |
                                                  Filter: ((seen >= '2017-11-02 14:15:43.111597'::timestamp without time zone) AND (seen <= '2018-11-02 14:15:43.111652'::timestamp without time zone))     |
                                                  Rows Removed by Filter: 2456783                                                                                                                           |
Planning time: 0.745 ms                                                                                                                                                                                     |
Execution time: 9455.154 ms                                                                                                                                                                                 |

Best Answer

The solution was to take a look at pg_stats and notice that there is very low correlation for some columns. The adjusting the stats for those relevant columns using:

ALTER TABLE coupon_collections ALTER COLUMN member_id SET STATISTICS 1000

Now creating view is slower by a very reasonable factor.