Postgresql – Execution Performance Issue with Similar Query

performancepostgresqlpostgresql-9.5postgresql-performance

I have been having some performance issues running a simple query on PostgreSQL on a table that has about 3 million rows with a join on a table that has about 120 rows.

If I run a query only on the larger table (with no filters), the results will return immediately, but with the join, results take up to 2 minutes to return the same results.

Here is the query:

SELECT
    foi.fk_dim_product, count(1)
FROM evino_bi.fact_order_item AS foi
INNER JOIN dim_order_item_status AS dois
    ON dois.id_dim_order_item_status = foi.fk_dim_order_item_status
WHERE dois.is_reserved = '1'
GROUP BY foi.fk_dim_product;

Running the EXPLAIN (ANALYZE, BUFFER), return the following:

HashAggregate  (cost=1379364.80..1379391.01 rows=2621 width=4) (actual time=84822.667..84822.863 rows=630 loops=1)
  Group Key: foi.fk_dim_product
  Buffers: shared hit=181461 read=1061877
  ->  Hash Join  (cost=6.38..1360785.99 rows=3715762 width=4) (actual time=0.180..84764.538 rows=96703 loops=1)
        Hash Cond: (foi.fk_dim_order_item_status = dois.id_dim_order_item_status)
        Buffers: shared hit=181461 read=1061877
        ->  Seq Scan on fact_order_item foi  (cost=0.00..1301725.54 rows=5839054 width=8) (actual time=0.002..81484.109 rows=5837655 loops=1)
              Buffers: shared hit=181458 read=1061877
        ->  Hash  (cost=5.06..5.06 rows=105 width=4) (actual time=0.053..0.053 rows=70 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 11kB
              Buffers: shared hit=3
              ->  Seq Scan on dim_order_item_status dois  (cost=0.00..5.06 rows=105 width=4) (actual time=0.005..0.038 rows=70 loops=1)
                    Filter: (is_reserved = 1)
                    Rows Removed by Filter: 40
                    Buffers: shared hit=3
Planning time: 0.623 ms
Execution time: 84836.100 ms

The problem that puzzles me the most is that the following query runs in milliseconds and returns the same data:

SELECT
    foi.fk_dim_product, count(1)
FROM evino_bi.fact_order_item foi
INNER JOIN dim_order_item_status dois
  ON dois.id_dim_order_item_status = foi.fk_dim_order_item_status
WHERE dois.is_reserved || '' = '1'
GROUP BY 1

Here is the EXPLAIN (ANALYZE, BUFFER) for the second statement:

HashAggregate  (cost=555597.70..555623.91 rows=2621 width=4) (actual time=249.523..249.673 rows=630 loops=1)
  Group Key: foi.fk_dim_product
  Buffers: shared hit=134117
  ->  Nested Loop  (cost=8172.60..555420.76 rows=35388 width=4) (actual time=2.971..219.564 rows=96860 loops=1)
        Buffers: shared hit=134117
        ->  Seq Scan on dim_order_item_status dois  (cost=0.00..6.30 rows=1 width=4) (actual time=0.011..0.101 rows=70 loops=1)
              Filter: (((is_reserved)::text || ''::text) = '1'::text)
              Rows Removed by Filter: 40
              Buffers: shared hit=3
        ->  Bitmap Heap Scan on fact_order_item foi  (cost=8172.60..553329.08 rows=208538 width=8) (actual time=1.205..2.484 rows=1384 loops=70)
              Recheck Cond: (fk_dim_order_item_status = dois.id_dim_order_item_status)
              Heap Blocks: exact=132362
              Buffers: shared hit=134114
              ->  Bitmap Index Scan on fact_order_item_fk_dim_order_item_status  (cost=0.00..8120.47 rows=208538 width=0) (actual time=0.467..0.467 rows=3903 loops=70)
                    Index Cond: (fk_dim_order_item_status = dois.id_dim_order_item_status)
                    Buffers: shared hit=1752
Planning time: 0.691 ms
Execution time: 249.917 ms

So, PostgreSQL is not planning my query adequately? Are there any performance tweaks that I can perform on my server to avoid the problems like on the first statement and helping PostgreSQL to plan better?

Also, why does the second statement runs absurdly faster?

EDIT

I updated the question adding the information requested.

EDIT 2

Here are the \d for both tables. I've hid some columns from the fact_order_item because the table has almost 150 columns, most are simple numeric values.

dim_order_item_status:

    Table "evino_bi.dim_order_item_status"
          Column          |          Type          |                                        Modifiers
--------------------------+------------------------+------------------------------------------------------------------------------------------
 id_dim_order_item_status | integer                | not null default nextval('dim_order_item_status_id_dim_order_item_status_seq'::regclass)
 src_id_order_item_status | integer                |
 name                     | character varying(100) |
 name_pt                  | character varying(100) |
 is_reserved              | smallint               |
 is_problem               | smallint               |
 payment_status           | character varying(15)  |
 macro_status             | character varying(100) |
 macro_status_pt          | character varying(100) |
 is_solid                 | smallint               |
Indexes:
    "dim_order_item_status_pkey" PRIMARY KEY, btree (id_dim_order_item_status)
    "src_id_order_item_status_idx" UNIQUE, btree (src_id_order_item_status)

fact_order_item:

                                                           Table "evino_bi.fact_order_item"
                  Column                  |            Type             |                                  Modifiers
------------------------------------------+-----------------------------+------------------------------------------------------------------------------
 id_fact_order_item                       | integer                     | not null default nextval('fact_order_item_id_fact_order_item_seq'::regclass)
 src_id_order_item                        | integer                     |
 src_fk_order                             | integer                     |
 order_increment_id                       | character varying(50)       |
 order_type                               | character varying(50)       |
 is_instant_buy                           | smallint                    |
 nfe_number                               | integer                     |
 nfe_serie                                | integer                     |
 nfe_key                                  | character varying(50)       |
 nfe_created_at                           | integer                     |
 nfe_created_at_time                      | integer                     |
 nf_created_at_datetime                   | timestamp without time zone |
 fk_dim_city                              | integer                     |
 fk_dim_product                           | integer                     |
 fk_dim_product_bundle                    | integer                     |
 fk_dim_customer                          | integer                     |
 fk_dim_logistics_provider                | integer                     |
 fk_dim_channel_last_click                | integer                     |
 fk_dim_source_medium_last_click          | integer                     |
 fk_dim_content_last_click                | integer                     |
 fk_dim_campaign_last_click               | integer                     |
 fk_dim_channel_lead                      | integer                     |
 fk_dim_source_medium_lead                | integer                     |
 fk_dim_content_lead                      | integer                     |
 fk_dim_campaign_lead                     | integer                     |
 fk_dim_order_item_status                 | integer                     |
 fk_dim_payment_method                    | integer                     |
 fk_dim_subscription                      | integer                     |
 fk_dim_order_volume_status               | integer                     |
 fk_dim_order_volume_micro_status         | integer                     |
 fk_dim_sales_rule                        | integer                     |
 fk_dim_region                            | integer                     |
 platform                                 | character varying(40)       |
 created_at                               | integer                     |
 created_at_time                          | integer                     |
 created_at_datetime                      | timestamp without time zone |
 updated_at                               | integer                     |
 updated_at_time                          | integer                     |
 updated_at_datetime                      | timestamp without time zone |
 payment_confirmed_at                     | integer                     |
 payment_confirmed_at_time                | integer                     |
 payment_confirmed_at_datetime            | timestamp without time zone |
 cm2                                      | numeric(10,4)               |
 etl_updated_at                           | timestamp without time zone |
 variations                               | json                        |
Indexes:
    "fact_order_item_pkey" PRIMARY KEY, btree (id_fact_order_item)
    "fk_fact_order_item_src_id_order_item" UNIQUE, btree (src_id_order_item)
    "fact_order_item_fk_dim_campaign_last_click" btree (fk_dim_campaign_last_click)
    "fact_order_item_fk_dim_campaign_lead" btree (fk_dim_campaign_lead)
    "fact_order_item_fk_dim_channel_last_click" btree (fk_dim_channel_last_click)
    "fact_order_item_fk_dim_channel_lead" btree (fk_dim_channel_lead)
    "fact_order_item_fk_dim_city" btree (fk_dim_city)
    "fact_order_item_fk_dim_content_last_click" btree (fk_dim_content_last_click)
    "fact_order_item_fk_dim_content_lead" btree (fk_dim_content_lead)
    "fact_order_item_fk_dim_customer" btree (fk_dim_customer)
    "fact_order_item_fk_dim_logistics_provider" btree (fk_dim_logistics_provider)
    "fact_order_item_fk_dim_order_item_status" btree (fk_dim_order_item_status)
    "fact_order_item_fk_dim_order_volume_status" btree (fk_dim_order_volume_status)
    "fact_order_item_fk_dim_payment_method" btree (fk_dim_payment_method)
    "fact_order_item_fk_dim_product" btree (fk_dim_product)
    "fact_order_item_fk_dim_product_bundle" btree (fk_dim_product_bundle)
    "fact_order_item_fk_dim_region" btree (fk_dim_region)
    "fact_order_item_fk_dim_sales_rule" btree (fk_dim_sales_rule)
    "fact_order_item_fk_dim_source_medium_last_click" btree (fk_dim_source_medium_last_click)
    "fact_order_item_fk_dim_source_medium_lead" btree (fk_dim_source_medium_lead)
    "fact_order_item_fk_dim_subscription" btree (fk_dim_subscription)
    "fk_fact_order_item_created_at" btree (created_at)
    "fk_fact_order_item_delivered_at" btree (delivered_at)
    "fk_fact_order_item_nfe_number_id" btree (nfe_number)
    "fk_fact_order_item_order_increment_id" btree (order_increment_id)
    "fk_fact_order_item_payment_confirmed_at" btree (payment_confirmed_at)
    "fk_fact_order_item_ready_for_picking_at" btree (ready_for_picking_at)
    "fk_fact_order_item_shipped_at" btree (shipped_at)
    "fk_fact_order_item_src_fk_order" btree (src_fk_order)
    "fk_fact_order_item_updated_at" btree (updated_at)

EDIT 3

As requested, here are my DB specs:

Amazon RDS SSD db.m4.xlarge, PostgreSQL 9.5.2, Memory 16 GB, 4 cores.

These configs should be set with the default values as I haven't tweaked any of them:

cpu_tuple_cost: 0.01
random_page_cost: 4
shared_buffers: 4058056kB
work_mem: 64000kB
effective_cache_size: 8116112kB
MAINTENANCE_WORK_MEM: 259MB

Best Answer

You have two Seq Scans in that query.

  1. Over dim_order_item_status.is_reserved
  2. Another one over foi.fk_dim_order_item_status

Do either of these have indexes? If not, create the index and VACUUM ANALYZE the table.

A few other notes,

  1. In PostgreSQL we don't use 1 for boolean.

    ALTER TABLE dim_order_item_status
      ALTER COLUMN is_reserved SET DATA TYPE bool;
    
  2. Also count(1) is better written as count(*)

As for why the second query is faster, the concatenation is slowing it down so it pushes the index scan further down. My assumption is that your statistics are off for the rows returned by fk_dim_order_item_status = dois.id_dim_order_item_status. Normally, we could verify this if you ran EXPLAIN ANALYZE rather than just EXPLAIN. But anyway, run either VACUUM ANALYZE or ANALYZE on those tables if the problem persists after you create the indexes above.

If you come back we need:

  1. VACUUM ANALYZE for all queries.
  2. \d dim_order_item_status
  3. \d dim_order_item_foi