Postgresql – Joining two tables partitioned tables having index with the partition key and joining columns does not help much

indexpartitioningpostgresql

I have two partitioned tables that have an index created on the joining key and the partition/primary key (because on partitioned tables on Postgres the partition key must belong always to the index). When I run this query:

select
  mol.*
from
  merchant_order_lines mol
  inner join merchant_orders mo on
  mol.fk_x_orders_id = mo.sk_id
where 
  mo.sk_id = 552369076;

I get the next plan with explain analyze

 Nested Loop  (cost=0.71..3147351.05 rows=419 width=594) (actual time=0.106..727738.027 rows=1 loops=1)


 ->  Index Only Scan using merchant_orders_2017_pkey on merchant_orders_2017 mo  (cost=0.56..8.58 rows=1 width=8) (actual time=0.023..0.026 rows=1 loops=1)
         Index Cond: (sk_id = 552382076)
         Heap Fetches: 1
   ->  Append  (cost=0.14..3147338.27 rows=419 width=594) (actual time=0.079..727737.994 rows=1 loops=1)
         ->  Index Scan using merchant_order_lines_2016_0_sk_id_fk_x_orders_id_idx on merchant_order_lines_2016_0 mol  (cost=0.14..9.28 rows=1 width=514) (actual time=0.031..0.031 r
ows=0 loops=1)
               Index Cond: (fk_x_orders_id = 552382076)
         ->  Index Scan using merchant_order_lines_2016_1_sk_id_fk_x_orders_id_idx on merchant_order_lines_2016_1 mol_1  (cost=0.14..9.05 rows=1 width=632) (actual time=0.011..0.011
 rows=0 loops=1)
               Index Cond: (fk_x_orders_id = 552382076)
         ->  Index Scan using idx_fk_x_orders_id_merchant_order_lines_2017_0 on merchant_order_lines_2017_0 mol_2  (cost=0.56..33.25 rows=84 width=577) (actual time=0.037..0.038 row
s=1 loops=1)
               Index Cond: (fk_x_orders_id = 552382076)
         ->  Index Scan using merchant_order_lines_2017_1_sk_id_fk_x_orders_id_idx on merchant_order_lines_2017_1 mol_3  (cost=0.56..893039.49 rows=72 width=600) (actual time=198583
.513..198583.513 rows=0 loops=1)
               Index Cond: (fk_x_orders_id = 552382076)
         ->  Index Scan using merchant_order_lines_2018_0_sk_id_fk_x_orders_id_idx on merchant_order_lines_2018_0 mol_4  (cost=0.56..1126430.31 rows=130 width=593) (actual time=2836
18.835..283618.835 rows=0 loops=1)
               Index Cond: (fk_x_orders_id = 552382076)
         ->  Index Scan using merchant_order_lines_2018_1_sk_id_fk_x_orders_id_idx on merchant_order_lines_2018_1 mol_5  (cost=0.56..1127814.78 rows=131 width=604) (actual time=2455
35.558..245535.558 rows=0 loops=1)
               Index Cond: (fk_x_orders_id = 552382076)
 Planning Time: 14.336 ms
 Execution Time: 727738.113 ms

So the query is using the index (and the ones inherited by the partitions) but I getting and awful time. When I manually create indexes on fk_x_orders_id only per every partition I get the next plan:



 Nested Loop  (cost=0.71..135.21 rows=419 width=594) (actual time=4.232..4.294 rows=1 loops=1)
   ->  Index Only Scan using merchant_orders_2017_pkey on merchant_orders_2017 mo  (cost=0.56..8.58 rows=1 width=8) (actual time=3.431..3.432 rows=1 loops=1)
         Index Cond: (sk_id = 552382076)
         Heap Fetches: 1
   ->  Append  (cost=0.14..122.44 rows=419 width=594) (actual time=0.795..0.855 rows=1 loops=1)
         ->  Index Scan using merchant_order_lines_2016_0_sk_id_fk_x_orders_id_idx on merchant_order_lines_2016_0 mol  (cost=0.14..9.28 rows=1 width=514) (actual
 time=0.010..0.010 rows=0 loops=1)
               Index Cond: (fk_x_orders_id = 552382076)
         ->  Index Scan using merchant_order_lines_2016_1_sk_id_fk_x_orders_id_idx on merchant_order_lines_2016_1 mol_1  (cost=0.14..9.05 rows=1 width=632) (actu
al time=0.007..0.007 rows=0 loops=1)
               Index Cond: (fk_x_orders_id = 552382076)
         ->  Index Scan using idx_fk_x_orders_id_2017_0 on merchant_order_lines_2017_0 mol_2  (cost=0.56..33.25 rows=84 width=577) (actual time=0.778..0.779 rows
=1 loops=1)
               Index Cond: (fk_x_orders_id = 552382076)
         ->  Index Scan using idx_fk_x_orders_id_2017_1 on merchant_order_lines_2017_1 mol_3  (cost=0.56..15.51 rows=72 width=600) (actual time=0.017..0.018 rows
=0 loops=1)
               Index Cond: (fk_x_orders_id = 552382076)
         ->  Index Scan using idx_fk_x_orders_id_2018_0 on merchant_order_lines_2018_0 mol_4  (cost=0.56..33.38 rows=130 width=593) (actual time=0.020..0.020 row
s=0 loops=1)
               Index Cond: (fk_x_orders_id = 552382076)
         ->  Index Scan using idx_fk_x_orders_id_2018_1 on merchant_order_lines_2018_1 mol_5  (cost=0.56..19.87 rows=131 width=604) (actual time=0.019..0.019 row
s=0 loops=1)
               Index Cond: (fk_x_orders_id = 552382076)
 Planning Time: 0.315 ms
 Execution Time: 4.336 ms

Why does the index that contain the partition/primary key and the column fk_x_orders_id is behaving so badly?

The table structures are as follows:

CREATE TABLE public.merchant_orders (
    sk_id int8 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    fk_r_companies_id int4 NOT NULL DEFAULT 0,
    x_order_id varchar(50) NULL,
    fk_x_stores_id int4 NULL DEFAULT 0,
    fk_x_shoppers_id int4 NULL DEFAULT 0,
    x_order_number varchar(50) NULL,
    created_at timestamp(0) NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at timestamp(0) NULL,
    x_created_at timestamp(0) NULL,
    x_updated_at timestamp(0) NULL,
    extra_info json NULL,
    CONSTRAINT merchant_orders_pkey2 PRIMARY KEY (sk_id)
  )
  PARTITION BY RANGE (sk_id);



CREATE TABLE public.merchant_order_lines (
    sk_id int8 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    fk_x_orders_id int4 NOT NULL DEFAULT 0,
    x_order_line_id varchar(64) NULL,
    x_product_id varchar(64) NULL,
    x_variant_id varchar(64) NULL,
    x_sku varchar(64) NULL,
    fk_x_addresses_id_warehouse int4 NULL,
    extra_info json NULL
  )
  PARTITION BY RANGE (sk_id);

And both the global index:

create index idx_fk_x_orders_id on
  public.merchant_order_lines
     using btree (sk_id, fk_x_orders_id);

And every partition index:

create index idx_fk_x_orders_id_2018_1 on
  public.merchant_order_lines_2018_1
    using btree (fk_x_orders_id);

Best Answer

because on partitioned tables on Postgres the partition key must belong always to the index

This is a misconception, and perhaps accidentally the cause of the problem. The partition key must be part of the primary key or part of a parental unique index, but it does not need to be part of other indexes. So you can create a index on (fk_x_orders_id) on the partitioned parent table, and doing so will propagate to each partition. You don't need to manually make them on each partition.

You could have also fixed the problem just be reversing the order of the columns in the existing index, to be (fk_x_orders_id, sk_id);

Your bad plan is using the index just as a skinny table. It should be faster than reading the table itself, as there is less data to read from disk, but it has still has to read the entire index. You can't jump to the relevant part of the btree index when you are only testing a second or later column in the index and not constraining the leading column.