I have two partitioned tables that have an index created on the joining key and the partition/primary key (because on partitioned tables on Postgres the partition key must belong always to the index). When I run this query:
select
mol.*
from
merchant_order_lines mol
inner join merchant_orders mo on
mol.fk_x_orders_id = mo.sk_id
where
mo.sk_id = 552369076;
I get the next plan with explain analyze
Nested Loop (cost=0.71..3147351.05 rows=419 width=594) (actual time=0.106..727738.027 rows=1 loops=1)
-> Index Only Scan using merchant_orders_2017_pkey on merchant_orders_2017 mo (cost=0.56..8.58 rows=1 width=8) (actual time=0.023..0.026 rows=1 loops=1)
Index Cond: (sk_id = 552382076)
Heap Fetches: 1
-> Append (cost=0.14..3147338.27 rows=419 width=594) (actual time=0.079..727737.994 rows=1 loops=1)
-> Index Scan using merchant_order_lines_2016_0_sk_id_fk_x_orders_id_idx on merchant_order_lines_2016_0 mol (cost=0.14..9.28 rows=1 width=514) (actual time=0.031..0.031 r
ows=0 loops=1)
Index Cond: (fk_x_orders_id = 552382076)
-> Index Scan using merchant_order_lines_2016_1_sk_id_fk_x_orders_id_idx on merchant_order_lines_2016_1 mol_1 (cost=0.14..9.05 rows=1 width=632) (actual time=0.011..0.011
rows=0 loops=1)
Index Cond: (fk_x_orders_id = 552382076)
-> Index Scan using idx_fk_x_orders_id_merchant_order_lines_2017_0 on merchant_order_lines_2017_0 mol_2 (cost=0.56..33.25 rows=84 width=577) (actual time=0.037..0.038 row
s=1 loops=1)
Index Cond: (fk_x_orders_id = 552382076)
-> Index Scan using merchant_order_lines_2017_1_sk_id_fk_x_orders_id_idx on merchant_order_lines_2017_1 mol_3 (cost=0.56..893039.49 rows=72 width=600) (actual time=198583
.513..198583.513 rows=0 loops=1)
Index Cond: (fk_x_orders_id = 552382076)
-> Index Scan using merchant_order_lines_2018_0_sk_id_fk_x_orders_id_idx on merchant_order_lines_2018_0 mol_4 (cost=0.56..1126430.31 rows=130 width=593) (actual time=2836
18.835..283618.835 rows=0 loops=1)
Index Cond: (fk_x_orders_id = 552382076)
-> Index Scan using merchant_order_lines_2018_1_sk_id_fk_x_orders_id_idx on merchant_order_lines_2018_1 mol_5 (cost=0.56..1127814.78 rows=131 width=604) (actual time=2455
35.558..245535.558 rows=0 loops=1)
Index Cond: (fk_x_orders_id = 552382076)
Planning Time: 14.336 ms
Execution Time: 727738.113 ms
So the query is using the index (and the ones inherited by the partitions) but I getting and awful time. When I manually create indexes on fk_x_orders_id
only per every partition I get the next plan:
Nested Loop (cost=0.71..135.21 rows=419 width=594) (actual time=4.232..4.294 rows=1 loops=1)
-> Index Only Scan using merchant_orders_2017_pkey on merchant_orders_2017 mo (cost=0.56..8.58 rows=1 width=8) (actual time=3.431..3.432 rows=1 loops=1)
Index Cond: (sk_id = 552382076)
Heap Fetches: 1
-> Append (cost=0.14..122.44 rows=419 width=594) (actual time=0.795..0.855 rows=1 loops=1)
-> Index Scan using merchant_order_lines_2016_0_sk_id_fk_x_orders_id_idx on merchant_order_lines_2016_0 mol (cost=0.14..9.28 rows=1 width=514) (actual
time=0.010..0.010 rows=0 loops=1)
Index Cond: (fk_x_orders_id = 552382076)
-> Index Scan using merchant_order_lines_2016_1_sk_id_fk_x_orders_id_idx on merchant_order_lines_2016_1 mol_1 (cost=0.14..9.05 rows=1 width=632) (actu
al time=0.007..0.007 rows=0 loops=1)
Index Cond: (fk_x_orders_id = 552382076)
-> Index Scan using idx_fk_x_orders_id_2017_0 on merchant_order_lines_2017_0 mol_2 (cost=0.56..33.25 rows=84 width=577) (actual time=0.778..0.779 rows
=1 loops=1)
Index Cond: (fk_x_orders_id = 552382076)
-> Index Scan using idx_fk_x_orders_id_2017_1 on merchant_order_lines_2017_1 mol_3 (cost=0.56..15.51 rows=72 width=600) (actual time=0.017..0.018 rows
=0 loops=1)
Index Cond: (fk_x_orders_id = 552382076)
-> Index Scan using idx_fk_x_orders_id_2018_0 on merchant_order_lines_2018_0 mol_4 (cost=0.56..33.38 rows=130 width=593) (actual time=0.020..0.020 row
s=0 loops=1)
Index Cond: (fk_x_orders_id = 552382076)
-> Index Scan using idx_fk_x_orders_id_2018_1 on merchant_order_lines_2018_1 mol_5 (cost=0.56..19.87 rows=131 width=604) (actual time=0.019..0.019 row
s=0 loops=1)
Index Cond: (fk_x_orders_id = 552382076)
Planning Time: 0.315 ms
Execution Time: 4.336 ms
Why does the index that contain the partition/primary key and the column fk_x_orders_id
is behaving so badly?
The table structures are as follows:
CREATE TABLE public.merchant_orders (
sk_id int8 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
fk_r_companies_id int4 NOT NULL DEFAULT 0,
x_order_id varchar(50) NULL,
fk_x_stores_id int4 NULL DEFAULT 0,
fk_x_shoppers_id int4 NULL DEFAULT 0,
x_order_number varchar(50) NULL,
created_at timestamp(0) NULL DEFAULT CURRENT_TIMESTAMP,
updated_at timestamp(0) NULL,
x_created_at timestamp(0) NULL,
x_updated_at timestamp(0) NULL,
extra_info json NULL,
CONSTRAINT merchant_orders_pkey2 PRIMARY KEY (sk_id)
)
PARTITION BY RANGE (sk_id);
CREATE TABLE public.merchant_order_lines (
sk_id int8 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
fk_x_orders_id int4 NOT NULL DEFAULT 0,
x_order_line_id varchar(64) NULL,
x_product_id varchar(64) NULL,
x_variant_id varchar(64) NULL,
x_sku varchar(64) NULL,
fk_x_addresses_id_warehouse int4 NULL,
extra_info json NULL
)
PARTITION BY RANGE (sk_id);
And both the global index:
create index idx_fk_x_orders_id on
public.merchant_order_lines
using btree (sk_id, fk_x_orders_id);
And every partition index:
create index idx_fk_x_orders_id_2018_1 on
public.merchant_order_lines_2018_1
using btree (fk_x_orders_id);
Best Answer
This is a misconception, and perhaps accidentally the cause of the problem. The partition key must be part of the primary key or part of a parental unique index, but it does not need to be part of other indexes. So you can create a index on
(fk_x_orders_id)
on the partitioned parent table, and doing so will propagate to each partition. You don't need to manually make them on each partition.You could have also fixed the problem just be reversing the order of the columns in the existing index, to be
(fk_x_orders_id, sk_id)
;Your bad plan is using the index just as a skinny table. It should be faster than reading the table itself, as there is less data to read from disk, but it has still has to read the entire index. You can't jump to the relevant part of the btree index when you are only testing a second or later column in the index and not constraining the leading column.