In PostgreSQL 9.5.0 I have a partitioned table that collects data by months. I tried to use the new PostgreSQL feature of foreign table inheritance and pushed one month of data to another PostgreSQL server, so I got a foreign table. When I run my query from the primary server, the query takes 7x longer to execute than on the server where I have the foreign table. I am not passing a lot of data by network, my query looks like:
explain analyze
SELECT source, global_action, paid, organic, device, count(*) as count, sum(price) as sum
FROM "toys"
WHERE "toys"."container_id" = 857 AND (toys.created_at >= '2015-12-02 05:00:00.000000') AND
(toys.created_at <= '2015-12-30 04:59:59.999999') AND ("toys"."source" IS NOT NULL)
GROUP BY "toys"."source", "toys"."global_action", "toys"."paid", "toys"."organic", "toys"."device";
HashAggregate (cost=1143634.94..1143649.10 rows=1133 width=15) (actual time=1556.894..1557.017 rows=372 loops=1)
Group Key: toys.source, toys.global_action, toys.paid, toys.organic, toys.device
-> Append (cost=0.00..1143585.38 rows=2832 width=15) (actual time=113.420..1507.373 rows=76593 loops=1)
-> Seq Scan on toys (cost=0.00..0.00 rows=1 width=242) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((source IS NOT NULL) AND (created_at >= '2015-12-02 05:00:00'::timestamp without time zone) AND (created_at <= '2015-12-30 04:59:59.999999'::timestamp without time zone) AND (container_id = 857))
-> Foreign Scan on toys_201512_new (cost=100.00..1143585.38 rows=2831 width=15) (actual time=113.419..1488.445 rows=76593 loops=1)
Planning time: 2.990 ms
Execution time: 1560.131 ms
Does PostgreSQL use indexes on foreign tables? (I have indexes defined in the foreign table.) If I run the query directly on that server, it takes 200ms.
Here is parent table definition:
Table "public.toys"
id bigint
job_reference character varying(100)
container_id integer
user_token character varying(1000)
user_ip character varying(100)
user_zip character varying(10)
user_agent character varying(2000)
url_referrer character varying(2000)
page_url character varying(2000)
source character varying(100)
action integer
created_at timestamp without time zone
cpa numeric(9,3) not null default 0.0
duplicate boolean not null default false
fingerprint character varying(255)
email character varying(1000)
mobile_email_apply boolean
country character varying(255)
country_matched boolean
device integer
organic boolean
job_seeker_id character varying(255)
applicant_status integer
ats_applicant_status character varying(255)
ats_applicant_source character varying(255)
price numeric(9,4)
job_group_id integer
analytic_source character varying(255)
global_action integer
paid_organic integer
paid boolean
meta text
params character varying(2000)
analytic_associated_click_id bigint
external_id character varying(100)
associated_click_id bigint
cpc numeric(9,3)
Indexes:
"job_stats_master_pkey1" PRIMARY KEY, btree (id)
Child table has check constraint:
"toys_201512_new_created_at_check" CHECK (
created_at >= '2015-11-30 19:00:00'::timestamp without time zone AND
created_at < '2015-12-31 19:00:00'::timestamp without time zone)
Inherits: toys
And indexes on:
"toys_201512_new_analytic_source" btree (analytic_source)
"toys_201512_new_country" btree (country)
"toys_201512_new_created_at" btree (created_at)
"toys_201512_new_duplicate" btree (duplicate) WHERE duplicate = false
"toys_201512_new_container_id" btree (container_id)
"toys_201512_new_container_id_created_at" btree (container_id, created_at)
"toys_201512_new_fingerprint" btree (fingerprint)
"toys_201512_new_global_action" btree (global_action)
"toys_201512_new_id" btree (id)
"toys_201512_new_job_group_id" btree (job_group_id)
"toys_201512_new_job_reference" btree (job_reference)
"toys_201512_new_on_country_matched" btree (country_matched) WHERE country_matched = true
"toys_201512_new_on_cpa" btree (cpa) WHERE cpa <> 0::numeric
"toys_201512_new_on_duplicate_and_country_matched" btree (duplicate, country_matched) WHERE duplicate = false AND country_matched = true
"toys_201512_new_on_mobile_email_apply" btree (mobile_email_apply) WHERE mobile_email_apply = true
"toys_201512_new_source" btree (source)
"toys_201512_new_user_ip_user_agent" btree (user_ip, user_agent)
"toys_201512_new_user_token" btree (user_token)
Best Answer
Postgres can use indexes on the foreign server. But there are quite a few more obstacles than for local tables. Read the chapter Remote Query Optimization in the manual.
Comments in the current source code of
postgres_fdw.c
are revealing, too:Index
This index of yours looks good for it:
If you have many NULL values you might even make that a partial index by appending
WHERE source IS NOT NULL
, making the index look even better for the Postgres query planner.Statistics for query planning
Make sure the query planner can work with valid statistics. The numbers in your
EXPLAIN
output show quite a mismatch:27x as many rows as Postgres expected were actually returned . The manual:
Since accessing foreign tables is potentially expensive / delicate, this does not happen automatically. Foreign tables are not covered by autovacuum. The manual:
If the remote table changes a lot, you might want to activate
use_remote_estimate
. The manual:Finally, test to see what is actually sent to the foreign server:
Query
Your query decluttered and formatted, with one minor improvement:
Simpler, cleaner and also matches your
CHECK
constraint better and avoids possible corner case problems.