I don't understand why the following query is so slow (with PostgreSQL 9.6):
SELECT s.pkid AS pkid_site, s.geom AS geom_site, z.pkid AS pkidEmprise, z.geom AS geom_emprise, z.précision_contour
FROM traitements.sites_candidats AS s
JOIN traitements.zones_sites AS z
ON z.pkid_site = s.pkid
WHERE s.statut = 'selected'
AND z.statut = 'selected';
The structure of the tables is the following:
CREATE TABLE traitements.sites_candidats (
pkid serial PRIMARY KEY,
statut varchar(255) NOT NULL,
geom geometry(Point, 2154)
);
CREATE INDEX ON traitements.sites_candidats (statut);
CREATE TABLE traitements.zones_sites (
pkid serial PRIMARY KEY,
pkid_site integer NOT NULL,
geom geometry(MultiPolygon,2154),
statut varchar(100)
);
CREATE INDEX zones_sites_idx_pkid_site ON traitements.zones_sites (pkid_site);
CREATE INDEX zones_sites_idx_statut ON traitements.zones_sites (statut) ;
ALTER TABLE traitements.zones_sites
ADD CONSTRAINT zones_sites_references_sites_candidats FOREIGN KEY (pkid_site) REFERENCES traitements.sites_candidats(pkid) ON DELETE CASCADE;
(I deleted a couple of columns that are not involved in the query in order to improve readability).
The result of EXPLAIN ANALYZE:
Hash Join  (cost=17709.29..152088.29 rows=147368 width=887) (actual time=137.074..879.884 rows=210708 loops=1)
  Hash Cond: (z.pkid_site = s.pkid)
  ->  Seq Scan on zones_sites z  (cost=0.00..85198.14 rows=210717 width=855) (actual time=0.140..384.964 rows=210708 loops=1)
        Filter: ((statut)::text = 'selected'::text)
        Rows Removed by Filter: 23
  ->  Hash  (cost=13433.16..13433.16 rows=210490 width=36) (actual time=136.772..136.772 rows=210708 loops=1)
        Buckets: 65536  Batches: 8  Memory Usage: 2191kB
        ->  Seq Scan on sites_candidats s  (cost=0.00..13433.16 rows=210490 width=36) (actual time=3.085..87.774 rows=210708 loops=1)
              Filter: ((statut)::text = 'selected'::text)
              Rows Removed by Filter: 90265
Planning time: 0.386 ms
Execution time: 888.436 ms
It's obviously the JOIN .. ON that consumes time because it does a hash.
Table traitements.sites_candidats has about 300k rows (210k of them with statut = 'selected') and traitements.zones_sites about 210k rows (nearly 99% of them have statut = 'selected'). The query takes more than 4 minutes.
Best Answer
Composite indexes including statut might help. Since only rows with statut = 'selected' are of interest, partial indexes might help some more: smaller indexes, less write load and fragmentation.
However, since most rows in zones_sites have statut = 'selected', and since you fetch several columns from that table, and (almost) all rows, Postgres will stick to a sequential scan as the fastest option. The only index that can help is the one on sites_candidats - where also ~ 70 % of all rows qualify. So a bitmap index scan using the above index would most likely still be more expensive than another sequential scan on zones_sites.
The remaining option to improve performance is an index-only scan. Consider appending the only other column of interest from sites_candidats to the index. Only useful if preconditions for index-only scans are met, and it becomes more attractive if the table is big and the appended column is small.
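Such a covering, partial index might look like this (the name is illustrative; appending geom to a btree index works because PostGIS provides a btree operator class for geometry):

```sql
CREATE INDEX sites_candidats_selected_covering_idx
    ON traitements.sites_candidats (pkid, geom)
    WHERE statut = 'selected';
```

For index-only scans to actually kick in, the visibility map must be reasonably current - a recent VACUUM on the table helps.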
More on index-only scans in my answer to your last question.
Don't expect much: retrieving 210708 rows (according to your EXPLAIN output) will still take some time. If you don't actually need all of them, consider LIMIT (combined with ORDER BY) or additional predicates to fetch fewer rows and make it much faster.

Asides:
I would not use accented letters in identifiers (like précision_contour). While that works, it introduces potential encoding problems.

Why different data types for statut? varchar(255), varchar(100). Seems like those should be the same. If only a handful of states are possible, make that an enum or a tiny FK column pointing to a statut table - to make everything smaller and faster yet, and less error-prone.

You had Postgres 9.5 in your last question yesterday, but Postgres 9.6 today. Did you just upgrade? If so, did you run ANALYZE after the upgrade? Table statistics are not carried over.
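The enum variant from the asides could look like this - a sketch only, since I'm guessing at the labels other than 'selected':

```sql
CREATE TYPE statut_type AS ENUM ('candidate', 'selected', 'rejected');  -- labels are assumptions

ALTER TABLE traitements.sites_candidats
    ALTER COLUMN statut TYPE statut_type USING statut::text::statut_type;
ALTER TABLE traitements.zones_sites
    ALTER COLUMN statut TYPE statut_type USING statut::text::statut_type;
```

An enum value occupies 4 bytes on disk instead of the variable length of a varchar, and the type itself rejects misspelled states that a plain varchar would silently accept.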