I had a query that was taking way too much time despite having all the indexes I could think of.
Eventually, I realized that one of the JOIN .. ON
inside the query was casting a column's content to another data type because the column from table A
is of type varchar
while the matching column of table B
is of type integer
.
I changed my code to use a temporary table in which I insert the rows I need from table A
and cast the column from varchar
to integer
.
It improved the query speed by about a 1000x!
Table zone_site
:
CREATE TABLE traitements.zones_sites (
pkid serial PRIMARY KEY,
pkid_site integer NOT NULL,
origine varchar(50) NOT NULL,
origine_id varchar(255) NOT NULL, -- can't be converted to int
catégorie varchar(255) NOT NULL,
horodatage timestamp NOT NULL,
geom geometry(MultiPolygon,2154),
précision_contour varchar(100),
statut varchar(100),
détails_jsonb text,
CONSTRAINT enforce_dims_geomloc CHECK (ST_NDims(geom) = 2)
);
CREATE INDEX zones_sites_idx_pkid_site ON traitements.zones_sites (pkid_site);
CREATE INDEX zones_sites_géométrie ON traitements.zones_sites USING GIST (geom);
CREATE INDEX zones_sites_précision_contour ON traitements.zones_sites (précision_contour);
CREATE INDEX zones_sites_idx_catégorie ON traitements.zones_sites (catégorie);
CREATE INDEX zones_sites_idx_origine ON traitements.zones_sites (origine);
CREATE INDEX zones_sites_idx_statut ON traitements.zones_sites (statut);
ALTER TABLE traitements.zones_sites
ADD CONSTRAINT zones_sites_references_sites_candidats FOREIGN KEY (pkid_site)
REFERENCES traitements.sites_candidats(pkid) ON DELETE CASCADE;
Related question on dba.SE:
Unfortunately, the column traitements.zones_sites.origine_id
cannot be of type integer
because it holds identifier from several origins, some of them not being integers.
Original query:
SELECT emprises.pkid, emprises.pkid_site, emprises.origine, parcelles.idpar
FROM traitements.zones_sites AS emprises
JOIN parcelles ON parcelles.idpk::varchar = emprises.origine_id
WHERE emprises.catégorie = 'emprise_site'
AND emprises.précision_contour = 'contour_inconnu'
AND emprises.origine = 'xxxxxxxxx';
Modified code:
CREATE TEMPORARY TABLE temp_emprises (
pkid serial PRIMARY KEY,
pkid_site integer NOT NULL,
origine varchar(50) NOT NULL,
idpk_parcelle integer NOT NULL
);
INSERT INTO temp_emprises (pkid, pkid_site, origine, idpk_parcelle)
SELECT pkid, pkid_site, origine, origine_id::integer
FROM traitements.zones_sites AS emprises
WHERE emprises.catégorie = 'emprise_site'
AND emprises.précision_contour = 'contour_inconnu'
AND emprises.origine = 'xxxxxxxxx';
CREATE INDEX ON temp_emprises(idpk_parcelle);
SELECT emprises.pkid, emprises.pkid_site, emprises.origine, parcelles.idpar
FROM temp_emprises AS emprises
JOIN parcelles ON parcelles.idpk = emprises.idpk_parcelle;
Is there a better solution?
Best Answer
Proper solution
Your comment (which I added to your question) discloses the root of the problem:
A roughly normalized DB design would avoid to mix data of different kind in the same column. Then your true integer values could be in a column of type
integer
and the rest in (a)text
column(s). And there wouldn't be a problem to begin with.While stuck with your current design
Since, as you asserted, the column
origine_id
can be cast to integer for the selection of rows in your query, I suggest this much faster alternative: create a partial expression index:If the cast in your
INSERT
into the temp table is guaranteed to work, then so is this index.Slightly modified query:
Since the partial index also supports your query perfectly, this should make your current solutions look like slow motion in comparison. And it's simpler and more reliable, too.
The only other index you need is one on
parcelles(idpk)
.Since the only other column you need from
parcelles
isidpar
, if that table is big and write-load is low so you can get index-only scans out of it, an (additional) multicolumn index onparcelles(idpk, idpar)
would make it even faster.And faster, yet, in the upcoming Postgres 11 with true covering indexes with an
INCLUDE
column:Related answer with code example: