PostgreSQL – Suboptimal query plan when updating a partitioned table

partitioning performance postgresql postgresql-10 query-performance

Background

  • I have a simple CTE that I use to update a declaratively partitioned table.
  • The subquery (CTE y in the query below) runs quickly (1.7 sec per EXPLAIN ANALYZE) and returns 3,769 records.
  • The UPDATE modifies only non-indexed columns of the declaratively partitioned table. A few characteristics of the table:
    • Contains 21 million records
    • 184 partitions (yes, too many); each child partition has its own primary key, foreign key, and indexes
    • fillfactor=100 (I plan to reduce it so that HOT updates can stay on the same page, but I am not certain that the query plan is affected by the lack of free page space)
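
Whether updates on a partition are actually HOT can be checked in the statistics views before and after lowering fillfactor. A minimal sketch, assuming a hypothetical partition name child_table_2018_01 (the post only gives the template name child_table_yyyy_mm):

```sql
-- Hypothetical partition name; substitute a real child table.
-- Lower fillfactor so that newly written pages keep free space for HOT updates.
-- Existing pages keep their layout until the table is rewritten
-- (for example by VACUUM FULL or CLUSTER).
ALTER TABLE child_table_2018_01 SET (fillfactor = 90);

-- Compare total updates with HOT updates for each partition.
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname LIKE 'child_table_%'
ORDER BY relname;
```

If n_tup_hot_upd stays close to n_tup_upd, the updates already fit on the same page and fillfactor is unlikely to be the bottleneck.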

The problem arises in the UPDATE itself. After the CTE subquery runs, the plan (from EXPLAIN) continues with a hash join over a nested loop, repeated for each child partition:

-> Hash Join /* Run for each child partition, based on CTE PK = child table PK */
     -> Nested Loop
          -> CTE Scan
          -> Append
               -> Index Scan /* for index on each child partition */
                  ... /* Index scans (for each child partition) */
     -> Hash
          -> Seq Scan /* on child table */
   ... /* Hash Joins (for each child partition) */
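
A plan of this kind, with actual timings, can be captured without keeping the changes by wrapping EXPLAIN ANALYZE in a transaction that is rolled back. A minimal sketch; the SET expression and the filter are placeholders, not the original statement:

```sql
BEGIN;

-- EXPLAIN ANALYZE really executes the UPDATE, so run it inside a
-- transaction and roll back afterwards.
EXPLAIN (ANALYZE, BUFFERS)
UPDATE parent_table AS t2
SET a1 = t2.b2            -- placeholder expression
WHERE t2.p1 = 1;          -- placeholder filter

ROLLBACK;
```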

Query

The following query is the UPDATE statement causing the issue. It applies two functions in sequence to a value from parent_table; because the calls cannot be nested in a single SQL statement, two CTEs are used. It then updates the same parent_table with the result (the functions are expensive, so the result is stored in the table itself).

WITH x AS (
  SELECT t."p1", t."p2", f(t."b1") OVER "win_x" AS "c1"
  FROM parent_table AS "t"
  WHERE t."p1" IN ('val1','val2')
  WINDOW "win_x" AS (PARTITION BY "p1" ORDER BY "p1","p2")
), y AS (
  SELECT x."p1", x."p2", f(x."c1") OVER "win_y" AS "c2"
  FROM x
  WINDOW "win_y" AS (PARTITION BY "p1" ORDER BY "p1","p2")
)
UPDATE parent_table AS "t2"
SET ("a1")=(t."b2"*y."c2")
FROM y INNER JOIN parent_table AS "t" USING ("p1","p2")
WHERE t2."p1"=y."p1" AND t2."p2"=y."p2";

Table definition

The table is defined as follows:

CREATE TABLE IF NOT EXISTS parent_table (
   p1   integer,
   p2  timestamp without time zone,
   a1  numeric,
   b1  numeric,
   b2  numeric,
   c1  numeric,
   c2 numeric,
   ... /* additional 8 columns of numeric type */
   z1 numeric,
   z2 numeric,
CONSTRAINT lbound_z1 CHECK ("z1"::numeric >= 0),
CONSTRAINT lbound_z2 CHECK ("z2"::numeric >= 0)
)
PARTITION BY RANGE (EXTRACT(YEAR FROM p2), EXTRACT(MONTH FROM p2))
WITH (OIDS='false')
TABLESPACE ts_ssd_raid10;

CREATE TABLE IF NOT EXISTS child_table_yyyy_mm PARTITION OF parent_table (
  CONSTRAINT child_pk PRIMARY KEY (p1, p2) WITH (FILLFACTOR='90') USING INDEX TABLESPACE ts_idx_m2ssd,
  /* a foreign key creates no index, so it takes no index storage options */
  CONSTRAINT child_fk_other_child_yyyy_mm FOREIGN KEY (p1, p2) REFERENCES other_child_yyyy_mm (p1,p2) MATCH FULL )
FOR VALUES FROM (yyyy, mm) TO (yyyy, mm)
WITH (FILLFACTOR='90', OIDS='false')
TABLESPACE ts_ssd_raid10;
ALTER TABLE child_table_yyyy_mm CLUSTER ON "child_pk";

Question

How can I perform the UPDATE without the nested loop hash joins for each of the 184 child tables?

System Info

Postgres version 10.3

Best Answer

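I reduced the run time to 1-2 sec by adding a third CTE. The b2 column is carried through the CTEs, so the UPDATE no longer needs the second join against parent_table: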

WITH x AS (
  SELECT t."p1", t."p2", t."b2", f(t."b1") OVER "win_x" AS "c1"
  FROM parent_table AS "t"
  WHERE t."p1" IN ('val1','val2')
  WINDOW "win_x" AS (PARTITION BY "p1" ORDER BY "p1","p2")
), y AS (
  SELECT x."p1", x."p2", x."b2", f(x."c1") OVER "win_y" AS "c2"
  FROM x
  WINDOW "win_y" AS (PARTITION BY "p1" ORDER BY "p1","p2")
), z AS (
  SELECT y."p1", y."p2", y."b2"*y."c2" AS "a1"
  FROM y
)
UPDATE parent_table AS "t"
SET "a1"=z."a1"
FROM z
WHERE t."p1"=z."p1" AND t."p2"=z."p2";
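
For experimenting with this pattern, a minimal self-contained sketch follows. It assumes a small unpartitioned temporary table (demo_parent, a hypothetical name) and uses the built-in sum() as a stand-in for the expensive function f, which the post does not show:

```sql
-- Toy stand-in for parent_table (unpartitioned for brevity; hypothetical name).
CREATE TEMP TABLE demo_parent (
  p1 integer,
  p2 timestamp without time zone,
  a1 numeric,
  b1 numeric,
  b2 numeric,
  PRIMARY KEY (p1, p2)
);

INSERT INTO demo_parent (p1, p2, b1, b2) VALUES
  (1, '2018-01-01', 1, 2),
  (1, '2018-01-02', 2, 2),
  (2, '2018-01-01', 5, 3);

-- b2 travels through every CTE, so the target table appears only once
-- in the UPDATE and is never joined to itself.
WITH x AS (
  SELECT t.p1, t.p2, t.b2,
         sum(t.b1) OVER win_x AS c1   -- sum() is an assumption replacing f
  FROM demo_parent AS t
  WHERE t.p1 IN (1, 2)
  WINDOW win_x AS (PARTITION BY p1 ORDER BY p2)
), y AS (
  SELECT x.p1, x.p2, x.b2,
         sum(x.c1) OVER win_y AS c2   -- sum() is an assumption replacing f
  FROM x
  WINDOW win_y AS (PARTITION BY p1 ORDER BY p2)
), z AS (
  SELECT y.p1, y.p2, y.b2 * y.c2 AS a1
  FROM y
)
UPDATE demo_parent AS t
SET a1 = z.a1
FROM z
WHERE t.p1 = z.p1 AND t.p2 = z.p2;

-- With the sample rows above, a1 becomes 2, 8 and 15.
SELECT p1, p2, a1 FROM demo_parent ORDER BY p1, p2;
```

Prefixing the UPDATE with EXPLAIN on the real partitioned table shows whether the per-partition hash joins are gone.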