PostgreSQL – Partitioning Constraint Exclusion Not Working

constraintpartitioningpostgresql-9.6

I have a table with about 9 million rows where i'm keeping customer's id, a timestamp, and a cost value for the particular date-time.

It looks like this:

customer|t|consumption

I need to split my dataset into smaller tables which will inherit from the table master.

Master table:

CREATE TABLE my_table_master (
 customer TEXT NOT NULL,
 t TIMESTAMP NOT NULL,
 consumption INTEGER NOT NULL 
);

Child tables:

CREATE TABLE my_table_2 (
CHECK ( t='2013-07-01 01:00:00')
) INHERITS (my_table_master);

CREATE TABLE my_table_3 (
CHECK ( my_extract(t)=0 )
) INHERITS (my_table_master);

CREATE TABLE my_table_4 (
CHECK ( t <> '2013-07-01 01:00:00' AND my_extract(t) <> 0 )
) INHERITS (my_table_master);

The problem

Constraint exclusion is not working here. Note that i have already execute the command:

SET constraint_exclusion = on;

If for example i run the following query, the planner includes my_table_2 and my_table_3 instead of just my_table_2.

SELECT * FROM my_table_master WHERE t='2013-07-01 01:00:00';

Query Plan:

Append  (cost=0.00..24601.55 rows=967 width=24) (actual time=164.514..164.693 rows=993 loops=1)
->  Seq Scan on public.my_table_master  (cost=0.00..0.00 rows=1 width=44) (actual time=0.010..0.010 rows=0 loops=1)
 Output: my_table_master.customer, my_table_master.t, my_table_master.consumption
 Filter: (my_table_master.t = '2013-07-01 01:00:00'::timestamp without time zone)
 ->  Seq Scan on public.my_table_3  (cost=0.00..24577.43 rows=960 width=24) (actual time=164.472..164.472 rows=0 loops=1)
 Output: my_table_3.customer, my_table_3.t, my_table_3.consumption
 Filter: (my_table_3.t = '2013-07-01 01:00:00'::timestamp without time zone)
 Rows Removed by Filter: 1237954
 ->  Seq Scan on public.my_table_2  (cost=0.00..24.13 rows=6 width=44) (actual time=0.029..0.165 rows=993 loops=1)
 Output: my_table_2.customer, my_table_2.t, my_table_2.consumption
 Filter: (my_table_2.t = '2013-07-01 01:00:00'::timestamp without time zone)
 Planning time: 0.298 ms
 Execution time: 164.731 ms

EDIT:

If i'm only asking two specific queries on the DB that include the following WHERE clauses respectively:

WHERE t='2013-07-01 01:00:00' 
WHERE my_extract(t)=0

do you think i should possibly split my dataset in a different way?
If yes, please let me know.

Best Answer

On closer inspection... PostgreSQL is actually functioning as expected:

  1. my_table_master is scanned because there is no constraint on it. The master table can actually have rows, there's nothing preventing it in PostgreSQL inheritance.
  2. my_table_3 is scanned because its check constraint (t='2013-07-01 01:00:00') doesn't have anything to do (in principle) with the condition in the WHERE clause. Even if the definition of the function my_extract makes my_extract(t) = 0 incompatible with t='2013-07-01 01:00:00', the database is clearly not "intelligent enough" to infer it. You could add, if appropriate, even if redundant, NOT (my_extract(t) = 0) in your where clause. That would prevent my_table_3 from being scanned.
  3. my_table_2 is scanned as expected.
  4. my_table_4 is not actually scanned.

So, the constraints prevent one table from being scanned out of the 4.

Alternate query, adding the (my_extract(t) = 0) clause:

EXPLAIN 
    SELECT * 
    FROM my_table_master 
    WHERE t='2013-07-01 01:00:00' 
          AND NOT (my_extract(t) = 0)  /* This is the new addition */ ;

returns

Append  (cost=0.00..309.45 rows=7 width=44)
  ->  Seq Scan on my_table_master  (cost=0.00..0.00 rows=1 width=44)
        Filter: ((t = '2013-07-01 01:00:00'::timestamp without time zone) AND (my_extract(t) <> 0))
  ->  Seq Scan on my_table_2  (cost=0.00..309.45 rows=6 width=44)
        Filter: ((t = '2013-07-01 01:00:00'::timestamp without time zone) AND (my_extract(t) <> 0))

By adding the extra condition, the scan of my_table_3 is prevented.

NOTE: To make the trials, I've made up my_extract like this:

CREATE FUNCTION my_extract(_t timestamp without time zone) 
RETURNS integer AS
$$
begin
    return extract(day from _t)::integer ;
end
$$
LANGUAGE plpgsql IMMUTABLE /* attention! */ ;

I've used plpgsql to ensure it is not inlined (even if it could be trivially done). Notice that it is necessary that the function be labeled as IMMUTABLE (that is, its result only depends on its parameters), or it won't work as expected.

ADDENDUM

EXPLAIN 
    SELECT * FROM my_table_master 
    WHERE my_extract(t) = 0 ;

  Append  (cost=0.00..613.25 rows=13 width=44)
    ->  Seq Scan on my_table_master  (cost=0.00..0.00 rows=1 width=44)
          Filter: (my_extract(t) = 0)
    ->  Seq Scan on my_table_2  (cost=0.00..306.62 rows=6 width=44)
          Filter: (my_extract(t) = 0)
    ->  Seq Scan on my_table_3  (cost=0.00..306.62 rows=6 width=44)
          Filter: (my_extract(t) = 0)

Works as well as expected:

  1. my_table_master -> as before
  2. my_table_2 -> is scanned because its constraint doesn't have any immediate relation to
  3. my_table_3 -> is scanned because its constraint actually match the WHERE clause.

What PostgreSQL could do (and it doesn't) is eliminate the FILTER my_extract(t)=0 when scanning table 3, because the condition is already known to be true. Let's hope this optimization is implemented at some point in the future.