I have a table with about 9 million rows where i'm keeping customer's id, a timestamp, and a cost value for the particular date-time.
It looks like this:
customer|t|consumption
I need to split my dataset into smaller tables which will inherit from the table master.
Master table:
CREATE TABLE my_table_master (
customer TEXT NOT NULL,
t TIMESTAMP NOT NULL,
consumption INTEGER NOT NULL
);
Child tables:
CREATE TABLE my_table_2 (
CHECK ( t='2013-07-01 01:00:00')
) INHERITS (my_table_master);
CREATE TABLE my_table_3 (
CHECK ( my_extract(t)=0 )
) INHERITS (my_table_master);
CREATE TABLE my_table_4 (
CHECK ( t <> '2013-07-01 01:00:00' AND my_extract(t) <> 0 )
) INHERITS (my_table_master);
The problem
Constraint exclusion is not working here. Note that i have already execute the command:
SET constraint_exclusion = on;
If for example i run the following query, the planner includes my_table_2 and my_table_3 instead of just my_table_2
.
SELECT * FROM my_table_master WHERE t='2013-07-01 01:00:00';
Query Plan:
Append (cost=0.00..24601.55 rows=967 width=24) (actual time=164.514..164.693 rows=993 loops=1)
-> Seq Scan on public.my_table_master (cost=0.00..0.00 rows=1 width=44) (actual time=0.010..0.010 rows=0 loops=1)
Output: my_table_master.customer, my_table_master.t, my_table_master.consumption
Filter: (my_table_master.t = '2013-07-01 01:00:00'::timestamp without time zone)
-> Seq Scan on public.my_table_3 (cost=0.00..24577.43 rows=960 width=24) (actual time=164.472..164.472 rows=0 loops=1)
Output: my_table_3.customer, my_table_3.t, my_table_3.consumption
Filter: (my_table_3.t = '2013-07-01 01:00:00'::timestamp without time zone)
Rows Removed by Filter: 1237954
-> Seq Scan on public.my_table_2 (cost=0.00..24.13 rows=6 width=44) (actual time=0.029..0.165 rows=993 loops=1)
Output: my_table_2.customer, my_table_2.t, my_table_2.consumption
Filter: (my_table_2.t = '2013-07-01 01:00:00'::timestamp without time zone)
Planning time: 0.298 ms
Execution time: 164.731 ms
EDIT:
If i'm only asking two specific queries on the DB that include the following WHERE clauses respectively:
WHERE t='2013-07-01 01:00:00'
WHERE my_extract(t)=0
do you think i should possibly split my dataset in a different way?
If yes, please let me know.
Best Answer
On closer inspection... PostgreSQL is actually functioning as expected:
my_table_master
is scanned because there is no constraint on it. The master table can actually have rows, there's nothing preventing it in PostgreSQL inheritance.my_table_3
is scanned because its check constraint (t='2013-07-01 01:00:00'
) doesn't have anything to do (in principle) with the condition in theWHERE
clause. Even if the definition of the functionmy_extract
makesmy_extract(t) = 0
incompatible witht='2013-07-01 01:00:00'
, the database is clearly not "intelligent enough" to infer it. You could add, if appropriate, even if redundant,NOT (my_extract(t) = 0)
in your where clause. That would preventmy_table_3
from being scanned.my_table_2
is scanned as expected.my_table_4
is not actually scanned.So, the constraints prevent one table from being scanned out of the 4.
Alternate query, adding the (my_extract(t) = 0) clause:
returns
By adding the extra condition, the scan of
my_table_3
is prevented.NOTE: To make the trials, I've made up
my_extract
like this:I've used plpgsql to ensure it is not inlined (even if it could be trivially done). Notice that it is necessary that the function be labeled as
IMMUTABLE
(that is, its result only depends on its parameters), or it won't work as expected.ADDENDUM
Works as well as expected:
my_table_master
-> as beforemy_table_2
-> is scanned because its constraint doesn't have any immediate relation tomy_table_3
-> is scanned because its constraint actually match the WHERE clause.What PostgreSQL could do (and it doesn't) is eliminate the
FILTER my_extract(t)=0
when scanning table 3, because the condition is already known to be true. Let's hope this optimization is implemented at some point in the future.