I'm using Hive currently and it has partitioned tables.
Let's say this is my data:
Country City DailySale
USA NY 200.2
USA SF 190.5
CAN TR 390.4
The I can define my table
create table SALES_DATA(
Country STRING,
City STRING,
DailySale BIGDECIMAL
) PARTITIONED BY (Country, City)
Then I create catalog structure on HDFS:
SALES_DATA/
Country=USA/
City=NY/
data-file-0001
data-file-0002
... the rest omitted for brevity...
Then partition pruning can be applied for queries like
SELECT SUM(DailySale) FROM SALES_DATA
WHERE Country='USA' and City='SF'
Hive engine will prune all partitions except one. Super effective and optimal.
I would like to implement smth similar on Postgres. i'll have immutable tables with aggregated data and would like to apply smth similar for them.
I'm trying to go through it
https://www.postgresql.org/docs/13/ddl-partitioning.html
but don't really follow if it's what I need.
Best Answer
In such a case having an index on country and city will probably perform much better than partitioning.
If you are really set on using partitioning, postgres always uses the CHECK constraints to prune partitions. A way to approach this is to:
When following this procedure and you do a select on SalesData postgres will prune the tables not matching the check constraint.