Postgresql – Insert into partitioned tables with select, handle conflicts

partitioningpostgresql

I need to insert into partitioned tables using a select, how do I handle conflicts? Postgres complains this:
[0A000] ERROR: ON CONFLICT clause is not supported with partitioned tables

My query looks like this:

INSERT INTO table_partition SELECT * FROM old_table WHERE start_time >= '2014-12-01 00:00:00' AND start_time < '2015-01-01 00:00:00' ON CONFLICT ON CONSTRAINT table_partition_201412_pkey DO NOTHING ;

Currently, I have created the partition tables with the old table having triggers to copy insert/update/delete onto the partition tables. I've to backfill the older data onto partitions, once done, I can drop the older tables and start using the partitioned tables, but there's a chance some of the data with few date ranges might overlap. It's a problem for me. Could you please help?

Best Answer

Do the same "by hands" like

WITH cte AS ( INSERT INTO dest
              SELECT FROM src
              WHERE (non-matching condition)
            )
UPDATE dest
SET dst.fieldset = src.fieldset
FROM src
WHERE (matching condition)

Dependent by the data and matched/non-matched relation you may swap inserting and updating parts or return processed records identifiers from CTE and use it in main query.

Related Solutions

SQL Server 2008 R2 – Best Practices for Selecting Partition Boundary Ranges

Let me start by mentioning that these 4 boundaries result in following 5 partitions. Partitions 1 and 5 are unbounded one one side and partitions 2, 3, and 4 are constrained to quarterly data:

partition 1: values less than '2014-04-01'
partition 2: values >= '2014-04-01' and < '2014-07-01'
partition 3: values >= '2014-07-01' and < '2014-10-01'
partition 4: values >= '2014-10-01' and < '2015-01-01'
partition 5: values >= '2015-01-01'

The rationale for keeping (at least) the first and last partitions empty is to ensure expensive data movement is not required during partition maintenance of an incremental or sliding window scenario. If data must be moved to a different partition to accommodate new boundaries during MERGE or SPLIT, the logging is about 4 times greater than normal DDL operations. This is especially nasty with large tables, as is common with table partitioning.

Data movement is never required when an existing empty partition is split or when 2 adjacent empty partitions are merged. This is why the best practice is to plan for such, making the maintenance a no-brainer that can easily be automated.

That said, one can split a non-empty partition without overhead in some cases. For example, you could split the function to add the new '2015-04-01' boundary without data movement as long no existing rows are >= '2015-04-01'. But to also avoid an expensive scan during the offline operation, you need an index with the partitioning column as the high-order key column so that SQL Server can efficiently validate no data exists beyond the new boundary. So in your case, you could probably split the last partition in advance of the start of the next quarter without incurring overhead if the partitioning column is indexed (a common design pattern).

Assuming your objective is 4 quarterly partitions of historical data plus the current active quarter in a sliding window, I suggest bounded partitions for each of these 5 quarters plus another bounded one for the future quarter. This will result in 8 partitions when you include the unbounded first and last ones.

CREATE PARTITION FUNCTION PfDateRange(date)
AS RANGE RIGHT FOR VALUES(
      '2014-01-01'
    , '2014-04-01'
    , '2014-07-01'
    , '2014-10-01'
    , '2015-01-01'
    , '2015-04-01'
    , '2015-07-01');

The reason one might want additional future empty partitions is to provide time to react in case partition maintenance is missed for some reason. Multiple future partitions are more important with more frequent maintenance (e.g. daily), and less so for quarterly maintenance.

After the start of the '2015-04-01' quarter, the partition maintenance script can:

switch '2014-01-01' to a and empty staging table and truncate staging
merge the '2014-01-01' partition to remove it
spilt at '2015-10-01' for the next future quarter

You may want to do a sanity check before the maintenance in case you have bad data outside of the expected range (less than '2014-01-01' or greater than '2015-07-01'). Alternatively, maintain a check constraint for the expected range so that only valid data can be inserted.

Postgresql – How to merge partitions in Postgres

If you are still inserting data for 2014, then you risk problems with this method, because rows inserted between steps 2 and 5 are going to end up getting dropped rather than moved.

If you are not still inserting data for 2014, then I think you should change step 5 to "rewrite the trigger to throw an error upon insertion of 2014 data" and move it up to be step 0. That would remove the doubt.

But it does seem like you are doing a lot of tinkering. If you want to remove the partitioning for 2014, why would you want to keep it for 2015? Why did you implement partitioning in the first place and why is that reason no longer valid for 2014 (but still valid for 2015)? Getting rid of the partitioning might speed up the queries, but it might not. Reorganizing partitioned tables isn't something you should do on a hunch. Do you have a QA system you can use to time the queries and see if they get faster? And wouldn't you want faster queries for the newer data at least as much as for the older data?

Best Answer

Related Solutions

SQL Server 2008 R2 – Best Practices for Selecting Partition Boundary Ranges

Postgresql – How to merge partitions in Postgres

Related Question