Sql-server – Table Partition Function Empty Partitions

partitioningsql serversql-server-2017

I plan to partition my table monthly along a date column, and have read at few places, that it is recommended (in fact a best practice) to keep empty partitions at both "end" of the range.
https://techcommunity.microsoft.com/t5/premier-field-engineering/oops-i-forgot-to-leave-an-empty-sql-table-partition-how-can-i/ba-p/370563

As a naive approach I would create my partition function like this:

CREATE PARTITION FUNCTION MyPf(DATE)
AS RANGE RIGHT FOR VALUES (
'1900-01-01',
'2019-10-01','2019-11-01',...,...,...,'2022-08-01',
'9999-12-31');

CREATE PARTITION SCHEME MyPs AS PARTITION MyPf
ALL TO (MySingleFileGroup)

I can guarantee that no data older than 2019-10-01 will be inserted to the table, and I plan to keep SPLIT partitions once I reached 2022-08-01, and keep doing so up until 9999-12-31.
I also plan to regularly TRUNCATE old partitions and MERGE the old partitions range.

Did I miss anything obvious regarding the best practice with this setup? My only goal is to be able to split and merge without moving data around.

Thank you!

Best Answer

I can guarantee that no data older than 2019-10-01 will be inserted to the table, and I plan to keep SPLIT partitions when I am reaching 2022-08-01

As long as no existing data qualifies for the new partition boundary range, no data movement is required for the SPLIT.

I also plan to regularly TRUNCATE old partitions and MERGE the old partitions range.

Similarly, the merged partition will be empty due to the preceding TRUNCATE so no rows will need be moved to accommodate the new boundaries.

Related Solutions

Sql-server – Partitioning in SQL Server 2008

For what you want to do, I would recommend the following (which is pretty much what you were thinking).

1> Create history tables for the historic data you have - keep the schemas as similar as possible. Split up by some logical grouping (such as year/month) based on how they are going to be queried (say you need to report with in month/year as well as all). Do not worry about the table size of the splits unless they are getting into the TB size range (your dbms should handle it) just make sure that they are appropriately indexed for the queries that need to be run. You should consider putting these onto a different disk to the active data if performance is an issue.

2> Create a routine to move data from the active table to the relevant historic table. Run this periodically. As a practice rebuild the indexes on the table that has had the data removed from it, and maybe update the table statistics. Easiest way to do this is to write a sql script.

3> Consider the reporting you want to do. If you want to only have to deal with 1 table when writing queries, create a view that joins the archived tables together. Create indexes on all the tables to suite the view. This way if you want all the data, select from the view. If you want data from a specific year/month, query that table. The view will look something like:

create view view_all_data as select "Jan12" as month,a.* from data_Jan12 a union select "Feb12" as month,b.* from data_Feb12 b ....

I am assuming here that the system is not a highly used transactional system and that you have windows of low usage to run the analysis queries. If you need to maintain high levels of performance, you may like to do the above in a separate database (separate hardware) and port across the new data that you get from backups.

Sql-server – Best practices for selecting partition boundary ranges to maintain empty partitions at both end

Let me start by mentioning that these 4 boundaries result in following 5 partitions. Partitions 1 and 5 are unbounded one one side and partitions 2, 3, and 4 are constrained to quarterly data:

partition 1: values less than '2014-04-01'
partition 2: values >= '2014-04-01' and < '2014-07-01'
partition 3: values >= '2014-07-01' and < '2014-10-01'
partition 4: values >= '2014-10-01' and < '2015-01-01'
partition 5: values >= '2015-01-01'

The rationale for keeping (at least) the first and last partitions empty is to ensure expensive data movement is not required during partition maintenance of an incremental or sliding window scenario. If data must be moved to a different partition to accommodate new boundaries during MERGE or SPLIT, the logging is about 4 times greater than normal DDL operations. This is especially nasty with large tables, as is common with table partitioning.

Data movement is never required when an existing empty partition is split or when 2 adjacent empty partitions are merged. This is why the best practice is to plan for such, making the maintenance a no-brainer that can easily be automated.

That said, one can split a non-empty partition without overhead in some cases. For example, you could split the function to add the new '2015-04-01' boundary without data movement as long no existing rows are >= '2015-04-01'. But to also avoid an expensive scan during the offline operation, you need an index with the partitioning column as the high-order key column so that SQL Server can efficiently validate no data exists beyond the new boundary. So in your case, you could probably split the last partition in advance of the start of the next quarter without incurring overhead if the partitioning column is indexed (a common design pattern).

Assuming your objective is 4 quarterly partitions of historical data plus the current active quarter in a sliding window, I suggest bounded partitions for each of these 5 quarters plus another bounded one for the future quarter. This will result in 8 partitions when you include the unbounded first and last ones.

CREATE PARTITION FUNCTION PfDateRange(date)
AS RANGE RIGHT FOR VALUES(
      '2014-01-01'
    , '2014-04-01'
    , '2014-07-01'
    , '2014-10-01'
    , '2015-01-01'
    , '2015-04-01'
    , '2015-07-01');

The reason one might want additional future empty partitions is to provide time to react in case partition maintenance is missed for some reason. Multiple future partitions are more important with more frequent maintenance (e.g. daily), and less so for quarterly maintenance.

After the start of the '2015-04-01' quarter, the partition maintenance script can:

switch '2014-01-01' to a and empty staging table and truncate staging
merge the '2014-01-01' partition to remove it
spilt at '2015-10-01' for the next future quarter

You may want to do a sanity check before the maintenance in case you have bad data outside of the expected range (less than '2014-01-01' or greater than '2015-07-01'). Alternatively, maintain a check constraint for the expected range so that only valid data can be inserted.

Best Answer

Related Solutions

Sql-server – Partitioning in SQL Server 2008

Sql-server – Best practices for selecting partition boundary ranges to maintain empty partitions at both end

Related Question