Sql-server – Modifying a non managed table partition

sql serversql-server-2012

I ran across a partition that was initially setup (sql 2012) beginning of 2014 to break by month but wasn't updated for 2015. The main issue is that the last partition was setup as Jan 2015 so Feb data is loading into it and I'd like to add future months to the partition and break out the Feb data out of the Jan 2015 partition. My first thought was to load the data to another table and truncate the existing table then drop and recreate the partition as necessary but the table has over a billion records so storage and time are a consideration. What other options would have I have separate the Feb data from the Jan 2015 partition as well as add future months with the least impact to the system?

Here's the existing partitioning function being used.

CREATE PARTITION FUNCTION [YearMonthPartFunction](date) 
    AS RANGE RIGHT 
    FOR VALUES (N'2014-01-01T00:00:00.000', 
                N'2014-02-01T00:00:00.000', 
                N'2014-03-01T00:00:00.000', 
                N'2014-04-01T00:00:00.000', 
                N'2014-05-01T00:00:00.000', 
                N'2014-06-01T00:00:00.000', 
                N'2014-07-01T00:00:00.000', 
                N'2014-08-01T00:00:00.000', 
                N'2014-09-01T00:00:00.000', 
                N'2014-10-01T00:00:00.000', 
                N'2014-11-01T00:00:00.000', 
                N'2014-12-01T00:00:00.000', 
                N'2015-01-01T00:00:00.000')

Best Answer

The best way to do this is:

Let us call your original table Original.

Create two new tables Jan and Feb) matching the schema of Original. You might want to use the partition management tool for this: https://sqlpartitionmgmt.codeplex.com/. Place these new tables directly on the filegroup you want to end up in. Do NOT place them on a partition scheme.
Add a check constraint to Jan limiting the data to only the data in January 2015
Do the same for Feb, but limiting data to February
Script and drop index on Jan and Feb
Use INSERT Jan WITH (TABLOCK)... SELECT Original WHERE Month = 'January' to populate Jan. This is bulk logged, and hence, should run at around 40-80MB/sec with minimal impact on the transaction log
As above, for February.
Add indexes back on Jan and Feb
Switch out the old January/February partition from Original to an empty table. Original now has no 2015 records (validate this)
SPLIT the January partition function of Original on the February/January boundary (and any other months you want to prepare for)
ALTER TABLE Jan SWITCH TO Original PARTITION x where x is the partition number of January
ALTER TABLE Feb SWITCH TO Original PARTITION y, where y is the partition number of February
Profit

This is the fastest way to achieve the result with minimal impact on the disk system and transaction log.

It is rather complicated, but learning Partition Fu tricks like these are key to getting the best out of SQL Server partitioning.

Related Solutions

Sql-server – Expanding a partitioned table

Here are two best practices for partitioning that pertain to the question:

Keep an empty staging partition at the leftmost and rightmost ends of the partition range to ensure that the partitions split when loading in new data, and merge, after unloading old data, do not cause data movement.
Do not split or merge a partition already populated with data because this can cause severe locking and explosive log growth.

http://www.informit.com/articles/article.aspx?p=1946159&seqNum=5

If the leftmost end of your partition is empty, use ALTER PARTITION FUNCTION SPLIT RANGE to add new ranges to the partition function.

To check if the leftmost partition is empty, use a query like the following:

DECLARE @PartitionFunctionName sysname = 'YourPartitionFunctionNameHere';

SELECT 
p.partition_number, SUM(pst.row_count) RowCountInPartition, pf.name PartitionFunction, ps.name PartitionScheme
FROM sys.dm_db_partition_stats pst
INNER JOIN sys.partitions p ON pst.partition_id = p.partition_id
INNER JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id
INNER JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id
INNER JOIN sys.partition_functions pf ON ps.function_id = pf.function_id
WHERE pf.name = @PartitionFunctionName
GROUP BY p.partition_number, pf.name, ps.name;

If the first partition is not empty, the best practices recommend that you create a new function with all values, create a new table on that function, then insert the data to the new table.

Also, if the left partition just has a few records, a split may be fine. Not sure on that as I've never tried it.

Whatever you do, make sure to leave some empty partitions at the leftmost and rightmost partition when you're finished. I might even go so far as to creating partition ranges for 0 and 1, then add a check constraint to prevent the first partition from getting data in it. Do the same thing for the end.

SQL Server 2008 R2 – Best Practices for Selecting Partition Boundary Ranges

Let me start by mentioning that these 4 boundaries result in following 5 partitions. Partitions 1 and 5 are unbounded one one side and partitions 2, 3, and 4 are constrained to quarterly data:

partition 1: values less than '2014-04-01'
partition 2: values >= '2014-04-01' and < '2014-07-01'
partition 3: values >= '2014-07-01' and < '2014-10-01'
partition 4: values >= '2014-10-01' and < '2015-01-01'
partition 5: values >= '2015-01-01'

The rationale for keeping (at least) the first and last partitions empty is to ensure expensive data movement is not required during partition maintenance of an incremental or sliding window scenario. If data must be moved to a different partition to accommodate new boundaries during MERGE or SPLIT, the logging is about 4 times greater than normal DDL operations. This is especially nasty with large tables, as is common with table partitioning.

Data movement is never required when an existing empty partition is split or when 2 adjacent empty partitions are merged. This is why the best practice is to plan for such, making the maintenance a no-brainer that can easily be automated.

That said, one can split a non-empty partition without overhead in some cases. For example, you could split the function to add the new '2015-04-01' boundary without data movement as long no existing rows are >= '2015-04-01'. But to also avoid an expensive scan during the offline operation, you need an index with the partitioning column as the high-order key column so that SQL Server can efficiently validate no data exists beyond the new boundary. So in your case, you could probably split the last partition in advance of the start of the next quarter without incurring overhead if the partitioning column is indexed (a common design pattern).

Assuming your objective is 4 quarterly partitions of historical data plus the current active quarter in a sliding window, I suggest bounded partitions for each of these 5 quarters plus another bounded one for the future quarter. This will result in 8 partitions when you include the unbounded first and last ones.

CREATE PARTITION FUNCTION PfDateRange(date)
AS RANGE RIGHT FOR VALUES(
      '2014-01-01'
    , '2014-04-01'
    , '2014-07-01'
    , '2014-10-01'
    , '2015-01-01'
    , '2015-04-01'
    , '2015-07-01');

The reason one might want additional future empty partitions is to provide time to react in case partition maintenance is missed for some reason. Multiple future partitions are more important with more frequent maintenance (e.g. daily), and less so for quarterly maintenance.

After the start of the '2015-04-01' quarter, the partition maintenance script can:

switch '2014-01-01' to a and empty staging table and truncate staging
merge the '2014-01-01' partition to remove it
spilt at '2015-10-01' for the next future quarter

You may want to do a sanity check before the maintenance in case you have bad data outside of the expected range (less than '2014-01-01' or greater than '2015-07-01'). Alternatively, maintain a check constraint for the expected range so that only valid data can be inserted.

Best Answer

Related Solutions

Sql-server – Expanding a partitioned table

SQL Server 2008 R2 – Best Practices for Selecting Partition Boundary Ranges

Related Question