Sql-server – Expanding a partitioned table

partitioningsql serversql-server-2012

I have a table that holds four years of data, where for each month I have a partition that is based on a filegroup with two files. Unexpectedly, I received eight more years of data and prepared the database files and file groups for these years.

The current scheme in place covers 200901 – 201512 and I now need to add 200001 – 200812. What is the best way to bring this online for the partitioned table?

  • Create a new partition scheme and partition function and change the clustered index to this new partition scheme; or
  • Can I somehow alter the existing partition scheme and partition function?

I'm asking because the current table has 18 billion rows and I want to do this the efficient way 🙂

Best Answer

Here are two best practices for partitioning that pertain to the question:

  1. Keep an empty staging partition at the leftmost and rightmost ends of the partition range to ensure that the partitions split when loading in new data, and merge, after unloading old data, do not cause data movement.
  2. Do not split or merge a partition already populated with data because this can cause severe locking and explosive log growth.

http://www.informit.com/articles/article.aspx?p=1946159&seqNum=5

If the leftmost end of your partition is empty, use ALTER PARTITION FUNCTION SPLIT RANGE to add new ranges to the partition function.

To check if the leftmost partition is empty, use a query like the following:

DECLARE @PartitionFunctionName sysname = 'YourPartitionFunctionNameHere';

SELECT 
p.partition_number, SUM(pst.row_count) RowCountInPartition, pf.name PartitionFunction, ps.name PartitionScheme
FROM sys.dm_db_partition_stats pst
INNER JOIN sys.partitions p ON pst.partition_id = p.partition_id
INNER JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id
INNER JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id
INNER JOIN sys.partition_functions pf ON ps.function_id = pf.function_id
WHERE pf.name = @PartitionFunctionName
GROUP BY p.partition_number, pf.name, ps.name;

If the first partition is not empty, the best practices recommend that you create a new function with all values, create a new table on that function, then insert the data to the new table.

Also, if the left partition just has a few records, a split may be fine. Not sure on that as I've never tried it.

Whatever you do, make sure to leave some empty partitions at the leftmost and rightmost partition when you're finished. I might even go so far as to creating partition ranges for 0 and 1, then add a check constraint to prevent the first partition from getting data in it. Do the same thing for the end.