Sql-server – Table Partitioning and updates to partition subset

partitioningsql server

I am partitioning a large table (2 Billion records ) on an integer say AssetID that has 70,000 unique values, due to partitioning limitation of 15,000 I will create a partition on say 10,000 values using ranges. Each asset will have several thousands of records.

NUTS

I have a staging table that updates a subset of assets every day. And currently, we are deleting assets and re-inserting them.

If I was able to partition by Asset ID then I could simply swap the partition, but since I am partitioning by asset range it gets a bit more complicated.

Each day I update approximately 2% of the data. Updates can happen on any asset in the dataset.

What is a recommended approach to updating a subset of data in the partition?
Say that I have an update for an asset in patition 10.

Do I have to switch out the partition 10 to a new staging table and perform updates there and then switch it back in?
What would the advantage of this be as compared to updating data directly?

Best Answer

due to partitioning limitation of 15,000 I will create a partition on say 10,000 values using ranges.

If by this you mean that you plan to create a partition function with 10000 ranges, then I strongly recommend against doing that. The limit of 15000 is not a goal. The product supports going over 1000 now but the engine support for that isn't great. You should expect to see query performance and other issues if you work with a table with so many partitions.

If I was able to partition by Asset ID then I could simply swap the partition, but since I am partitioning by asset range it gets a bit more complicated.

Partitioning a table isn't something to take lightly, especially if you need to convert a table that you consider to be large to the new partition scheme. It is atypical to partition a table on an integer value like you described. Most of the time people partition by date or by a column that represents a date column. Here, it seems as if you are partitioning the table simply to allow switching out a partition, truncating it, and reinserting all of the data.

What is a recommended approach to updating a subset of data in the partition? Say that I have an update for an asset in patition 10.

Issue a T-SQL UPDATE to change your data. If you know that all of the data to be updated resides in a single partition you can filter by the partition function of that function (using syntax similar to $PARTITION.{{partition_function}}(AssetID) = 10) to avoid an unnecessary DML request sort.

Do I have to switch out the partition 10 to a new staging table and perform updates there and then switch it back in? What would the advantage of this be as compared to updating data directly?

No, you do not need to switch out a partition to a new table just to perform updates. However, there are some advantages in performing DML against single partitions of a properly partitioned table:

  1. If you'd like to perform an operation that requires a table lock on the target, such as parallel inserts in SQL Server 2016, you can do so against many partitions concurrently if you switch the partitions out first into their own tables.
  2. You can avoid blocking issues without needing partition level lock escalation.
  3. The UPDATE operator in a query cannot run in parallel. However, if you switch out partitions then you can safely and easily issue many concurrent UPDATE queries at once.

It's really unclear what you hope to accomplish by partitioning your table. I recommend reading through some of the resources here and defining your goal to be a bit better. There's nothing wrong with partitioning a table to, in addition to other goals, make an ETL type of process more efficient. however, the partition scheme that you pick can have big impacts on maintenance and query performance against the table.