Sql-server – the best choose for PK in Partitioned Table

computed-columnpartitioningprimary-keysql serversql-server-2019

I have one large table that is partitioned (table name: Trans). At the moment this table is to be created on 32 partitions. This table will contain approximately 300 million records and data older than 14 days will be deleted daily. One of the columns in this table is a reference to a table that will contain up to 5 million records (table name: Sens) and I also want it to be partitioned. I would like to ask you about:

  1. Will it be a problem that both tables will use the same partitioning function? So the Sens table would also be distributed over 32 partitions and would be save on the same files as the Trans table. Is this a good approach ?

  2. The Trans table has a PK based on two columns TranID (Identity (1,1)) and ParititionID. At the moment, FK to a smaller table ('Sens') is based on only one column – SenID. The smaller table also has to be partitioned. What will be the difference in the approach / efficiency / speed of operation if the PK in the Sens table will only be on the IDENTITY (1,1) column instead of the IDENTITY (1,1) column and the partition column, i.e.

ALTER TABLE [dbo].[Sen]
ADD CONSTRAINT [PK_SenID]
    PRIMARY KEY CLUSTERED ([SenID] ASC) ON [PRIMARY];

-- or 

ALTER TABLE [dbo].[Sen]
ADD CONSTRAINT [PK_SenID]
    PRIMARY KEY CLUSTERED (
                              [SenID] ASC,
                              [PartitionID]
                          ) ON [psTrans]([PartitionID])
  1. Have you ever try to have partition column which is computed ? I am thinking about choose partition according to new column which is computed base on other column in table:

CAST(HASHBYTES('MD5', [othercolumnInTable]) AS tinyint) % 32

Best Answer

For the first question, there's no issue with using the same function for two tables as long as the definition of the partition function doesn't ever need to change. You mentioned deleting daily data and your partition function is planned to contain 32 partitions, so I assume that you're creating one partition per day of the month. If so, I can't see a reason why you would need to merge or split partitions. With that said, given that you just have two tables involved there isn't anything wrong with creating two separate functions as well.

Whether or not the two tables should exist on the same database files depends on how you're defining your files and filegroups. Given that you're only keeping 14 days of data and you purge the rest, I assume that you aren't putting some partitions on different classes of storage. 5 million rows in the Sens table really isn't a large number, so it's likely true that there will be no issue with using the same database files with no tables. This is just a guess based on incomplete information.

For the second question, evaluate whether or not a partition-aligned index is the right fit for the table:

An index that is built on the same partition scheme as its corresponding table. When a table and its indexes are in alignment, SQL Server can switch partitions quickly and efficiently while maintaining the partition structure of both the table and its indexes. An index does not have to participate in the same named partition function to be aligned with its base table. However, the partition function of the index and the base table must be essentially the same, in that:

  1. The arguments of the partition functions have the same data type.
  2. They define the same number of partitions.
  3. They define the same boundary values for partitions.

My own opinion is that you shouldn't partition a table with 5 million rows unless you have a very good reason. Partitioning is a feature designed for large tables. Why do you need to partition a table with 5 million rows?

For the final question, I have worked with partitioned tables that have computed partition columns. I do not recommend it unless you have no other choice. We consistently ran into odd issues including partition elimination not working as expected.