Sql-server – Table partition existing table where partition key is not part of the primary key

partitioning, sql-server, sql-server-2017

Main objective: Add partitioning to the table to make deleting old orders non-blocking and quicker (and also to understand partitioning).

I have an existing table Order, like this:

CREATE TABLE [Order] (
    OrderId INT,
    OrderDate DATETIME,
    Quantity INT,
    CONSTRAINT [PK_OrderId] PRIMARY KEY CLUSTERED 
    (
        [OrderId] ASC
    )
) 
ON [PRIMARY];

This table contains 50 million rows from the last 10 years.
I only need the last 5 years of data.

I have a partition function like this:

CREATE PARTITION FUNCTION OrderPF (datetime)
AS RANGE RIGHT FOR VALUES ('2014-01-01')

I have a partition scheme like this:

CREATE PARTITION SCHEME OrderPS 
AS PARTITION OrderPF ALL TO ([PRIMARY])

My question is: how do I proceed?
I still want a primary key on the table.

Does the [OrderDate] column have to be a part of the clustered index? (Main question)

CREATE UNIQUE CLUSTERED INDEX IX_Order ON [Order](OrderDate, OrderId) ON OrderPS(OrderDate);

If so, do I then have to create an extra non-clustered Primary Key purely on [OrderId]?

ALTER TABLE [Order] ADD CONSTRAINT PK_OrderId PRIMARY KEY NONCLUSTERED (OrderId) ON [PRIMARY];

Is this the correct approach?

Best Answer

If OrderId is monotonically increasing, you can partition on that. Then you can truncate old partitions having no data you need to retain. Something like:

create partition function pf_OrderId(int) 
as range right for values (0,1000000,2000000,3000000,4000000,5000000,6000000,7000000,8000000,9000000)

create partition scheme ps_OrderId
as partition pf_OrderId all to ([Primary])

go


CREATE TABLE [Order] (
    OrderId INT,
    OrderDate Datetime,
    Quantity INT,
    constraint [PK_OrderId] primary key clustered (OrderId)
) 
ON ps_OrderId(OrderId)

go

--then you can examine the max OrderDate in each partition when trimming old data
select p.partition_number,
       (select max(OrderDate) MaxOrderDate 
        from [Order] 
        where $PARTITION.pf_OrderId(OrderId) = p.partition_number) MaxOrderDate 
from sys.partitions p
where p.object_id = object_id('Order')
and p.index_id = 1

And of course you can adjust the granularity of your partitions to roughly align with your data retention requirements. If you have a hard requirement to purge old data, you would truncate the N oldest partitions, which hold only expired rows, and run a DELETE on at most one partition: the one that straddles the cutoff date. And you can always split the partition function to insert a partition boundary at important times, like overnight at the beginning of a year or quarter.
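A minimal sketch of the purge-and-split routine, meant for a scratch database. The object names (pf_OrderTrim, ps_OrderTrim, dbo.OrderTrim), the boundaries and the sample rows are hypothetical. It assumes SQL Server 2016 or later, where TRUNCATE TABLE accepts a PARTITIONS clause; that clause also requires every index on the table to be aligned.

```sql
-- Hypothetical demo objects; in practice you would use pf_OrderId/ps_OrderId and [Order].
CREATE PARTITION FUNCTION pf_OrderTrim (int)
    AS RANGE RIGHT FOR VALUES (0, 1000000, 2000000);
CREATE PARTITION SCHEME ps_OrderTrim
    AS PARTITION pf_OrderTrim ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.OrderTrim (
    OrderId   INT      NOT NULL,
    OrderDate DATETIME NOT NULL,
    Quantity  INT,
    CONSTRAINT PK_OrderTrim PRIMARY KEY CLUSTERED (OrderId)
) ON ps_OrderTrim (OrderId);
GO
-- RANGE RIGHT: partition 2 holds 0..999999, 3 holds 1000000..1999999, 4 holds >= 2000000.
INSERT dbo.OrderTrim (OrderId, OrderDate, Quantity)
VALUES (10,      '20100601', 1),
       (1000010, '20120601', 1),
       (2000010, '20190601', 1);
GO
-- Empty the two oldest populated partitions by deallocating their pages
-- instead of deleting row by row.
TRUNCATE TABLE dbo.OrderTrim WITH (PARTITIONS (2 TO 3));
GO
-- Mark the filegroup for the next partition, then add a boundary for future OrderIds.
ALTER PARTITION SCHEME ps_OrderTrim NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_OrderTrim() SPLIT RANGE (3000000);
GO
```

Splitting ahead of the data, before any rows with OrderId >= 3000000 exist, means no existing rows have to move into the new partition.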

To move an existing table to a partition scheme, you drop all the indexes and the clustered primary key constraint, and recreate them on the new partition scheme. Once you create the clustered index on the partition scheme, subsequently created indexes will go there by default. If you don't drop the non-clustered indexes first, they will be rebuilt when you drop the clustered PK, rebuilt again when you recreate it, and they still won't be partitioned. For example:

CREATE TABLE [Order] (
    OrderId INT,
    OrderDate Datetime,
    Quantity INT,
    constraint [PK_OrderId] primary key clustered (OrderId),
    index ix_Order_Orderdate (OrderDate)
) 

go

create partition function pf_OrderId(int) 
as range right for values (0,1000000,2000000,3000000,4000000,5000000,6000000,7000000,8000000,9000000)

create partition scheme ps_OrderId
as partition pf_OrderId all to ([Primary])

go

drop index ix_Order_Orderdate on [Order]

alter table [Order] 
drop constraint [PK_OrderId] 

alter table [Order] 
add constraint [PK_OrderId] primary key clustered (OrderId)
on ps_OrderId(OrderId)

create index ix_Order_Orderdate on [Order](OrderDate)

Then verify that both the clustered and non-clustered indexes are partitioned:

select i.name index_name, p.partition_number 
from sys.partitions p
join sys.indexes i 
 on p.object_id = i.object_id
 and p.index_id = i.index_id 
where p.object_id = object_id('Order')
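When deciding which partitions are safe to truncate, it can help to see each partition's row count next to its boundary value. A sketch using hypothetical demo objects (pf_OrderCheck, ps_OrderCheck, dbo.OrderCheck); the join to sys.partition_range_values assumes a RANGE RIGHT function, where boundary n - 1 is the lower bound of partition n.

```sql
-- Hypothetical demo objects.
CREATE PARTITION FUNCTION pf_OrderCheck (int)
    AS RANGE RIGHT FOR VALUES (0, 1000000, 2000000);
CREATE PARTITION SCHEME ps_OrderCheck
    AS PARTITION pf_OrderCheck ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.OrderCheck (
    OrderId   INT      NOT NULL,
    OrderDate DATETIME NOT NULL,
    Quantity  INT,
    CONSTRAINT PK_OrderCheck PRIMARY KEY CLUSTERED (OrderId)
) ON ps_OrderCheck (OrderId);
INSERT dbo.OrderCheck (OrderId, OrderDate, Quantity)
VALUES (5, '20100101', 1), (6, '20100102', 1), (1500000, '20150101', 1);
GO
-- One row per partition of the clustered index, with its lower boundary.
SELECT p.partition_number,
       p.rows,                        -- documented as approximate; fine for planning
       prv.value AS lower_boundary    -- NULL for partition 1, which has no lower bound
FROM sys.partitions p
JOIN sys.indexes i
  ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.partition_schemes ps
  ON ps.data_space_id = i.data_space_id
LEFT JOIN sys.partition_range_values prv
  ON prv.function_id = ps.function_id
 AND prv.boundary_id = p.partition_number - 1
WHERE p.object_id = OBJECT_ID('dbo.OrderCheck')
  AND p.index_id = 1
ORDER BY p.partition_number;
```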

Related Solutions

Sql-server – Does the partition key also have to be part of the primary key

Not at all.

One of the most common scenarios for partitioning is to use a date field, which is totally unrelated to your PK.

For instance, if you have a table Orders with the field OrderDate you would most likely partition based on the month and year of OrderDate.
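A sketch of a month-per-partition layout on OrderDate. The names pf_OrderMonth and ps_OrderMonth and the boundary dates are hypothetical. The literals use the 'YYYYMMDD' form because 'YYYY-MM-DD' strings converted to datetime are interpreted according to the session's DATEFORMAT. $PARTITION shows which partition a given value maps to.

```sql
-- Hypothetical monthly boundaries on OrderDate.
CREATE PARTITION FUNCTION pf_OrderMonth (datetime)
    AS RANGE RIGHT FOR VALUES ('20190101', '20190201', '20190301', '20190401');
CREATE PARTITION SCHEME ps_OrderMonth
    AS PARTITION pf_OrderMonth ALL TO ([PRIMARY]);
GO
-- RANGE RIGHT: each boundary date is the first instant of its own partition.
SELECT $PARTITION.pf_OrderMonth('20181231') AS december_2018,  -- 1
       $PARTITION.pf_OrderMonth('20190101') AS jan_1_2019,     -- 2
       $PARTITION.pf_OrderMonth('20190315') AS march_2019,     -- 4
       $PARTITION.pf_OrderMonth('20190501') AS may_2019;       -- 5 (last, open-ended)
```

With this layout a new boundary is typically added with SPLIT RANGE ahead of each new month, and the oldest month is removed by truncating or switching it out.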

When records age out and are no longer relevant you can move those partitions off to an archive table or database so they are no longer processed.
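In SQL Server, the usual way to move an aged-out partition into an archive table is ALTER TABLE ... SWITCH PARTITION, which only changes metadata. A sketch with hypothetical objects (pf_OrderSwitch, ps_OrderSwitch, dbo.OrderLive, dbo.OrderArchive). The archive table must be empty, have the same columns and clustered index, and live on the same filegroup as the partition being switched out.

```sql
-- Hypothetical partition function and scheme on OrderDate.
CREATE PARTITION FUNCTION pf_OrderSwitch (datetime)
    AS RANGE RIGHT FOR VALUES ('20190101', '20190201');
CREATE PARTITION SCHEME ps_OrderSwitch
    AS PARTITION pf_OrderSwitch ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.OrderLive (
    OrderId   INT      NOT NULL,
    OrderDate DATETIME NOT NULL,
    Quantity  INT,
    CONSTRAINT PK_OrderLive PRIMARY KEY CLUSTERED (OrderDate, OrderId)
) ON ps_OrderSwitch (OrderDate);

-- Same structure and clustered key, on the same filegroup as the source partition.
CREATE TABLE dbo.OrderArchive (
    OrderId   INT      NOT NULL,
    OrderDate DATETIME NOT NULL,
    Quantity  INT,
    CONSTRAINT PK_OrderArchive PRIMARY KEY CLUSTERED (OrderDate, OrderId)
) ON [PRIMARY];
GO
INSERT dbo.OrderLive (OrderId, OrderDate, Quantity)
VALUES (1, '20181215', 1),   -- partition 1 (before 2019)
       (2, '20190110', 1);   -- partition 2
GO
-- Move the whole of partition 1 into the empty archive table.
ALTER TABLE dbo.OrderLive SWITCH PARTITION 1 TO dbo.OrderArchive;
GO
```

After the switch, ALTER PARTITION FUNCTION pf_OrderSwitch() MERGE RANGE ('20190101') could remove the now-empty first partition.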

Partitioning will work with pretty much any field, but for it to work WELL, the field(s) you partition on should be used in most, if not all, of your queries. If your queries don't filter on the partition key, you essentially get an expensive table scan that goes across all of the partitions.

EDIT

For part 2, I think the answer is no as well. The partition key is used to determine which partition to put the row in, but I don't think an index is maintained. There may be stats in the back end on it though.

Sql-server – Partition Key questions in SQL Server 2008

Assuming that you have the primary key on a clustered index then the partitioning key needs to be part of the primary key.
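A sketch of what this means for a table partitioned by OrderDate, using hypothetical names (pf_OrderDate, ps_OrderDate, dbo.OrderByDate): to put the clustered primary key on the scheme, OrderDate has to be part of the key. Declaring PRIMARY KEY CLUSTERED (OrderId) on ps_OrderDate(OrderDate) instead is rejected, because the partitioning columns of a unique index must be a subset of its key.

```sql
-- Hypothetical partition function and scheme on OrderDate.
CREATE PARTITION FUNCTION pf_OrderDate (datetime)
    AS RANGE RIGHT FOR VALUES ('20140101');
CREATE PARTITION SCHEME ps_OrderDate
    AS PARTITION pf_OrderDate ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.OrderByDate (
    OrderId   INT      NOT NULL,
    OrderDate DATETIME NOT NULL,
    Quantity  INT,
    -- OrderDate is in the key, so the clustered PK can be placed on the scheme.
    CONSTRAINT PK_OrderByDate PRIMARY KEY CLUSTERED (OrderId, OrderDate)
) ON ps_OrderDate (OrderDate);
GO
```

The trade-off is that the constraint now only guarantees that the pair (OrderId, OrderDate) is unique. Keeping OrderId unique on its own requires a separate non-aligned unique index, and a non-aligned index blocks partition switching and partition-level truncation.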

You will not lose the benefit of partitioning by joining to non-partitioned tables, provided that the queries are designed to make use of the partitioned table. For example, the following query WILL benefit from partitioning:

SELECT F.Col1, F.Col2, D.Col3
FROM Fact_Partitioned F
    INNER JOIN Dim_MyDim D ON F.Col1 = D.Col1
WHERE F.Col1 = 5

But the following query WILL NOT benefit from partition elimination

SELECT F.Col1, F.Col2, D.Col3
FROM Fact_Partitioned F
    INNER JOIN Dim_MyDim D ON F.Col1 = D.Col1
WHERE D.Col1 = 5

It is a subtle difference, but in the first query, the join key is filtered in the partitioned table, taking advantage of elimination and then joined to the dimension. In the second query, the key is filtered in the dimension and then joined against the whole of the fact table, rather than just required partitions.

It goes without saying that the partitioning key needs to be in the WHERE clause for elimination to work, otherwise SQL Server does not know which partition(s) the data is in.

Adding the filter criterion to the JOIN clause will not help you. It needs to be in the WHERE clause to benefit from elimination.

The partitioning key does not need to be part of a non-clustered index (NCI), but if the NCI is unique, it must contain the partitioning key for the index to be aligned, that is, built on the same partition scheme as the table. NCIs should also be partition-aligned unless there is an exceedingly good reason not to. I have never come across a good enough reason!
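A sketch of the three cases, using hypothetical objects (pf_OrderNci, ps_OrderNci, dbo.OrderNci). A non-unique NCI created without an ON clause is aligned automatically. A unique NCI can be aligned only if its key includes the partitioning column. A unique index on OrderId alone has to be placed on a filegroup, which makes it non-aligned.

```sql
-- Hypothetical partition function and scheme on OrderDate.
CREATE PARTITION FUNCTION pf_OrderNci (datetime)
    AS RANGE RIGHT FOR VALUES ('20140101');
CREATE PARTITION SCHEME ps_OrderNci
    AS PARTITION pf_OrderNci ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.OrderNci (
    OrderId   INT      NOT NULL,
    OrderDate DATETIME NOT NULL,
    Quantity  INT,
    CONSTRAINT PK_OrderNci PRIMARY KEY CLUSTERED (OrderDate, OrderId)
) ON ps_OrderNci (OrderDate);
GO
-- Non-unique NCI: inherits the table's scheme, so it is aligned without OrderDate in the key.
CREATE NONCLUSTERED INDEX IX_OrderNci_Quantity ON dbo.OrderNci (Quantity);
-- Unique NCI that is aligned: the partitioning column must be in the key.
CREATE UNIQUE NONCLUSTERED INDEX UX_OrderNci_Id_Date
    ON dbo.OrderNci (OrderId, OrderDate) ON ps_OrderNci (OrderDate);
-- Unique NCI on OrderId alone: only possible as a non-aligned index on a filegroup.
CREATE UNIQUE NONCLUSTERED INDEX UX_OrderNci_Id
    ON dbo.OrderNci (OrderId) ON [PRIMARY];
GO
```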
