Sql-server – Is it possible to have partition in SQL Server 2016 with both clustered and non-clustered indexes on different partitions

clustered-indexnonclustered-indexpartitioningsql serversql-server-2016

Suppose I have an equities pricing table with a 1000 million rows. I need to be able to maintain consistency, read and write to it at high speed with a large number of clients.

What I'd like is a partition that splits the table into two. An 'archive partition' that has a clustered index and has the older (by time) 90% of the data. I'd also like a 'recent partition' that contains the most recent 10% with a non-clustered index.

I'm not sure if this is possible.

My question is: Is it possible to have partition in SQL Server 2016 with both clustered and non-clustered indexes on different partitions?

Best Answer

I don't think it's possible. The notion seems to fly in the face of everything that we know about partitioning and clustered indexes, but I might be wrong.

One possibility is a partitioned view over two tables, one with a clustered index, one as a heap. I have included a demo script for this, but it does come with a number of limitations which might be critical:

inserts seem to need to explicitly insert into each column
the heap table still needs a primary key which includes the partitioning column (even if the primary key is not clustered)
moving data from 'recent' to 'archive' involves moving from one table to another

Overall your situation is probably complex enough that it needs a much more considered analysis.

Use tempdb;

Create Table TestOld (
    id int,
    partition_column char(1) Constraint chk_partition_old Check (partition_column = 'O'),
    other_column char(100),
    Constraint pTestOld Primary Key Clustered (partition_column, id));

Create Table TestNew (
    id int,
    partition_column char(1) Constraint chk_partition_new Check (partition_column = 'N'),
    other_column char(100),
    Constraint pTestNew Primary Key Nonclustered (partition_column, id));
Go

Create View Test As
Select id, partition_column, other_column
  From TestOld
Union All
Select id, partition_column, other_column
  From TestNew;  
Go

Insert Into Test (id, partition_column, other_column)
Values  (1, 'O', ''),
        (2, 'N', '');

Select *
  From Test
  Where partition_column = 'N';

A second option which may have more merit in your case is a filtered index which you could add over the old part of your table and include a large number of columns so that it covers the queries that you do on the old part of the table (this would serve to speed up reads to the old part of the table but wouldn't slow writes in the recent part).

Related Solutions

Sql-server – SQL Server 2008 – Partitioning and Clustered Indexes

A partitioned table is really more like a collection of individual tables stitched together. So your in example of clustering by IncidentKey and partition by IncidentDate, say that the partitioning function splits the tables into two partitions so that 1/1/2010 is in partition 1 and 7/1/2010 is partition two. The data will be layed out on disk as:

Partition 1:
IncidentKey    Date
ABC123        1/1/2010
ABC123        1/1/2011
XYZ999        1/1/2010

Partition 2:
IncidentKey    Date
ABC123        7/1/2010
XYZ999        7/1/2010

At a low level there really are two, distinct rowsets. Is the query processor that gives the illusion of a single table by creating plans that seek, scan and update all rowsets together, as one.

Any row in any non-clustered index will have have the clustered index key to which it corresponds, say ABC123,7/1/2010. Since the clustered index key always contains the partitioning key column, the engine will always know in what partition (rowset) of the clustered index to search for this value (in this case, in partition 2).

Now whenever you're dealing with partitioning you must consider if your NC indexes will be aligned (NC index is partitioned exactly the same as the clustered index) or non-aligned (NC index is non-partitioned, or partitioned differently from clustered index). Non-aligned indexes are more flexible, but they have some drawbacks:

non-aligned indexes require large amounts of memory for certain query plans
non-aligned indexes prevent efficient partition switch operations

Using aligned indexes solves these issues, but brings its own set of problems, because this physical, storage design, option ripples into the data model:

aligned indexes mean unique constrains can no longer be created/enforced (except for the partitioning column)
all foreign keys referencing the partitioned table must include the partitioning key in the relation (since the partitioning key is, due to alignment, in every index), and this in turn requires that all tables referencing the partitioned table contain partitioning key column value. Think Orders->OrderDetails, if Orders have OrderID but is partitioned by OrderDate, then OrderDetails must contain not only OrderID, but also OrderDate, in order to properly declare the foreign key constraint.

These effects I found seldom called out at the beginning of a project that deploys partitioning, but they exists and have serious consequences.

If you think aligned indexes are a rare or extreme case, then consider this: in many cases the cornerstone of ETL and partitioning solutions is the fast switch in of staging tables. Switch in operations require aligned indexes.

Oh, one more thing: all my argument about foreign keys and the ripple effect of adding the partitioning column value to other tables applies equally to joins.

Sql-server – Why SQL Server uses Non-Clustered but not Clustered Index

The optimizer has a choice between two main strategies:

Scan the table (the clustered index) checking every row to see if LoanNum = 2712.
Scan & Lookup
- Scan the nonclustered index to find rows where LoanNum = 2712
- Look up the column data for the matched rows not covered by the nonclustered index.

The key point is that the nonclustered index is smaller, so scanning it is expected to be cheaper. This might seem counterintuitive because the clustered index definition has the same keys, and the nonclustered index has included columns, but the point is that the clustered index includes all columns stored in-row - the leaf (lowest) level of the clustered index literally is the in-row data.

For a small number of expected matches, the cost saving from scanning the smaller index is more than enough to compensate for the key lookups.

Incidentally, you may find that removing the WHERE 1 = 1 from your query causes the optimizer to choose the clustered index scan. The (redundant) constant-to-constant comparison prevents SQL Server parameterizing the query, so estimates are based on statistical information about LoanNum 2712. If the query is parameterized, SQL Server will use the average distribution of LoanNum values, which might result in a higher number of expected rows, and a change of plan choice.

Best Answer

Related Solutions

Sql-server – SQL Server 2008 – Partitioning and Clustered Indexes

Sql-server – Why SQL Server uses Non-Clustered but not Clustered Index

Related Question