Sql-server – SQL Server Partitioning vs Indexes on Separate Filegroups

I am about to start an exercise to reorganise a partitioned table, and I am looking for best practices or recommendations.

Let us say I have a billion-row table and 8 logical disks available for this table.

My question is: is it better to create a partition scheme that divides the data into 8 filegroups, with the index storage aligned with the data…

Or would it be better to create 4 filegroups for data and 4 for indexes (not aligned), and then place each filegroup on its own logical disk?

Any suggestions or comments would be welcome.

Best Answer

First and foremost: those logical disks had better be backed by at least 8 different physical disks. If you're asking about load balancing 8 logical disks created on the same physical storage (the same spindles), then you're wasting your time.

The best (and simplest!) option is to create a single filegroup with 8 files (equal in size and pregrown), each on its own spindle, and then place the table and the indexes in this filegroup. SQL Server will balance the data equally among the files.
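As a rough sketch of that layout, the following T-SQL creates one filegroup with 8 equally sized, pregrown files and places a table and an index on it. The database name, file names, drive letters, sizes, and table definition are all hypothetical placeholders, not taken from the question.

```sql
-- Hypothetical names, paths, and sizes; adjust to your environment.
-- GO is the batch separator used by SSMS and sqlcmd.
ALTER DATABASE BigDb ADD FILEGROUP FG_Data;
GO

-- Eight files of identical size, one per physical disk, so that
-- proportional fill spreads allocations evenly across them.
ALTER DATABASE BigDb ADD FILE
    (NAME = N'BigDb_Data1', FILENAME = N'E:\Data\BigDb_Data1.ndf', SIZE = 50GB, FILEGROWTH = 1GB),
    (NAME = N'BigDb_Data2', FILENAME = N'F:\Data\BigDb_Data2.ndf', SIZE = 50GB, FILEGROWTH = 1GB),
    (NAME = N'BigDb_Data3', FILENAME = N'G:\Data\BigDb_Data3.ndf', SIZE = 50GB, FILEGROWTH = 1GB),
    (NAME = N'BigDb_Data4', FILENAME = N'H:\Data\BigDb_Data4.ndf', SIZE = 50GB, FILEGROWTH = 1GB),
    (NAME = N'BigDb_Data5', FILENAME = N'I:\Data\BigDb_Data5.ndf', SIZE = 50GB, FILEGROWTH = 1GB),
    (NAME = N'BigDb_Data6', FILENAME = N'J:\Data\BigDb_Data6.ndf', SIZE = 50GB, FILEGROWTH = 1GB),
    (NAME = N'BigDb_Data7', FILENAME = N'K:\Data\BigDb_Data7.ndf', SIZE = 50GB, FILEGROWTH = 1GB),
    (NAME = N'BigDb_Data8', FILENAME = N'L:\Data\BigDb_Data8.ndf', SIZE = 50GB, FILEGROWTH = 1GB)
TO FILEGROUP FG_Data;
GO

USE BigDb;
GO

-- Both the table (via its clustered primary key) and the
-- nonclustered index live in the same multi-file filegroup.
CREATE TABLE dbo.BigTable
(
    Id        bigint      NOT NULL,
    CreatedOn datetime    NOT NULL,
    Payload   varchar(200) NULL,
    CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED (Id)
) ON FG_Data;

CREATE NONCLUSTERED INDEX IX_BigTable_CreatedOn
    ON dbo.BigTable (CreatedOn)
    ON FG_Data;
GO
```

Keeping the files the same size matters here: SQL Server allocates in proportion to the free space in each file, so unequal files would receive unequal shares of the IO.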

Partitioning is a feature for ETL switch in and switch out. It should not be used for performance, as the best you can hope for is performance equal to that of the original, unpartitioned table. For performance, use a well-designed clustered index, one that matches the typical load.

If your data is really known upfront and the index usage characteristics are very well understood, then you may try to balance them explicitly on their own filegroups. But trying to wrest manual control over this is more likely to cause harm than benefit. The simpler option of a single filegroup with 8 files balances IO better than manual explicit control 99% of the time.

Related Solutions

Sql-server – Table Partitioning. What is the correct process for deleting .ndf and .ldf files

You can't delete the LDF file: it's essential.

For the NDF, you need to show SQL Server that it's not in use. You'd use DBCC SHRINKFILE with EMPTYFILE, which migrates the file's data to the other files in the same filegroup. Substitute xx with the file ID from sys.database_files.

DBCC SHRINKFILE (xx, EMPTYFILE)
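Put together, the whole sequence might look like the sketch below. The database name BigDb and the logical file name BigDb_Data8 are hypothetical; use the values that sys.database_files reports for your own database.

```sql
-- Hypothetical database and logical file names.
USE BigDb;

-- 1. Find the file_id / logical name of the NDF you want to drop.
SELECT file_id, name, type_desc, physical_name
FROM sys.database_files;

-- 2. Move the file's data to the other files in its filegroup
--    and mark it so no new allocations go there.
DBCC SHRINKFILE (N'BigDb_Data8', EMPTYFILE);

-- 3. Once the file is empty, remove it from the database.
ALTER DATABASE BigDb REMOVE FILE BigDb_Data8;
```

DBCC SHRINKFILE accepts either the logical file name or the numeric file_id, so the xx form above and the named form here are interchangeable.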

Sql-server – SQL Server 2008 – Partitioning and Clustered Indexes

A partitioned table is really more like a collection of individual tables stitched together. So in your example of clustering by IncidentKey and partitioning by IncidentDate, say that the partition function splits the table into two partitions so that 1/1/2010 is in partition 1 and 7/1/2010 is in partition 2. The data will be laid out on disk as:

Partition 1:
IncidentKey    Date
ABC123        1/1/2010
ABC123        1/1/2011
XYZ999        1/1/2010

Partition 2:
IncidentKey    Date
ABC123        7/1/2010
XYZ999        7/1/2010

At a low level there really are two distinct rowsets. It is the query processor that gives the illusion of a single table by creating plans that seek, scan and update all rowsets together, as one.

Any row in any non-clustered index will have the clustered index key to which it corresponds, say ABC123,7/1/2010. Since the clustered index key always contains the partitioning key column, the engine will always know in what partition (rowset) of the clustered index to search for this value (in this case, in partition 2).
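To see that mapping for yourself, a small sketch along these lines can help. The object names (pfIncidentDemo, psIncidentDemo, dbo.IncidentDemo) and the single boundary value are assumptions made for illustration, and the dates are read as month/day/year. The $PARTITION function reports which partition a given partitioning-key value falls into.

```sql
-- Hypothetical names; one boundary at 1 July 2010 gives two partitions.
-- RANGE RIGHT: values < boundary go to partition 1, values >= boundary to partition 2.
CREATE PARTITION FUNCTION pfIncidentDemo (date)
    AS RANGE RIGHT FOR VALUES ('2010-07-01');

-- For the demo, every partition is mapped to PRIMARY.
CREATE PARTITION SCHEME psIncidentDemo
    AS PARTITION pfIncidentDemo ALL TO ([PRIMARY]);

-- Clustered by (IncidentKey, IncidentDate), partitioned by IncidentDate.
CREATE TABLE dbo.IncidentDemo
(
    IncidentKey  char(6) NOT NULL,
    IncidentDate date    NOT NULL,
    CONSTRAINT PK_IncidentDemo
        PRIMARY KEY CLUSTERED (IncidentKey, IncidentDate)
) ON psIncidentDemo (IncidentDate);

INSERT INTO dbo.IncidentDemo (IncidentKey, IncidentDate)
VALUES ('ABC123', '2010-01-01'),
       ('XYZ999', '2010-01-01'),
       ('ABC123', '2010-07-01'),
       ('XYZ999', '2010-07-01');

-- Show which rowset (partition) each row lives in.
SELECT IncidentKey,
       IncidentDate,
       $PARTITION.pfIncidentDemo(IncidentDate) AS PartitionNumber
FROM dbo.IncidentDemo
ORDER BY PartitionNumber, IncidentKey;
```

Because the partition number is computed from IncidentDate alone, any index row that carries the clustered key also carries enough information to locate the right partition.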

Now whenever you're dealing with partitioning you must consider if your NC indexes will be aligned (NC index is partitioned exactly the same as the clustered index) or non-aligned (NC index is non-partitioned, or partitioned differently from clustered index). Non-aligned indexes are more flexible, but they have some drawbacks:

non-aligned indexes require large amounts of memory for certain query plans
non-aligned indexes prevent efficient partition switch operations
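The difference between the two kinds of index comes down to the ON clause at creation time. The following is a minimal sketch with hypothetical names (pfIncidentDate, psIncidentDate, dbo.Incident, and a made-up Status column):

```sql
-- Hypothetical partition function, scheme, and table.
CREATE PARTITION FUNCTION pfIncidentDate (date)
    AS RANGE RIGHT FOR VALUES ('2010-07-01');

CREATE PARTITION SCHEME psIncidentDate
    AS PARTITION pfIncidentDate ALL TO ([PRIMARY]);

CREATE TABLE dbo.Incident
(
    IncidentKey  char(6)     NOT NULL,
    IncidentDate date        NOT NULL,
    Status       varchar(20) NOT NULL
) ON psIncidentDate (IncidentDate);

CREATE CLUSTERED INDEX CIX_Incident
    ON dbo.Incident (IncidentKey, IncidentDate)
    ON psIncidentDate (IncidentDate);

-- Aligned: stored on the same partition scheme and column as the clustered index.
CREATE NONCLUSTERED INDEX IX_Incident_Status_Aligned
    ON dbo.Incident (Status)
    ON psIncidentDate (IncidentDate);

-- Non-aligned: stored as a single, non-partitioned structure on a filegroup.
CREATE NONCLUSTERED INDEX IX_Incident_Status_NonAligned
    ON dbo.Incident (Status)
    ON [PRIMARY];
```

If the ON clause is omitted on a partitioned table, the new index is placed on the table's partition scheme, so aligned is the default and non-aligned has to be asked for explicitly.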

Using aligned indexes solves these issues, but brings its own set of problems, because this physical storage design option ripples into the data model:

aligned indexes mean unique constraints can no longer be created/enforced unless they include the partitioning column
all foreign keys referencing the partitioned table must include the partitioning key in the relation (since the partitioning key is, due to alignment, in every index), and this in turn requires that all tables referencing the partitioned table contain the partitioning key column value. Think Orders->OrderDetails: if Orders has OrderID but is partitioned by OrderDate, then OrderDetails must contain not only OrderID, but also OrderDate, in order to properly declare the foreign key constraint.
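The Orders->OrderDetails case can be sketched as follows. The partition function, scheme, column types, and the LineNumber column are hypothetical; the point is only that the partitioning column shows up in the primary key and therefore in the foreign key.

```sql
-- Hypothetical names and boundary value.
CREATE PARTITION FUNCTION pfOrderDate (date)
    AS RANGE RIGHT FOR VALUES ('2010-07-01');

CREATE PARTITION SCHEME psOrderDate
    AS PARTITION pfOrderDate ALL TO ([PRIMARY]);

-- The aligned primary key has to include the partitioning column,
-- so uniqueness is enforced on (OrderID, OrderDate), not OrderID alone.
CREATE TABLE dbo.Orders
(
    OrderID   int  NOT NULL,
    OrderDate date NOT NULL,
    CONSTRAINT PK_Orders
        PRIMARY KEY CLUSTERED (OrderID, OrderDate)
) ON psOrderDate (OrderDate);

-- The child table must carry OrderDate as well, purely so that the
-- foreign key can reference the full parent key.
CREATE TABLE dbo.OrderDetails
(
    OrderID    int  NOT NULL,
    OrderDate  date NOT NULL,
    LineNumber int  NOT NULL,
    CONSTRAINT PK_OrderDetails
        PRIMARY KEY CLUSTERED (OrderID, OrderDate, LineNumber),
    CONSTRAINT FK_OrderDetails_Orders
        FOREIGN KEY (OrderID, OrderDate)
        REFERENCES dbo.Orders (OrderID, OrderDate)
);
```

Note that nothing here stops two Orders rows from sharing an OrderID on different dates; that guarantee has been traded away for alignment.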

I have found that these effects are seldom called out at the beginning of a project that deploys partitioning, but they exist and have serious consequences.

If you think aligned indexes are a rare or extreme case, then consider this: in many cases the cornerstone of ETL and partitioning solutions is the fast switch in of staging tables. Switch in operations require aligned indexes.
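For readers who have not seen a switch in, here is a minimal sketch under assumed names (pfLoadDate, psLoadDate, dbo.Fact, dbo.FactStaging). The staging table has to match the target's structure and indexes, sit on the same filegroup as the target partition, and carry a check constraint proving its rows belong in that partition; the target partition must be empty.

```sql
-- Hypothetical names; two partitions split at 1 July 2010.
CREATE PARTITION FUNCTION pfLoadDate (date)
    AS RANGE RIGHT FOR VALUES ('2010-07-01');

CREATE PARTITION SCHEME psLoadDate
    AS PARTITION pfLoadDate ALL TO ([PRIMARY]);

-- Partitioned target table with an aligned clustered index.
CREATE TABLE dbo.Fact
(
    LoadDate date  NOT NULL,
    Amount   money NOT NULL
) ON psLoadDate (LoadDate);

CREATE CLUSTERED INDEX CIX_Fact
    ON dbo.Fact (LoadDate)
    ON psLoadDate (LoadDate);

-- Staging table: same columns, same index, same filegroup as partition 2,
-- plus a check constraint matching partition 2's range (>= boundary).
CREATE TABLE dbo.FactStaging
(
    LoadDate date  NOT NULL,
    Amount   money NOT NULL,
    CONSTRAINT CK_FactStaging_Range CHECK (LoadDate >= '2010-07-01')
) ON [PRIMARY];

CREATE CLUSTERED INDEX CIX_FactStaging
    ON dbo.FactStaging (LoadDate)
    ON [PRIMARY];

-- Load the staging table offline...
INSERT INTO dbo.FactStaging (LoadDate, Amount)
VALUES ('2010-07-01', 10.00),
       ('2010-08-15', 25.50);

-- ...then switch it in as partition 2: a metadata-only operation.
ALTER TABLE dbo.FactStaging SWITCH TO dbo.Fact PARTITION 2;
```

After the switch, dbo.FactStaging is empty and its rows are part of dbo.Fact. Adding a non-aligned index to dbo.Fact would make the ALTER TABLE ... SWITCH statement fail, which is the constraint the paragraph above refers to.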

Oh, one more thing: my whole argument about foreign keys and the ripple effect of adding the partitioning column value to other tables applies equally to joins.
