SQL Server – Tuning Clustered Indexes for Soft or Logical Deletes

clustered-indexindexsql server

I have a table (say dbo.AppUsers) with a clustered index on the always increasing, application generated column AppUserId.

I have also a Bit column IsDisabled, that is used to logically delete/disable users.

Now, the majority of the queries on the table will include a Where IsDisabled = 0 clause, which is fine.

I'm wondering if I should include the IsDisabled column in the clustered index or if there is a best practice in this scenario.

My thoughts on this:

I should have IsDisabled in the clustered index, because list queries should not read the logically deleted rows from disk.
or…
I should not have IsDisabled in the clustered index, because it could mess up with table fragmentation.

Is there some sort of trade off?

Thanks!

Best Answer

First, testing is your buddy. Run tests on this.

Let's avoid talking too much opinion. This question could be taken as a purely opinion driven one. Instead, let's talk about some of the mechanics of choosing a clustered index.

You get one clustered index on a table because the clustered index defines data storage. Since the clustered index is where the data gets stored, the best rule I've found is to make the clustered index the most frequently used path to the data. For many tables, this is simple, the primary key. It will absolutely be the most commonly used path to the data. However, that's not always the case.

The question here is, how is your data being accessed? Is it through the primary key? If so, adding the IsDisabled shouldn't be necessary. If most queries have the ID value and you're going directly to one row, that's it. Adding IsDisabled won't help. In fact, having a filter that has the ID and IsDisabled will work fine (although, test it, test it, test it) because the principal filter is on a unique value that can be found first and then additional columns will be used to refine that result.

On the other hand, if the primary key is not the most common access path to the data. Rather, you're seeing filtering on date, some foreign key, a set of columns, and then the IsDisabled column, then changing the clustered index may make some sense. Instead of having it on the default of the primary key, put it on the data, fk, or set of columns, along with IsDisabled. Although, again, test it.

Picking the keys for an index, clustered or not, is all about balancing selectivity, so that the statistics which are always built off the first column look as attractive to the optimizer as possible, with the need to define the key so it functions to properly refine the data. The IsDisabled column, at least early in the databases life, is likely going to have only a single value. However, as the database ages, it's going to have two. But only, ever two. So, regardless of whether or not you put that column in the index, it's unlikely to be useful as the leading edge, or first column, of that index. It's always going to be either a second, or last, column, or an INCLUDE column.

I hope all that helps.

Be sure and test it.

Related Solutions

Sql-server – should we replace some non-clustered indexes with clustered indexes

Your problem will come in when you insert data into the ReportDataCosts table, as it will have to perform splits to insert the data, with a clustered index on the FK.

If you're so hell-bent on a clustered index on the FK, why not widen the PK to include the FK and make that clustered?

SQL Server Partitioning – Partitioning and Clustered Indexes in SQL Server 2008

A partitioned table is really more like a collection of individual tables stitched together. So your in example of clustering by IncidentKey and partition by IncidentDate, say that the partitioning function splits the tables into two partitions so that 1/1/2010 is in partition 1 and 7/1/2010 is partition two. The data will be layed out on disk as:

Partition 1:
IncidentKey    Date
ABC123        1/1/2010
ABC123        1/1/2011
XYZ999        1/1/2010

Partition 2:
IncidentKey    Date
ABC123        7/1/2010
XYZ999        7/1/2010

At a low level there really are two, distinct rowsets. Is the query processor that gives the illusion of a single table by creating plans that seek, scan and update all rowsets together, as one.

Any row in any non-clustered index will have have the clustered index key to which it corresponds, say ABC123,7/1/2010. Since the clustered index key always contains the partitioning key column, the engine will always know in what partition (rowset) of the clustered index to search for this value (in this case, in partition 2).

Now whenever you're dealing with partitioning you must consider if your NC indexes will be aligned (NC index is partitioned exactly the same as the clustered index) or non-aligned (NC index is non-partitioned, or partitioned differently from clustered index). Non-aligned indexes are more flexible, but they have some drawbacks:

non-aligned indexes require large amounts of memory for certain query plans
non-aligned indexes prevent efficient partition switch operations

Using aligned indexes solves these issues, but brings its own set of problems, because this physical, storage design, option ripples into the data model:

aligned indexes mean unique constrains can no longer be created/enforced (except for the partitioning column)
all foreign keys referencing the partitioned table must include the partitioning key in the relation (since the partitioning key is, due to alignment, in every index), and this in turn requires that all tables referencing the partitioned table contain partitioning key column value. Think Orders->OrderDetails, if Orders have OrderID but is partitioned by OrderDate, then OrderDetails must contain not only OrderID, but also OrderDate, in order to properly declare the foreign key constraint.

These effects I found seldom called out at the beginning of a project that deploys partitioning, but they exists and have serious consequences.

If you think aligned indexes are a rare or extreme case, then consider this: in many cases the cornerstone of ETL and partitioning solutions is the fast switch in of staging tables. Switch in operations require aligned indexes.

Oh, one more thing: all my argument about foreign keys and the ripple effect of adding the partitioning column value to other tables applies equally to joins.

Best Answer

Related Solutions

Sql-server – should we replace some non-clustered indexes with clustered indexes

SQL Server Partitioning – Partitioning and Clustered Indexes in SQL Server 2008

Related Question