SQL Server – Tuning Clustered Indexes for Soft or Logical Deletes

clustered-indexindexsql server

I have a table (say dbo.AppUsers) with a clustered index on the always increasing, application generated column AppUserId.

I have also a Bit column IsDisabled, that is used to logically delete/disable users.

Now, the majority of the queries on the table will include a Where IsDisabled = 0 clause, which is fine.

I'm wondering if I should include the IsDisabled column in the clustered index or if there is a best practice in this scenario.

My thoughts on this:

I should have IsDisabled in the clustered index, because list queries should not read the logically deleted rows from disk.
or…
I should not have IsDisabled in the clustered index, because it could mess up with table fragmentation.

Is there some sort of trade off?

Thanks!

Best Answer

First, testing is your buddy. Run tests on this.

Let's avoid talking too much opinion. This question could be taken as a purely opinion driven one. Instead, let's talk about some of the mechanics of choosing a clustered index.

You get one clustered index on a table because the clustered index defines data storage. Since the clustered index is where the data gets stored, the best rule I've found is to make the clustered index the most frequently used path to the data. For many tables, this is simple, the primary key. It will absolutely be the most commonly used path to the data. However, that's not always the case.

The question here is, how is your data being accessed? Is it through the primary key? If so, adding the IsDisabled shouldn't be necessary. If most queries have the ID value and you're going directly to one row, that's it. Adding IsDisabled won't help. In fact, having a filter that has the ID and IsDisabled will work fine (although, test it, test it, test it) because the principal filter is on a unique value that can be found first and then additional columns will be used to refine that result.

On the other hand, if the primary key is not the most common access path to the data. Rather, you're seeing filtering on date, some foreign key, a set of columns, and then the IsDisabled column, then changing the clustered index may make some sense. Instead of having it on the default of the primary key, put it on the data, fk, or set of columns, along with IsDisabled. Although, again, test it.

Picking the keys for an index, clustered or not, is all about balancing selectivity, so that the statistics which are always built off the first column look as attractive to the optimizer as possible, with the need to define the key so it functions to properly refine the data. The IsDisabled column, at least early in the databases life, is likely going to have only a single value. However, as the database ages, it's going to have two. But only, ever two. So, regardless of whether or not you put that column in the index, it's unlikely to be useful as the leading edge, or first column, of that index. It's always going to be either a second, or last, column, or an INCLUDE column.

I hope all that helps.

Be sure and test it.