Sql-server – the appropriate index architecture when forced to implement IsDeleted (soft deletes)

indexsql serversql-server-2012

Currently, we have an existing database and application that is fully functional. I do not have the ability to change the architecture at this point. Today, each table in the database has an "IsDeleted" NOT NULL BIT field with a default of '0'. When the application "deletes" data, it simply updates the IsDeleted flag to 1.

What I'm having trouble understanding is how the indexes on each of the tables should be structured. Right now, every query/join/etc always implements the IsDeleted check. It's a standard that our developers must follow. That being said, I'm trying to determine if all of my clustered primary key indexes on each of the tables need to be altered to include the primary key AND the IsDeleted BIT field. Also, since EVERY query/join/etc. must implement the IsDeleted check, is it an appropriate assumption that EVERY SINGLE index (non-clustered as well) should include the IsDeleted field as the first field of the index?

One other question I have is around filtered indexes. I understand that I could put filters on the indexes such as "WHERE IsDeleted = 0" to reduce the size of the indexes. However, since every join/query will have to implement the IsDeleted check, would that prevent the filtered index from being used (since the IsDeleted column is used in join/query)?

Remember, I do not have the ability to change the IsDeleted approach.

Best Answer

The easiest approach here is to leave your keys and clustered indexes alone, and use filtered indexes for your non-clustered indexes.

Additionally you can migrate some large tables to partitioned heaps or partitioned clustered-columnstores (SQL Server 2016+), leaving the primary key and unique indexes un-partitioned. This would enable you to push the non-key columns for IsDeleted rows to a seperate data structure, which could additionally be compressed differently or stored on a different filegroup.

And make sure the developers use a literal instead of a parameter to filter the IsDeleted rows out. With a parameter SQL Server has to use the same query plan for both cases.

EG

SELECT ... WHERE ... AND IsDeleted=0

And not:

SELECT ... WHERE ... AND IsDeleted=@IsDeleted

Using a paramter will prevent use of filtered index, and can get you into trouble with parameter sniffing.