SQL Server – Benefits of Multiple Filtered Indexes

filtered-indexsql server

I have a SQL Server database and stumbled across a table with a uniqueidentifier primary key named ActivityId and columns StateCode (int) and ActivityTypeCode (int) along with many others. The tables has 13M rows and there are 94 unique values for ActivityTypeCode with row counts ranging from 1 to 4M. OK, here’s the weird part: I have 97 different filtered indexes on (StateCode, ActivityId) where the only difference is the where clause on (ActivityTypeCode=(_some_value_)).

To my eyes, all 97 indexes could be dropped and replaced with one index on just StateCode (since ActivityID is the primary key and is already included as a hidden column). My question is: Is there any reason why I would want to keep multiple filtered indexes? Could they reduce deadlocks somehow? I’ve done some testing and I don’t see any reason to have so many indexes. Thoughts?

Ken

[Edit]
Based on the excellent answer below, I've done some testing and it does not appear that the filtered indexes prevent blocking on even simple queries. My indexes are of the form:

CREATE INDEX IX_StateCode_ActivityID_ActivityTypeCode_Filtered_0 ON dbo.Activities(StateCode,ActivityId) WHERE ActivityTypeCode = 0;

If I go into one database connection and issue the following:

Begin transaction;
Update Activities set StateCode=888 where ActivityTypeCode=1

Then go into a second connection and issue this:

Update Activities set StateCode=999 where ActivityTypeCode=3

The second connection hangs. It seems these different indexes do not prevent blocking so I don't see any reason to keep them. Given that there are indexes for ActivityTypeCode values that do not exist, I suspect they were all generated by some sort of development tool. FYI: This is from a third party vendor.

Thanks everyone for your help.

[Edit #2] Upon detail examiniation, not all indexes are the same. Below are the combinations. Keep in mind ActivityID is the primary key.

StateCode, ActivityID
StateCode, CreatedOn, ActivityID
StateCode, CreatedOn, StatusCode, ActivityID
StateCode, StatusCode, CreatedOn, Activity ID

I don't think any of this will prevent me from replacing them all with a single index – it will just be a bit larger than originally anticipated.

Best Answer

To my eyes, all 97 indexes could be dropped and replaced with one index on just StateCode (since ActivityID is the primary key and is already included as a hidden column). My question is: Is there any reason why I would want to keep multiple filtered indexes?

Like Brent said in the comments, this will be hard to give a good anser without many more details.

I think it's tough to get a good answer here without knowing a lot more specifics, like the queries involved that are using the indexes. That's probably where I'd start with the research - start looking at the index usage statistics on each filtered index, see if there are any differences between them, see if maybe there are some ActivityTypeCode values that aren't included in the filters, etc

And as Jonathan Fite said, if the queries are using the filtered indexes, they could be beneficial.

If your queries are actually using the filtered indexes and then especially if the index is a covering one for the query, then they could be beneficial. But I'm guessing (given the number of them) that they were added because someone had read about them and thought it would be a good idea (tm). See here for guidance on how to tell if the indexes are being used in queries.


Testing

This answer will simply be some tests and considerations in the hope of making your decision easier. There could be reasons for having all these filtered indexes depending on your workload, or they could be hoarding space and resources without having benefits. My amount of data will be smaller than yours, testing on SQL Server 2017. As always YMMV.

Trying to recreate parts of your setup

I have a SQL Server database and stumbled across a table with a uniqueidentifier primary key named ActivityId and columns StateCode (int) and ActivityTypeCode (int) along with many others. The tables has 13M rows and there are 94 unique values for ActivityTypeCode with row counts ranging from 1 to 4M

CREATE TABLE dbo.Activities(ActivityId UNIQUEIDENTIFIER PRIMARY KEY NOT NULL  DEFAULT NEWSEQUENTIALID() , StateCode int, ActivityTypeCode int,val1 int, val2 int);

INSERT INTO dbo.Activities(StateCode,ActivityTypeCode,val1,val2)
SELECT 1,1,1,1;
-- 1 value

INSERT INTO dbo.Activities(StateCode,ActivityTypeCode,val1,val2)
SELECT TOP(4000000) 2,2,ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) as rn1,ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) as rn2

FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2;
--4m values

INSERT INTO dbo.Activities(StateCode,ActivityTypeCode,val1,val2)
SELECT TOP(9000000) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) / 1000 % 300 as rn1,
                    ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) / 1000 % 93 as rn2,
                    ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) as rn3,
                    ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) as rn4

FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2
CROSS APPLY master..spt_values spt3;
-- 9M other values, ActivityTypeCode from 0 - 93

I have 97 different filtered indexes on (StateCode, ActivityId) where the only difference is the where clause on (ActivityTypeCode=(_some_value_)).

I went for 94 indexes, for the 94 distinct values in ActivityTypeCode

SELECT 'CREATE INDEX IX_StateCode_ActivityID_ActivityTypeCode_Filtered_'+CAST((rn-1) as varchar(2)) 
        + ' ON dbo.Activities(StateCode,ActivityId) INCLUDE(ActivityTypeCode)'
        + ' WHERE ActivityTypeCode = '+CAST((rn-1) as varchar(2)) +';'
FROM
(
SELECT TOP(94) ROW_NUMBER() OVER( ORDER BY (SELECT NULL)) as rn
FROM master..spt_values 
) as a

The space used

EXEC sp_spaceused 'dbo.Activities'

name        rows                    reserved    data        index_size  unused
Activities  13000001                994736 KB   527920 KB   437488 KB   29328 KB

Filtered index testing

In your question you did not specify if ActivityTypeCode is included in the filtered indexes. If they are not included, only exact filter matches will be used. More on that here.

Parameterized queries without OPTION RECOMPILE

When using parameterized queries without OPTION(RECOMPILE), the parameter will not be known to sql server at runtime.

This means that you will not be able to use your filtered indexes. We would not be able to reuse the plan with different parameters if we were able to use filtered indexes.

--If your queries are parameterized 
EXEC SP_EXECUTESQL N'SELECT StateCode,ActivityId 
                     FROM dbo.Activities 
                     WHERE ActivityTypeCode = @P1 and StateCode = @P2',N'@P1 INT, @P2 int',@P1=1,@P2=2

enter image description here

This is also true for stored procedures like the following:

CREATE PROCEDURE dbo.bla(@P1 int, @P2 int)
AS
BEGIN
SELECT StateCode,ActivityId 
FROM dbo.Activities 
WHERE ActivityTypeCode = @P1 and StateCode = @P2

END
EXEC dbo.bla @P1=1, @P2=2;

Parameterized queries with OPTION RECOMPILE

EXEC SP_EXECUTESQL N'SELECT StateCode,ActivityId 
                     FROM dbo.Activities 
                     WHERE ActivityTypeCode = @P1 and StateCode = @P2
                     OPTION(RECOMPILE)',N'@P1 INT, @P2 int',@P1=1,@P2=2

As stated earlier, OPTION(RECOMPILE) enables us to use filtered indexes when creating the plan. enter image description here

You could add the same OPTION(RECOMPILE) to the procedure talked about earlier.

You can also resort to dynamic SQL to pass parts of the filtering as a literal and part of the filtering as a parameterized query. More on that here.

Estimates

The one issue with the plan that uses the filtered index is that the query estimates are way higher than the actuals:

enter image description here

You could resolve these estimates by creating / changing the filtered index so that it has ActivityTypeCode as one of the key columns instead of ActivityId.

CREATE INDEX IX_StateCode_ActivityID_ActivityTypeCode_Filtered_1_B 
ON dbo.Activities(StateCode,ActivityTypeCode) 
INCLUDE(ActivityId) 
WHERE ActivityTypeCode = 1;

This is only true when also filtering on StateCode.

I was not able to create custom statistics or get sql server to use custom statistics to fix this. There might be a way to do this without having to change the index


Filtered index < - > regular index(es)

To my eyes, all 97 indexes could be dropped and replaced with one index on just StateCode (since ActivityID is the primary key and is already included as a hidden column). My question is: Is there any reason why I would want to keep multiple filtered indexes?

Assumptions

Lets assume that the queries can leverage the filtered indexes and that the ActivityTypeCode is included in the index.

The indexes

Proposed index by OP

CREATE INDEX IX_StateCode
ON dbo.Activities(StateCode);

This index seems strange to me, if you are only filtering on StateCode, why would the filtered indexes on ActivityTypeCode even exist?

Which is why we will focus on two other indexes in this example:

CREATE INDEX IX_StateCode_ActivityTypeCode
ON dbo.Activities(StateCode,ActivityTypeCode);


CREATE INDEX IX_ActivityTypeCode_StateCode
ON dbo.Activities(ActivityTypeCode,StateCode);

The space used

EXEC sp_spaceused 'dbo.Activities'

name        rows                    reserved    data        index_size  unused
Activities  13000001                1819584 KB  527936 KB   1260992 KB  30656 KB

Testing

Filtering on StateCode & ActivityTypeCode

Running the query that used the filtered index earlier:

EXEC SP_EXECUTESQL N'SELECT StateCode,ActivityId 
                     FROM dbo.Activities 
                     WHERE ActivityTypeCode = @P1 and StateCode = @P2
                     OPTION(RECOMPILE)',N'@P1 INT, @P2 int',@P1=1,@P2=2

The new index is used and the estimates are correct.

enter image description here

We saw earlier that the estimated for the filtered index were ~4M rows for these specific parameters. This could have been circumvented by creating a new index

Filtering only on ActivityTypeCode

When only filtering on ActivityTypeCode it starts to get interesting

EXEC SP_EXECUTESQL N'SELECT StateCode,ActivityId 
                     FROM dbo.Activities 
                     WHERE ActivityTypeCode = @P1
                     OPTION(RECOMPILE)',N'@P1 INT',@P1=1

The non filtered index is used

enter image description here

While in theory, the entire filtered index could be scanned without having to apply any filtering: enter image description here

And both estimated rows are correct.

Even when filtering on the parameter that should return +4M rows @P1=2 the non filtered index + seek is chosen.

enter image description here

This is due to the estimated rows and the index seek <-> scan tipping point.

When adding 4M extra rows, the scan is chosen.

enter image description here

If it comes down to this, I would be fine with dropping the filtered indexes and using the regular indexes specified above. But again, this is based on one query on own test data. More details would be needed to get an answer closer aligned with your dataset & workload.

What does the testing tell us?

The testing results only gives us information about my own tests and my own queries.

Some takeaways are that:

  • the size of the filtered indexes is not going to be higher than regular indexes. Managing these will be harder, what if new values are added that have no filtered index
  • There are cases when you cannot use filtered indexes, such as parameterization
  • If you are using literals / OPTION(RECOMPILE) you will be able to use filtered indexes and might get better query plans at the cost of recompilations
  • You should at include the column in the filtered index & watch out for wrong estimates

The rest is up to you to figure out.