Sql-server – How Selective Should Index Be

indexperformanceperformance-tuningsql server

Is there a general Selectivity rule of when to apply Nonclustered index?

We know not to create an index on a bit column, 50/50. "Rows with 50/50 distribution, it might buy you very little performance gain " Index Bit Field in SQL Server

So how selective should a query be in SQL Server, before an index should be applied? Is there a general rule in SQL Server guidelines? At 25% average selectivity distribution in a column? 10% Selectivity?

This article is stating around 31%? How Selective Should Index Be?

Best Answer

Considering column selectivity only when deciding which columns to index ignores quite a bit of what indexes can do, and what they're generally good for.

For instance, you may have an identity or guid column that's incredibly selective -- unique, even -- but never gets used. In that case, who cares? Why index columns that queries don't touch?

Much less selective indexes, even BIT columns, can make useful, or useful parts of indexes. In some scenarios, very un-selective columns on large tables can benefit quite a bit from indexing when they need to be sorted on, or grouped by.

Joins

Take this query:

SELECT COUNT(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON u.Id = p.OwnerUserId;

Without a helpful index on OwnerUserId, this is our plan with a Hash Join -- which spills -- but that's secondary to the point.

NUTS

With a helpful index -- CREATE INDEX ix_yourmom ON dbo.Posts (OwnerUserId); -- our plan changes.

NUTS

Aggregates

Likewise, grouping operations can benefit from indexing.

SELECT   p.OwnerUserId, COUNT(*) AS records
FROM     dbo.Posts AS p
GROUP BY p.OwnerUserId;

Without an index:

NUTS

With an index:

NUTS

Sorts

Sorting data can be another sticking point in queries that indexes can help.

Without an index:

NUTS

With our index:

NUTS

Blocking

Indexes can also help avoid blocking pile-ups.

If we try to run this update:

UPDATE p
SET p.Score += 100
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656;

And concurrently run this select:

SELECT *
FROM   dbo.Posts AS p
WHERE  p.OwnerUserId = 8;

They'll end up blocking:

NUTS

With our index in place, the select finishes instantly without being blocked. SQL Server has a way to access the data it needs efficiently.

In case you're wondering (using the equation Kumar provided) the OwnerUserId column's selectivity is 0.0701539878296839478

Wrap it up

Don't just blindly index columns based on how selective they are. Design indexes that help your workload run efficiently. Using more selective columns as leading key columns is generally a good idea when you're searching for equality predicates, but can be less helpful when searching on ranges.