SQL Server – Implementation of Combined Index

btreeindexsql server

I understand how single indexed column works in SQL Server and how it is implemented using balanced trees. There are plenty of interesting videos on YouTube on this topic. However I don't understand how it works if the index is based on multiple columns. For example:

CREATE NONCLUSTERED INDEX idxItemsCatState
ON Items (Category,OfferState)
INCLUDE ([Id],[Ranking])

And how it can speedup query like

SELECT ID, Ranking FROM Items where Category = 1 AND OfferState < 3

It is still implemented as B-Tree? How it can evaluate the combination of values? What are the restrictions for such feature?

Best Answer

To store rows in a b-tree and perform a seek, all that is needed is an ordering in which the rows should be sorted. Just like you can sort on (Category), you can also sort on the tuple (Category, OfferState). In the latter case, rows are first sorted by Category and then any ties are broken by sorting by OfferState.

The resulting index will use the same b-tree structure, but the value for each entry in the b-tree will be a (Category, OfferState) tuple.

And how it can speedup query like...

For your query, SQL Server can perform a seek in the following way:

  • Seek to the first row that matches Category = 1. This can be done using the same b-tree seek you are familiar with, with SQL Server only needing to use the Category portion of each (Category, OfferState) tuple.
  • Begin reading rows, and continue until a row with OfferState >= 3 is found

In this way, SQL Server will be able to seek directly to the beginning of the range of rows that are needed, read those rows, and stop at the end of the range of rows. Note that you can see how this seek works by looking at the Seek Predicate property of the Index Seek operator in your query plan.

More generally, a multi-column index across columns (a, b, c, d, ...) can support a seek on any leading subset of columns, such as (a) or (a, b, c), when you are matching for equality (using =).

If you are looking for a range (e.g., b < 3), SQL Server can no longer seek on any columns that fall later in the index. In such a case, it would have to perform a separate seek within each distinct value of b, which is not supported (except in a more specific case you probably don't need to worry about: across partitions of a partitioned table).