SQL Server – Implementation of Combined Index

btreeindexsql server

I understand how single indexed column works in SQL Server and how it is implemented using balanced trees. There are plenty of interesting videos on YouTube on this topic. However I don't understand how it works if the index is based on multiple columns. For example:

CREATE NONCLUSTERED INDEX idxItemsCatState
ON Items (Category,OfferState)
INCLUDE ([Id],[Ranking])

And how it can speedup query like

SELECT ID, Ranking FROM Items where Category = 1 AND OfferState < 3

It is still implemented as B-Tree? How it can evaluate the combination of values? What are the restrictions for such feature?

Best Answer

To store rows in a b-tree and perform a seek, all that is needed is an ordering in which the rows should be sorted. Just like you can sort on (Category), you can also sort on the tuple (Category, OfferState). In the latter case, rows are first sorted by Category and then any ties are broken by sorting by OfferState.

The resulting index will use the same b-tree structure, but the value for each entry in the b-tree will be a (Category, OfferState) tuple.

And how it can speedup query like...

For your query, SQL Server can perform a seek in the following way:

Seek to the first row that matches Category = 1. This can be done using the same b-tree seek you are familiar with, with SQL Server only needing to use the Category portion of each (Category, OfferState) tuple.
Begin reading rows, and continue until a row with OfferState >= 3 is found

In this way, SQL Server will be able to seek directly to the beginning of the range of rows that are needed, read those rows, and stop at the end of the range of rows. Note that you can see how this seek works by looking at the Seek Predicate property of the Index Seek operator in your query plan.

More generally, a multi-column index across columns (a, b, c, d, ...) can support a seek on any leading subset of columns, such as (a) or (a, b, c), when you are matching for equality (using =).

If you are looking for a range (e.g., b < 3), SQL Server can no longer seek on any columns that fall later in the index. In such a case, it would have to perform a separate seek within each distinct value of b, which is not supported (except in a more specific case you probably don't need to worry about: across partitions of a partitioned table).

Related Solutions

Sql-server – SQL to select random mix of rows fairly

A clustered index seek or scan could be improved to a non-clustered index seek or scan which should be more efficient.

Since it looks like your problem is Products, I would see about adding an index which would be covering on that table (or perhaps an indexed view since you already have:

Id ManufacturerId Active MemberPrice

Because some of your other columns don't have prefixes, I can't tell where they come from, but I expect some of them also come from Products, so this might not be feasible to make this index covering.

However, but having Active and MemberPrice in the non-clustered index, this might help. It might be enough to tip the plan in favor of a NCI with a lookup to the clustered index to get the remaining columns (like FamilyImageName)

Mysql – InnoDB – Use combined index with primary key on GROUP BY

Because you only select columns from table_1 and because table_2 is only joined (on the right side) of the LEFT JOIN and none of its columns is used elsewhere (SELECT, WHERE, GROUP BY or ORDER BY clauses), you can completely remove the LEFT JOIN table_2 t2 on t1.id = t2.refid part.

This might have given less rows in the result but because you group by the Primary key of table_1 after the join, there is no such case.

SELECT t1.a 
FROM table_1 t1 
WHERE t1.b = 99
GROUP BY t1.id
ORDER BY t1.c ;

Now, because you group by the Primary key of table_1, which is redundant when you have only one table, you can also remove that part: GROUP BY t1.id

Finally, the query is equivalent to:

SELECT t1.a 
FROM table_1 t1 
WHERE t1.b = 99
ORDER BY t1.c ;

which should use the index on (b, c), which yes, includes the primary key. You may consider it to be (b, c, id) if you want.

But whether that index will be used is not 100% sure. Depending on the selectivity of column b (what percent of the whole table has b=99?) this index may or may not be used. If there a lot of rows with b=99 (a large percentage), the optimizer may choose to scan the whole table instead and do a filesort, than use the index, select those (99,c) combinations that exist and then hit the table to find the a values.

If you have an index on (b, c, a) or on (b, c, id, a), the query will be able to find all the info that is needed in this index and in the correct order, so it will use it.

Best Answer

Related Solutions

Sql-server – SQL to select random mix of rows fairly

Mysql – InnoDB – Use combined index with primary key on GROUP BY

Related Question