Sql-server – Can SQLServer implicitly combine clustered and nonclustered index for queries that hit the NCI with a WHERE and the CI with an ORDERBY

azure-sql-databasesql server

I recently asked a question about the best indexing strategy for a table that must record events, in order, against a GUID. The GUID is a nuisance but it's unavoidable because it's an FK to elsewhere.

In terms of the queries I want to do they are essentially of the form SELECT TOP N SomeColumn FROM t WHERE GuidColumn = 'someguid' ORDER BY AutoIncInt DESC

I've done some experimentation and a good tradeoff approach seems to be:

  • Create the AutoIncInt column on the table as the unique clustering key
  • Create a nonunique nonclustered index on GuidColumn, and INCLUDE SomeColumn

At first I had the nonclustering index on SomeGuid ASC, AutoIncInt DESC, thinking it would help the ORDER BY AutoIncInt DESC part of the query but the plan remained the same when I removed it – no sort operation was needed and the query just hit the NC index, did a top and a select; it seems that SQLServer is somehow smart enough to know, even when AutoIncInt is not cited in the nonclustering index at all, there's a way to easily get the data out in order / easily calculate which last N rows are needed for the TOP

How does this mechanism work in practice? Does the nonclustering index have an implicit order thanks to referring to the clustering index, such that SQLS can quickly and easily deduce which N index nodes I want the SomeColumn from?

Best Answer

Each leaf node row in a SQL Server b-tree index uniquely identifies a single row in the table by a unique key. In a non-clustered index, this unique key consists of the declared key columns plus the row locator. The row locator is the clustered index key (including uniqueifier for when the CI key value is not unique). With a heap (no clustered index on the table), the row locator is the physical location of the row (file,page,slot).

The DBCC PAGE output of this demo script shows both GuidColumn and AutoIncInt as index keys even though AutoIncInt was not specified:.

DROP TABLE IF EXISTS dbo.t;
CREATE TABLE dbo.t (
      AutoIncInt int IDENTITY NOT NULL CONSTRAINT pk_t PRIMARY KEY CLUSTERED
    , GuidColumn uniqueidentifier NOT NULL
    , SomeColumn char(5000) NOT NULL
)
CREATE INDEX idx ON t(GuidColumn) INCLUDE (SomeColumn);
INSERT INTO t (GuidColumn, SomeColumn) VALUES('00000000-0000-0000-0000-000000000001','3')
INSERT INTO t (GuidColumn, SomeColumn) VALUES('00000000-0000-0000-0000-000000000001','1')
INSERT INTO t (GuidColumn, SomeColumn) VALUES('00000000-0000-0000-0000-000000000001','2')

DECLARE @dbid int, @fileid int, @pageid int;
SELECT TOP (1) 
      @dbid = DB_ID()
    , @fileid = allocated_page_file_id
    , @pageid = allocated_page_page_id
FROM sys.dm_db_database_page_allocations( DB_ID(), OBJECT_ID(N'dbo.t','U'), 2, 1, 'DETAILED')
WHERE 
    page_type_DESC = 'INDEX_PAGE'
ORDER BY 
      allocated_page_file_id
    , allocated_page_page_id;

DBCC PAGE(@dbid,@fileid,@pageid,3);
GO

Results:

+--------+--------+-----+-------+--------------------------------------+------------------+------------+----------------+----------+
| FileId | PageId | Row | Level |           GuidColumn (key)           | AutoIncInt (key) | SomeColumn |  KeyHashValue  | Row Size |
+--------+--------+-----+-------+--------------------------------------+------------------+------------+----------------+----------+
|      1 |    424 |   0 |     0 | 00000000-0000-0000-0000-000000000001 |                1 |          3 | (63f9557c63d5) |     5038 |
+--------+--------+-----+-------+--------------------------------------+------------------+------------+----------------+----------+

This index architecture has positive performance implications for your query because the AutoIncInt column is not only implicitly included in the non-clustered index as the row locator, it is also part of the unique key and thus ordered. This allows the plan for your query to use an index seek for the GuidColumn WHERE predicate plus a backwards scan for the ORDER BY...DESC clause.

Below is an excerpt from the actual query plan shows a backwards scan of the index. The resultant non-clustered index will be the same regardless of whether AutoIncInt is explicitly specified as a key column.

<IndexScan Ordered="true" ScanDirection="BACKWARD"