Sql-server – Value of the Included Column is stored in Leaf Node

database-design, index, nonclustered-index, sql-server

This SQL script creates a nonclustered index with an included column:

CREATE TABLE users 
  ( 
     id        INT, 
     firstname VARCHAR(50), 
     surname   VARCHAR(50) 
  ); 

CREATE CLUSTERED INDEX ix_users_id 
  ON users (id); 

CREATE NONCLUSTERED INDEX ix_users_firstname 
  ON users (firstname) 
  INCLUDE (surname); 

SELECT firstname, 
       surname 
FROM   users 
WHERE  firstname = 'John';

If I understand correctly, most of the time the SQL Server 2019 engine will satisfy the SELECT query above with a seek on the nonclustered index, without touching the clustered index. Does that mean the value of the surname column is stored in the leaf node of the nonclustered index? And does that also mean the value of surname is duplicated, because it is stored in the clustered index as well?

Am I right?

Best Answer

Does that mean the value of the surname column is stored in the leaf node of the nonclustered index?

Yes, INCLUDE column values are stored at the leaf level of a non-clustered index. In a clustered index, non-key columns are stored at the leaf level in the same fashion. A heap works differently because there are no key columns.
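To check which columns an index treats as key columns and which as included columns, the catalog views can be queried. The following is a minimal sketch, assuming the users table from the question lives in the dbo schema (the question does not state a schema):

```sql
-- List key and included columns for every index on dbo.users.
-- Assumption: the table was created in the dbo schema.
SELECT i.name                AS index_name,
       i.type_desc           AS index_type,
       c.name                AS column_name,
       ic.key_ordinal,                 -- 0 for included columns
       ic.is_included_column           -- 1 for INCLUDE columns
FROM   sys.indexes AS i
       JOIN sys.index_columns AS ic
         ON ic.object_id = i.object_id
            AND ic.index_id = i.index_id
       JOIN sys.columns AS c
         ON c.object_id = ic.object_id
            AND c.column_id = ic.column_id
WHERE  i.object_id = OBJECT_ID(N'dbo.users')
ORDER  BY i.index_id,
          ic.key_ordinal;
```

For ix_users_firstname, surname should appear with is_included_column = 1, while firstname appears as a key column.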

Also, does that mean the value of surname is duplicated, because it is also stored in the clustered index?

All non-clustered indexes store duplicate data. In your example, the firstname column is stored in the clustered index as well as in the non-clustered index.

Without a clustered index, the data for all columns is stored in a heap, and each non-clustered index then duplicates the columns that are its key columns or included columns.

This is why, when designing your indexing strategy, you need to balance query performance against the storage and maintenance cost of the duplicated data.
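One way to put a number on the storage side of that trade-off is to compare the pages used by each index. This is a rough sketch, again assuming the table is dbo.users; it reads sys.dm_db_partition_stats, which requires VIEW DATABASE STATE permission:

```sql
-- Approximate space used by each index (or the heap) on dbo.users.
-- Assumption: the table lives in the dbo schema.
SELECT i.name                       AS index_name,   -- NULL for a heap
       i.type_desc                  AS index_type,
       SUM(ps.row_count)            AS row_count,
       SUM(ps.used_page_count)      AS used_pages,
       SUM(ps.used_page_count) * 8  AS used_kb       -- pages are 8 KB
FROM   sys.dm_db_partition_stats AS ps
       JOIN sys.indexes AS i
         ON i.object_id = ps.object_id
            AND i.index_id = ps.index_id
WHERE  ps.object_id = OBJECT_ID(N'dbo.users')
GROUP  BY i.name,
          i.type_desc;
```

Running it before and after adding an INCLUDE column gives a sense of how much extra space the duplicated data takes.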


Related Solutions

Sql-server – Interpreting an execution plan

From your question I gather that your table is relatively small. As you put more rows in the table you'll find that the bookmark lookup stays about the same, and the scan takes longer and longer. Eventually the scan will cost many times more than the bookmark lookup.

As SQLMenace said, execution plan costs are often unreliable. Use SQL Server Profiler or SET STATISTICS IO and SET STATISTICS TIME to see what resources are actually being consumed by each query.
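As an illustration, the measurement could look like the sketch below. The query from that question is not shown here, so the dbo.users query from the top of this page stands in for it as an assumption:

```sql
-- Report logical reads and CPU/elapsed time for the statements that follow.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-in query (assumption): replace with the query being investigated.
SELECT firstname,
       surname
FROM   dbo.users
WHERE  firstname = 'John';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The figures appear on the Messages tab in SSMS. Comparing logical reads between two versions of a query is usually more informative than comparing estimated plan costs.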

Finally, make sure the statistics on your table are up to date, or the engine can make poor choices about which indexes or tables to use and in what order.
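A quick way to see when statistics were last updated, and to refresh them, might look like this. The table name dbo.users is an assumption borrowed from the question at the top of the page, and FULLSCAN is only one of several sampling options:

```sql
-- When was each statistics object on the table last updated?
-- Assumption: the table of interest is dbo.users.
SELECT s.name                              AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM   sys.stats AS s
WHERE  s.object_id = OBJECT_ID(N'dbo.users');

-- Refresh all statistics on the table by reading every row.
UPDATE STATISTICS dbo.users WITH FULLSCAN;
```

On large tables a full scan can be expensive, so a sampled update may be the more practical choice.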

Sql-server – Clustered Table Scan Because of “SELECT *”

If you need columns in the output that aren't covered by the index, the optimizer has to make a choice:

Perform a table / clustered index scan (therefore all columns are there)
Perform a seek, then perform lookups to retrieve the columns not covered

Which way it will choose depends on a variety of things, including how narrow the index is, how many rows match the predicate, etc. You can force a seek with the FORCESEEK hint, but I suspect it will end up performing the same or worse than the scan SQL Server has chosen in your case.
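For reference, the hint is written as a table hint. The sketch below reuses the placeholder names dbo.tablename and col1 that appear elsewhere in this answer; the predicate value is hypothetical:

```sql
-- Ask the optimizer to use an index seek on this table.
-- Placeholder names; the predicate value 1 is hypothetical.
SELECT *
FROM   dbo.tablename WITH (FORCESEEK)
WHERE  col1 = 1;
```

Comparing this against the unhinted query with SET STATISTICS IO ON shows whether the forced seek plus lookups actually reads fewer pages than the scan.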

Some options:

Change the app to run a proper query. I listed this first for a reason.
Create a view that selects only the columns you need:
```
CREATE VIEW dbo.myview
WITH SCHEMABINDING
AS
  SELECT col1, col2, col3 FROM dbo.tablename;
```
Then you can change the app to SELECT * from this view. Or you can get even more creative and rename the original table, and change the name of this view to what the name of the table used to be. Breaking change, obviously; proceed with caution.
Add all of the other columns to the key or INCLUDE list for the index. If these are hard-coded values and always the ones used, you may consider a filtered index.
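The last option could be sketched as follows. The index names are hypothetical, the table and column names are the placeholders used in the view above, and the filter value is an assumption:

```sql
-- Covering index: col1 is the key, the remaining columns ride along
-- at the leaf level so no lookup is needed. Names are placeholders.
CREATE NONCLUSTERED INDEX ix_tablename_col1_covering
  ON dbo.tablename (col1)
  INCLUDE (col2, col3);

-- Filtered variant: only rows matching a fixed predicate are indexed.
-- The value 1 is a hypothetical hard-coded filter.
CREATE NONCLUSTERED INDEX ix_tablename_col1_filtered
  ON dbo.tablename (col1)
  INCLUDE (col2, col3)
  WHERE col1 = 1;
```

Either index duplicates the included columns, so the storage and maintenance cost grows with every column added.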
