Sql-server – How will large index INCLUDE fields affect system performance

index-tuningperformancesql serversql-server-2008

This question is about SQL Server index performance with a varchar(2000) as an INCLUDE in a covering index.

I’m trying to improve performance in an slow and unstable database application. In some cases, data is accessed through large varchar strings, with the queries including multple string operations like SUBSTRING(), SPACE(), and DATALENGTH(). Here is a simplified example of access;

update fattable set col3 =  
   SUBSTRING(col3,1,10) + '*' + 
   SUBSTRING(col3,12,DATALENGTH(col3)-12)
from fattable where substring(col3,10,1) = 'A' and col2 = 2

The schema looks like this:

CREATE TABLE [dbo].[FatTable]( 
    [id] [bigint] IDENTITY(1,1) NOT NULL, 
    [col1] [nchar](12) NOT NULL, 
    [col2] [int] NOT NULL, 
    [col3] [varchar](2000) NOT NULL, ... 

The following index has been defined, with a covering field on the large text column.

CREATE NONCLUSTERED INDEX [IndexCol2Col3] ON [dbo].[FatTable]  ( [col2] ASC ) 
    INCLUDE( [col3] )

From what I’ve read it is BAD to put large data fields in an index. I’ve been reading several articles, including http://msdn.microsoft.com/en-us/library/ms190806.aspx which discuss the impact of paging and disk size on index performance. This being said, the query plan definitely uses the covering index. I don’t have enough information to determine how much this is actually costing me in terms of system load. I do know that overall, the system is performing poorly, and I am concerned that this is one of the issues.
Questions:

  • Is putting this varchar(2000) column in the index INCLUDE ever a good idea?

  • Since the INCLUDE fields are stored in leaf nodes, do they have much
    impact index performance?

Update: Thanks for the excellent replies! This is an unfair question in some ways – as you guys say, there is no absolute right answer without actual statistics and profiling. Like so many performance issue, I guess the answer is "it depends".

Best Answer

Ever is a big word, but, in general, no, I wouldn't put a varchar(2000) field into an INCLUDE.

And yeah, the way that data is stored at the page level can seriously impact performance of the index, depending on how the index is used.

The thing is, the more rows of data you can cram into a page, the fewer pages have to get accessed, the faster your system is, for the most part. Adding a really large column means less information stored on a page, so, in the event of range seeks or scans, more pages have to be read to retreive the data, seriously slowing stuff down.

To know for sure if this is an issue on your query, or on your system, you'd have to monitor the reads, especially the number of pages that the query uses.