Sql-server – RID vs INCLUDE on a large field

sql server

I have a table that stores notes

create tblNote(  
  Id int identity(1,1),  
  ParentId  int ,   
  ParentType varchar(32),   
  NoteType varchar(32),   
  Note varchar(max),  
  CreatedBy varchar(25),   
  CreatedDate  datetime,   
  .  
  .  
  .  
  <other metadata about the note>  
)  

I have done a lot of reading recently about how MSSS handles indexes (2005 and forward).

I have a clustered index on ID

[ I have considered changing the clustered index to parentId, parentType since that is reasonably narrow and it is static. ]

The overwhelming percentage of queries against this table are going to be along the lines of

select NOTE, createdDate, createdBy 
from tblNote 
where parentId = 12 and parentType = 'RFQ'

The question I want to ask today (though any feedback is welcome) is this:

The NC index I could add is:

create index  idx_nc_note_parent(  
        parentId ,   
        parenttype  
    )  
    include (createdby, createdDate)  

This would be useful in creating little lists of the notes where we might include who and when type info.

I am hesitant to include a varchar(max) field. It seems like it would really hurt the amount of the index that would be cached (Is this reasonable or unreasonable)

Assuming I dont include the NOTE field, a RID Lookup will be necessary to actually fetch the note content if it is requested.

While I have read quite a bit about how expensive RID lookups are, it still has to be better to have this index as opposed to doing a table scan, RIGHT?

[apologies for the code block, i have added the 4 spaces, but maybe i did it wrong? ]

Best Answer

Since you said that most queries would generally return few rows, letting the query use a RID lookup (key lookup in this case as the table has a clustered index) is perfectly fine for retrieving a potentially large field. For a highly-available system, I couldn't recommend putting a LOB type in an index anyway as this prevents online rebuilds (for versions of SQL Server before 2012). Also, you need to be very careful that the query plan always sticks to a seek-type plan and not tip into a table scan which could be very expensive. This is a case where I might use a table hint (or a plan guide if the query can't be modified) even if it isn't absolutely necessary.

Another option is to recreate the clustered index on the combination of parentId and parentType if this combination of values is static and generally increasing over time. It would be better if parentType was an integral type, though, and you may want to look into changing that anyway to save storage space if the base table is, or will become, large. Considering this change also involves looking at how it might affect indexing for the other classes of queries that run against this table.

If either of those two methods aren't fast enough for the workload, look into implementing a data caching solution using something like AppFabric, which scales much more readily than running a SQL query every time you need data. This can be a huge payoff; the cost is added complexity.