Sql-server – Are key lookups from non-clustered indexes always slower than a second query that does the lookup

bookmark-lookup, nonclustered-index, sql server, sql-server-2016, temporary-tables

I've noticed in my system, whenever a non-clustered index is used in a query that has to also do a key lookup to get the additional fields being selected, it's faster for me to instead do two queries.

The first query uses the non-clustered index and inserts only the key field into a temp table (so no key lookup is performed). The second joins that temp table back to the original table on the key and selects the fields I need.
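For concreteness, the two-step pattern could look like the following minimal sketch. Everything in it is an assumption for illustration: the `orders` table, its columns and the index name are hypothetical, and Python's built-in `sqlite3` stands in for SQL Server only so that the sketch runs anywhere. It shows the shape of the two queries and that they return the same rows. It says nothing about SQL Server plans or timings.

```python
import sqlite3

# Hypothetical schema; sqlite3 is a stand-in for SQL Server, not a model of it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,  -- plays the role of the clustered key
        customer_id INTEGER NOT NULL,
        amount      REAL    NOT NULL,
        note        TEXT    NOT NULL
    );
    -- Narrow index: holds customer_id plus the row key, nothing else.
    CREATE INDEX ix_orders_customer ON orders (customer_id);
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(i, i % 10, i * 1.5, f"note {i}") for i in range(1, 1001)],
)

# Single query: the filter can use the narrow index, but amount and note
# have to be fetched from the base table for every matching row.
single = conn.execute(
    "SELECT order_id, amount, note FROM orders "
    "WHERE customer_id = ? ORDER BY order_id",
    (3,),
).fetchall()

# Step 1: store only the keys, which the narrow index already contains.
conn.execute("CREATE TEMP TABLE wanted_keys (order_id INTEGER PRIMARY KEY)")
conn.execute(
    "INSERT INTO wanted_keys SELECT order_id FROM orders WHERE customer_id = ?",
    (3,),
)

# Step 2: join the stored keys back to the base table for the other fields.
two_step = conn.execute(
    "SELECT o.order_id, o.amount, o.note "
    "FROM wanted_keys AS k JOIN orders AS o ON o.order_id = k.order_id "
    "ORDER BY o.order_id"
).fetchall()

same_rows = single == two_step
```

In T-SQL the temp table would be a `#temp` table rather than `CREATE TEMP TABLE`, but the two steps are the same.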

I typically notice this when querying tables with hundreds of millions to tens of billions of rows. Could it be because I'm avoiding the key lookup when the table is first loaded into memory, and instead inserting the keys into a temp table, so that the subsequent field lookup query runs between two tables that are already in memory?

The difference in time I'll see is usually significant too, e.g. on the order of minutes.

Best Answer

I can think of a couple of cases where this approach might be beneficial.

1) Sometimes you end up with an execution plan that does a load of lookups for rows that are ultimately filtered out downstream (I've noticed this especially with pagination queries). If you only store the rows that survive filtering, there will be fewer "manual lookups" to resolve. This scenario can generally be addressed with a self join and does not need intermediate materialisation.

2) Lookups always use nested loops. The "manual lookup" might use a different join type: the cardinality estimates for the rows to look up will be spot on once they are materialised, and may differ from the original estimates enough to encourage this.

For the case where neither of the above applies (you are just materialising into a temp table, getting the same number of nested loops lookups as you would have had without this step, and not benefiting from improved cardinality estimates), I would expect this to generally be slower than the original query, since on the face of it you are doing the same work plus some additional overhead. I haven't tested this, though.
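The self join mentioned for the pagination case might be sketched as follows. This is an illustration under assumptions, not a tested SQL Server recipe: the `posts` table, its columns and the index are hypothetical, `sqlite3` is used only to keep the sketch runnable, and `LIMIT`/`OFFSET` stands in for T-SQL's `OFFSET ... FETCH`. The inner query picks the keys for one page using only the narrow index, and the outer join fetches the wide columns for just those keys.

```python
import sqlite3

# Hypothetical schema; sqlite3 is a stand-in for SQL Server, not a model of it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        post_id   INTEGER PRIMARY KEY,  -- plays the role of the clustered key
        topic_id  INTEGER NOT NULL,
        posted_at INTEGER NOT NULL,
        body      TEXT    NOT NULL
    );
    CREATE INDEX ix_posts_topic_time ON posts (topic_id, posted_at);
""")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [(i, i % 5, 1000 + i, f"body {i}") for i in range(1, 501)],
)

page_size, page_number = 10, 3            # third page of ten rows
offset = (page_number - 1) * page_size    # rows to skip before the page

# Straightforward pagination: wide columns are selected in the same query
# that does the filtering, ordering and paging.
plain = conn.execute(
    "SELECT post_id, posted_at, body FROM posts "
    "WHERE topic_id = ? ORDER BY posted_at LIMIT ? OFFSET ?",
    (2, page_size, offset),
).fetchall()

# Self join: page over keys only, then join back for the wide columns.
self_join = conn.execute(
    "SELECT p.post_id, p.posted_at, p.body "
    "FROM (SELECT post_id FROM posts "
    "      WHERE topic_id = ? ORDER BY posted_at LIMIT ? OFFSET ?) AS page "
    "JOIN posts AS p ON p.post_id = page.post_id "
    "ORDER BY p.posted_at",
    (2, page_size, offset),
).fetchall()

same_page = plain == self_join
```

Both queries return the same page. Whether the second form actually avoids wasted lookups depends on the plan the optimizer chooses.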

Related Solutions

Sql-server – Should a table have a clustered index even if it doesn’t have appropriate fields for it

1) If PlayerId is assigned with NEWSEQUENTIALID, you could consider that as the clustered index.

2) Otherwise, you can add an IDENTITY and make that clustered (questionable benefit, since all access will be through the PK you have already established).

3) Or you can leave it as a heap - with appropriate non-clustered indexes.

My order of preference would be 1, 3, 2 assuming you can't change the uniqueidentifier to an IDENTITY instead.

Can you explain why you are using uniqueidentifier in the first place? - that may have some bearing on this.

Sql-server – Excluding clustered key columns from non-clustered indexes definitions

At a SQL Saturday event I attended, one of the lecturers said it's always good practice to exclude the clustered key columns from the non-clustered index key definition, and from the INCLUDE clause too.

I disagree. Let me explain with the table in your question:

The clustered index is: (RecordID, QuestionID) and there are many more columns.

Any non-clustered index will also have the clustered key columns appended at the end. So, an index like:

(Pts)   is equivalent to:   (Pts, RecordID, QuestionID)

and similarly:

(Pts, PtsOf)           <->:  (Pts, PtsOf, RecordID, QuestionID)

(Pts, RecordID)        <->:  (Pts, RecordID, QuestionID)

(Pts, QuestionID)      <->:  (Pts, QuestionID, RecordID)

(Pts) INCLUDE (PtsOf)  <->:  (Pts, RecordID, QuestionID) INCLUDE (PtsOf) 

(QuestionID, RecordID) <->:  (QuestionID, RecordID)

For junction tables, or tables like this one that have a composite primary/unique key (whether it is clustered or not), it is very common to have some queries that need the (a,b) index, others that are better served by the (b,a) index, and sometimes queries that need both. So one often needs both of these indexes.

If the composite clustered key has more than two columns, say (a,b,c), you may often need an index on (b,c,a), or on (d,b), and another on (e,c,a). These will of course be equivalent to (b,c,a), (d,b,a,c) and (e,c,a,b) respectively. You can't just remove the clustered key columns from the definitions of the non-clustered indexes, because the column order will change.
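The equivalences above follow one rule: any clustered key column not already in the non-clustered key is appended, in clustered key order. A small sketch of that rule, as the answer states it, is given below. The function name `effective_key` is made up for illustration, INCLUDE columns are not modelled, and the sketch assumes a non-unique non-clustered index.

```python
def effective_key(nonclustered_key, clustered_key):
    """Return the declared key followed by any clustered key columns it lacks.

    Hypothetical helper: models only the rule described in the text, for a
    non-unique non-clustered index, and ignores INCLUDE columns.
    """
    # Column names are compared case-insensitively.
    declared = {column.lower() for column in nonclustered_key}
    appended = [c for c in clustered_key if c.lower() not in declared]
    return tuple(nonclustered_key) + tuple(appended)


clustered = ("RecordID", "QuestionID")

# Declared key -> key the index is effectively sorted by.
two_column_cases = {
    key: effective_key(key, clustered)
    for key in [
        ("Pts",),
        ("Pts", "PtsOf"),
        ("Pts", "RecordID"),
        ("Pts", "QuestionID"),
        ("QuestionID", "RecordID"),
    ]
}

# The three-column clustered key (a, b, c) from the paragraph above.
three_column_cases = {
    key: effective_key(key, ("a", "b", "c"))
    for key in [("b", "c", "a"), ("d", "b"), ("e", "c", "a")]
}
```

Dropping a clustered key column from a declared key such as (Pts, QuestionID) would change the result to (Pts, RecordID, QuestionID), which is a different sort order. That is why those columns cannot simply be removed from the key.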

The suggestion has one good point though. The clustered key columns can be removed from the INCLUDE part. They are redundant, just noise there.

About the indexes in the question, a non-CI on (Pts, PtsOf) is equivalent to (Pts, PtsOf, RecordID, QuestionID), so it is very different than the original non-CI on (RecordID, QuestionID) INCLUDE (Pts, PtsOf). It will use a bit more space than the original and of course these two indexes will be useful for different types of queries.

The (Pts, PtsOf) index will be useful, for example, for queries with WHERE Pts BETWEEN 0 AND 100, WHERE Pts = 200 AND PtsOf = 300, etc.

The (RecordID, QuestionID) INCLUDE (Pts, PtsOf) index is basically a copy of the table with only the 2 clustered key columns and only 2 of the many extra columns. This is (rarely) useful and is a form of vertical partitioning. If you often have queries that need all the rows of the table but only these 2 columns, then it's probably one of these (rather rare) cases where the extra space and effort to maintain this index is justified.
