SQL Server – How to prevent an RID lookup

clustered-index | index-tuning | sql-server

I have a table that stores time-series data and has about 8 million rows. The table structure is as follows:

Timestamp | ComponentID | Parameter1 | Parameter2 | Parameter3

I have a clustered index on the Timestamp and ComponentID columns. (This is also my primary key.)

The query I am trying to run is:

SELECT * FROM table
WHERE Timestamp BETWEEN '2020-01-01' AND '2020-01-02'
  AND ComponentId = 5

When I run this query, an RID lookup seems to take place. I have read that this only happens when the index does not cover the query, but since I have a clustered index I thought it should itself cover everything and prevent the lookup. How do I prevent this lookup from happening?

Query 1 (Timestamp + ComponentId):
[screenshot: Query 1]

Query 2 (Timestamp only):
[screenshot: Query 2]

Execution Plan

Best Answer

RID Lookups occur on a heap data structure in SQL Server (as opposed to a B-Tree). They happen when a non-covering nonclustered index is used to fetch the data, and the remaining fields that index is missing have to be looked up in the heap. Your table data is stored in a heap when there is no clustered index on that table (the clustered index defines the key by which the records are sorted into a B-Tree).
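You can confirm whether your table is actually a heap by checking the system catalog. A minimal sketch, assuming a hypothetical table name of dbo.SensorData:

```sql
-- index_id = 0 with type_desc 'HEAP' means the table is a heap;
-- index_id = 1 with type_desc 'CLUSTERED' means it has a clustered index.
SELECT i.index_id, i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.SensorData');
```

If this returns a 'HEAP' row, the RID lookup you're seeing is consistent with the table having no clustered index, regardless of what indexes you believe are in place.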

If your table had a clustered index and this query still used the same nonclustered index it's currently using, you'd see a Key Lookup operator in the execution plan instead.

If you create a clustered index on your table on the two fields Timestamp and ComponentID, it will be covering for your query, and you should see the clustered index used in the execution plan instead, which eliminates any additional lookup operation.
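As a sketch of what that could look like, assuming the hypothetical names dbo.SensorData for the table and PK_SensorData for the existing (nonclustered) primary key constraint:

```sql
-- Drop the existing nonclustered primary key, then recreate it as the
-- clustered index, so the table's B-Tree is keyed on (Timestamp, ComponentID)
-- and the query needs no lookup at all.
ALTER TABLE dbo.SensorData DROP CONSTRAINT PK_SensorData;

ALTER TABLE dbo.SensorData
ADD CONSTRAINT PK_SensorData
PRIMARY KEY CLUSTERED (Timestamp, ComponentID);
```

Rebuilding the clustered index on an 8-million-row table will take some time and lock the table, so run it in a maintenance window.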


Regarding your second question in the comments: based on your recent comment updates, it sounds like the difference in runtime you're seeing is due to the first run pulling the data into memory from disk (generally the most bottlenecking part of the process, from a hardware perspective), while the second run leverages the data already in memory.

Depending on how large your table is (SQL Server pages are a fixed 8 KB), this normally shouldn't be too much of a concern, given the number of rows your execution plan shows the query returning. All subsequent runs of the query, while the data is still in memory, will have optimal performance.

If the initial run that pulls the data off the disk becomes a problem, then you can either look into compression or see whether you can upgrade your disks to something faster (if you're currently on a mechanical hard drive, switching to an SSD, or better yet an NVMe drive, will help).
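For the compression route, a minimal sketch (again assuming the hypothetical dbo.SensorData name):

```sql
-- Rebuild the table with page compression so fewer 8 KB pages
-- have to be read from disk on a cold run. Compression trades
-- some CPU on reads/writes for reduced IO.
ALTER TABLE dbo.SensorData
REBUILD WITH (DATA_COMPRESSION = PAGE);
```

ROW compression is a lighter-weight alternative if the CPU overhead of PAGE compression turns out to matter for your workload.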

I'll also add a final note from the comments that is true and related, but not the root issue itself: your query that does the additional filtering on ComponentId results in more data being read. This is evident in your IO statistics screenshots if you compare the grand totals for Logical Reads between the two queries. The query filtering on Timestamp and ComponentId results in 396 Logical Reads, whereas the query filtering on just Timestamp results in only 90 Logical Reads.

Logical Reads are the number of 8 KB pages read from memory; Physical Reads are the number of 8 KB pages read from disk. It's not a huge difference, but it would account for a small difference in time between the first runs of the two queries. This becomes a moot point on subsequent runs of both queries, once the data is already in memory (as discussed in my previous few paragraphs).
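You can reproduce those IO numbers yourself rather than relying on screenshots. A sketch, with the same hypothetical dbo.SensorData table standing in for yours:

```sql
-- Report per-table logical reads (pages from memory) and
-- physical reads (pages from disk) in the Messages tab.
SET STATISTICS IO ON;

SELECT * FROM dbo.SensorData
WHERE Timestamp BETWEEN '2020-01-01' AND '2020-01-02'
  AND ComponentId = 5;

SELECT * FROM dbo.SensorData
WHERE Timestamp BETWEEN '2020-01-01' AND '2020-01-02';

SET STATISTICS IO OFF;
```

Running both statements in one batch makes it easy to compare the "logical reads" totals side by side, and the "physical reads" counts will show whether a run was served from disk or from memory.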