Sql-server – Adding columns from joined table slows down query incredibly

sql serversql-server-2012

I have a fairly complex query that joins several tables and views. It executes pretty quickly (~9 seconds for 2200 rows), except when I add a column from a view that is already joined in the query the query seems like it will never finish – I left it running for 33 minutes once and it had not completed.

SELECT
    table1.col1,
    table2.col2,
    view1.col3,
    view2.col4

FROM
    table1
    INNER JOIN table2 ON table1.id = table2.id
    LEFT OUTER JOIN view1 ON view1.id = table1.id
    LEFT OUTER JOIN view2 ON view2.id = view1.id
    LEFT OUTER JOIN view3 ON CONVERT(CHAR(8),view1.DateRequired,112) + LEFT(view2.ItemNumber, 4) = view3.UID

The above will run fine, but when I add any column from view3 into the SELECT list the query slows down to a crawl. I'm unable to get the full result set but if I limit the result to 50 rows it takes around 1 minute to run to completion compared to roughly 1 second with the original query.

Now, I know that the join for view3 is pretty complicated; it converts one value from a date to a string and then concatenates this to a value from another table and uses this resulting concatenated blob of info to join against the same style code in view3. However, I would expect that if this were a problem with the join itself then it would slow down the query when the view itself is joined, not just when columns from that view are added into the SELECT list.

I managed to run an execution plan when the extra column is added and it is absolutely colossal. I did manage to find one task that has a cost of 51%: RID Lookup (Heap). I'm not au fait with execution plans so not sure what this means or how to resolve it.

UPDATE

Using Scott's answer I created a non-clustered index on the row that was causing the expensive RID lookup. It improved performance quite a lot – down from over a minute to around 30 seconds to return the 50 row limit.

I think Nic is right in his suggestion that the datatype conversion from date to char(8) in the join condition is causing the bulk of the cost when querying. Going to work on another way to join view3.

Best Answer

A RID Lookup is a lookup into a heap table using a Row ID. The Row ID is included in a non-clustered index in order to find the rest of a table's data in the heap table. Since a heap table is a table without a clustered index and is sorted unordered, a Row ID is required for the correlation.

My guess is that view3.UID is found easily via a non-clustered index with that column. When you actually start asking for other columns (by specifying them in the SELECT list that are not part of that non-clustered index, Sql Server has to go find the other data items in the unordered heap - that's the RID Lookup - which can be quite expensive depending on the amount of data you're dealing with.

You 'might' see improved performance if you can identify the 'source' tables referenced in view3 and possibly 'including' your SELECT columns in supporting indexes on those source tables - this is called a 'covering index' in that the index is able to 'cover' or retrieve all of the information you're asking for without having to go elsewhere (clustered-index or heap)

This link has some good information on RID Lookups.

Here is a brief summary of the main points of the link:

When you find an RID Lookup in a query plan, it’s a symptom. It indicates a database schema that breaks several rules of thumb. Those rules-of-thumb are:

  • Each table should have a clustered index (of course there are exceptions but we’re dealing with rules-of-thumb here).
  • A non-clustered index has been created indicating that someone somewhere identified an ordering on one or more columns that made sense for that data.
  • There is at least one query (i.e. the one that generated the RID Lookup) that needs columns that are not covered by the non-clustered index.

Something else to 'try' - use a Common Table Expression (CTE) in an attempt to force Sql Server to 'materialize' view3. Before your main SELECT, add the CTE - this 'assumes' that no more than 2147483648 (max int value) could be returned - adjust as needed.

with View3Materialzed as
(
select top 2147483648 UID from view3 order by UID
) 

Then, use the CTE View3Materialzed in the regular joins. I have used this technique with some success. It may or may not help you.