In a world where the query optimizer considered all possible join orders, and contained all possible logical transformations, the syntax we use for our queries would not matter at all.
As it is, the optimizer generally uses heuristics to pick an initial join order and explores a number of join order rewrites from there. It does this to avoid excessive compilation time and resource usage. It doesn't take all that many joins for the number of possible combinations to become unreasonable to explore exhaustively.
To take an extreme example, 42 joins are enough to generate more alternatives than there are atoms in the observable universe. More realistically, even 7 tables are enough to produce 665,280 alternatives. Although this is not a mind-boggling number, it would still take very significant time (and memory) to explore those alternatives completely.
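For reference, the 665,280 figure is the count of bushy join trees for seven tables, where operand order matters. A sketch of the combinatorics (assuming that standard counting):

```latex
% Number of join trees (bushy, operand order significant) for n tables:
T(n) = \frac{(2n-2)!}{(n-1)!}
% For n = 7:
T(7) = \frac{12!}{6!} = \frac{479{,}001{,}600}{720} = 665{,}280
```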
Although the heuristics are largely based on the type of join (inner, outer, cross...) and cardinality estimates, the textual order of the query can also have an impact. Sometimes, this is an optimizer limitation - NOT EXISTS
clauses are not reordered, and outer join reordering is very limited. Even with simple inner joins, the interaction between textual order, initial join order heuristics, and optimizer internals can be difficult to predict with certainty.
To take an example using the AdventureWorks sample database, I can write a query using the a common syntax form as:
SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
ON TH.ProductID = P.ProductID
JOIN Production.ProductInventory AS INV
ON INV.ProductID = P.ProductID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;
Before cost-based optimization, the logical query tree looks like this (note the join order is not the same as the written order):
I can (carefully) rewrite the query to use 'nested' syntax:
SELECT
P.Name,
PS.Name,
SUM(TH.Quantity),
SUM(INV.Quantity)
FROM Production.ProductSubcategory AS PS
JOIN Production.Product AS P
JOIN Production.TransactionHistory AS TH
JOIN Production.ProductInventory AS INV
ON INV.ProductID = TH.ProductID
ON TH.ProductID = P.ProductID
ON P.ProductSubcategoryID = PS.ProductSubcategoryID
GROUP BY
P.ProductID,
P.Name,
PS.ProductSubcategoryID,
PS.Name;
In which case the logical tree at the same point is:
The two different syntaxes produce a different initial join order in this case. After cost-based optimization, both produce the same output plan shape:
There are detailed differences between the two plans, with the 'nested' syntax producing a plan with a somewhat lower estimated cost:
The two inputs took a slightly different path through the optimizer, so it isn't all that surprising there are slight differences.
In general, using different syntax will sometimes (definitely not always!) produce different plan results. There is no broad correlation between one syntax and better plans. Most people write and maintain queries using something like the non-nested join syntax, so it often makes practical sense to use that.
To summarize, my advice is to write queries using whichever syntax seems most natural (and maintainable!) to you and your peers. If you get a better plan for a specific query using a particular syntax, by all means use it - but be sure to test that you still get the better plan whenever you patch or upgrade SQL Server :)
Even though the index is suggested by SQL Server, why does it slow things down so significantly?
Index suggestions are made by the query optimizer. If it comes across a logical selection from a table which is not well served by an existing index, it may add a "missing index" suggestion to its output. These suggestions are opportunistic; they are not based on a full analysis of the query, and do not take account of wider considerations. At best, they are an indication that more helpful indexing may be possible, and a skilled DBA should take a look.
The other thing to say about missing index suggestions is that they are based on the optimizer's costing model, and the optimizer estimates by how much the suggested index might reduce the estimated cost of the query. The key words here are "model" and "estimates". The query optimizer knows little about your hardware configuration or other system configuration options - its model is largely based on fixed numbers that happen to produce reasonable plan outcomes for most people on most systems most of the time. Aside from issues with the exact cost numbers used, the results are always estimates - and estimates can be wrong.
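As an aside, the optimizer's accumulated suggestions can be inspected through the missing index DMVs. A sketch (the `improvement_measure` expression is a common heuristic, not an official metric):

```sql
-- Inspect the optimizer's accumulated missing index suggestions.
-- These reset on instance restart and carry all the caveats above.
SELECT
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_user_impact,
    migs.avg_total_user_cost * migs.user_seeks * (migs.avg_user_impact / 100.0)
        AS improvement_measure
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY improvement_measure DESC;
```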
What is the Nested Loop join which is taking most of the time and how to improve its execution time?
There is little to be done to improve the performance of the cross join operation itself; nested loops is the only physical implementation possible for a cross join. The table spool on the inner side of the join is an optimization to avoid rescanning the inner side for each outer row. Whether this is a useful performance optimization depends on various factors, but in my tests the query is better off without it. Again, this is a consequence of using a cost model - my CPU and memory system likely has different performance characteristics than yours. There is no specific query hint to avoid the table spool, but there is an undocumented trace flag (8690) that you can use to test execution performance with and without the spool. If this were a real production system problem, the plan without the spool could be forced using a plan guide based on the plan produced with TF 8690 enabled. Using undocumented trace flags in production is not advised because the installation becomes technically unsupported and trace flags can have undesirable side-effects.
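For testing, the trace flag can be scoped to a single statement with `QUERYTRACEON`. The query below is a hypothetical sketch (table and column names invented for illustration):

```sql
-- Test-only: compare plans and run times with and without the table spool.
-- QUERYTRACEON requires elevated permissions, and TF 8690 is undocumented,
-- so this belongs in a test environment, not in production code.
SELECT T1.col1, T2.col2
FROM dbo.Table1 AS T1
CROSS JOIN dbo.Table2 AS T2
OPTION (QUERYTRACEON 8690);
```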
Is there something that I am doing wrong or have missed?
The main thing you are missing is that although the plan using the nonclustered index has a lower estimated cost according to the optimizer's model, it has a significant execution-time problem. If you look at the distribution of rows across threads in the plan using the Clustered Index, you will likely see a reasonably good distribution:
In the plan using the Nonclustered Index Seek, the work ends up being performed entirely by one thread:
This is a consequence of the way work is distributed among threads by parallel scan/seek operations. It is not always the case that a parallel scan will distribute work better than an index seek - but it does in this case. More complex plans might include repartitioning exchanges to redistribute work across threads. This plan has no such exchanges, so once rows are assigned to a thread, all related work is performed on that same thread. If you look at the work distribution for the other operators in the execution plan, you will see that all work is performed by the same thread as shown for the index seek.
There are no query hints to affect row distribution among threads, the important thing is to be aware of the possibility and to be able to read enough detail in the execution plan to determine when it is causing a problem.
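One way to see the per-thread distribution is to capture the actual execution plan; the per-operator, per-thread row counts appear in the showplan XML:

```sql
-- Capture the actual plan as XML. Per-thread row counts appear in the
-- <RunTimeCountersPerThread> elements under each operator's
-- <RunTimeInformation> node in the returned showplan.
SET STATISTICS XML ON;

-- ...run the query of interest here...

SET STATISTICS XML OFF;
```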
With the default index (on the primary key only), why does it take less time? With the nonclustered index present, each joined row should be found more quickly, because the join is on the Name column on which the index has been created. This is reflected in the query execution plan, and the Index Seek cost is lower when IndexA is active, so why is it still slower? Also, what is it in the Nested Loops left outer join that is causing the slowdown?
It should now be clear that the nonclustered index plan is potentially more efficient, as you would expect; it is just poor distribution of work across threads at execution time that accounts for the performance issue.
For the sake of completing the example and illustrating some of the things I have mentioned, one way to get a better work distribution is to use a temporary table to drive parallel execution:
SELECT
val1,
val2
INTO #Temp
FROM dbo.IndexTestTable AS ITT
WHERE Name = N'Name1';
SELECT
N'Name1',
SUM(T.val1),
SUM(T.val2),
MIN(I2.Name),
SUM(I2.val1),
SUM(I2.val2)
FROM #Temp AS T
CROSS JOIN dbo.IndexTestTable AS I2
WHERE
I2.Name = N'Name1'
OPTION (FORCE ORDER, QUERYTRACEON 8690);
DROP TABLE #Temp;
This results in a plan that uses the more efficient index seeks, does not feature a table spool, and distributes work across threads well:
On my system, this plan executes significantly faster than the Clustered Index Scan version.
If you're interested in learning more about the internals of parallel query execution, you might like to watch my PASS Summit 2013 session recording.
Best Answer
A RID Lookup is a lookup into a heap table using a Row ID (RID). The RID is included in every nonclustered index on a heap so that the rest of the row's data can be found in the heap. Since a heap is a table without a clustered index, its rows are stored in no particular order, so the RID is required for the correlation.
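A minimal repro of a RID Lookup (table and column names are invented for illustration):

```sql
-- A heap (no clustered index) plus a nonclustered index on one column.
CREATE TABLE dbo.HeapDemo
(
    id      integer NOT NULL,
    name    nvarchar(50) NOT NULL,
    payload nvarchar(200) NOT NULL
);

CREATE NONCLUSTERED INDEX IX_HeapDemo_name
ON dbo.HeapDemo (name);

-- Selecting a column that is not in the index forces SQL Server to follow
-- the RID from each matching index entry back into the heap: a RID Lookup.
SELECT payload
FROM dbo.HeapDemo
WHERE name = N'example';
```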
My guess is that view3.UID is found easily via a nonclustered index on that column. When you start asking for other columns (by specifying them in the SELECT list) that are not part of that nonclustered index, SQL Server has to go find the other data items in the unordered heap. That is the RID Lookup, and it can be quite expensive depending on the amount of data you are dealing with.
You might see improved performance if you can identify the source tables referenced in view3 and include your SELECT-list columns in supporting indexes on those tables. This is called a 'covering index': the index is able to 'cover' (retrieve) all of the information you are asking for without having to go elsewhere (to the clustered index or heap).
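A covering index for that shape might look like this (the table and column names are hypothetical placeholders for the real source table behind view3):

```sql
-- INCLUDE stores the extra columns at the leaf level of the index only,
-- so the index 'covers' the query and no RID Lookup into the heap is needed.
CREATE NONCLUSTERED INDEX IX_SourceTable_UID
ON dbo.SourceTable (UID)
INCLUDE (col1, col2);
```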
Something else to try: use a Common Table Expression (CTE) in an attempt to force SQL Server to materialize view3. Before your main SELECT, add the CTE - this assumes that no more than 2,147,483,647 rows (the maximum int value) could be returned - adjust as needed.
Then, use the CTE View3Materialized in place of view3 in the regular joins. I have used this technique with some success; it may or may not help you.
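The TOP-based CTE described above might be sketched as follows. The view name comes from the question; the outer query is a placeholder for your real SELECT list and joins, and note that a CTE does not guarantee materialization - SQL Server may still inline it:

```sql
-- TOP with a constant row count discourages SQL Server from merging the
-- view's query into the outer query, encouraging early evaluation.
-- 2147483647 is the maximum int value.
WITH View3Materialized AS
(
    SELECT TOP (2147483647) V3.*
    FROM dbo.view3 AS V3
)
SELECT COUNT(*)   -- placeholder: use your original SELECT list and joins here
FROM View3Materialized AS V3M;
```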