Sql-server – How to optimize a query that’s running slow on Nested Loops (Inner Join)

performancesql server

TL;DR

Since this question keeps getting views, I'll summarize it here so newcomers don't have to suffer the history:

JOIN table t ON t.member = @value1 OR t.member = @value2 -- this is slow as hell
JOIN table t ON t.member = COALESCE(@value1, @value2)    -- this is blazing fast
-- Note that here if @value1 has a value, @value2 is NULL, and vice versa

I realize this might not be everyone's problem, but by highlighting the sensitivity of the ON clauses, it might help you look in the right direction. In any case the original text is here for future anthropologists:

Original text

Consider the following simple query (only 3 tables involved)

    SELECT

        l.sku_id AS ProductId,
        l.is_primary AS IsPrimary,
        v1.category_name AS Category1,
        v2.category_name AS Category2,
        v3.category_name AS Category3,
        v4.category_name AS Category4,
        v5.category_name AS Category5

    FROM category c4
    JOIN category_voc v4 ON v4.category_id = c4.category_id and v4.language_code = 'en'

    JOIN category c3 ON c3.category_id = c4.parent_category_id
    JOIN category_voc v3 ON v3.category_id = c3.category_id and v3.language_code = 'en'

    JOIN category c2 ON c2.category_id = c3.category_id
    JOIN category_voc v2 ON v2.category_id = c2.category_id and v2.language_code = 'en'

    JOIN category c1 ON c1.category_id = c2.parent_category_id
    JOIN category_voc v1 ON v1.category_id = c1.category_id and v1.language_code = 'en'

    LEFT OUTER JOIN category c5 ON c5.parent_category_id = c4.category_id
    LEFT OUTER JOIN category_voc v5 ON v5.category_id = c5.category_id and v5.language_code = @lang

    JOIN category_link l on l.sku_id IN (SELECT value FROM #Ids) AND
    (
        l.category_id = c4.category_id OR
        l.category_id = c5.category_id
    )

    WHERE c4.[level] = 4 AND c4.version_id = 5

This is a pretty simple query, the only confusing part is the last category join, it's this way because category level 5 might or might not exist. At the end of the query I am looking for category info per product ID (SKU ID), and the that's where the very large table category_link comes in. Finally, the table #Ids is just a temp table containing 10'000 Ids.

When executed, I get the following actual execution plan:

Actual Execution Plan

As you can see, almost 90% of the time is spent in the Nested Loops (Inner Join). Here's extra information on those Nested Loops:

Nested Loops (Inner Join)

Note that the table names don't match exactly because I edited the query table names for readability, but it's pretty easy to match (ads_alt_category = category). Is there any way to optimize this query? Also note that in production, the temp table #Ids doesn't exist, it's a Table Valued Parameter of the same 10'000 Ids passed on to the Stored Procedure.

Additional info:

  • category indices on category_id and parent_category_id
  • category_voc index on category_id, language_code
  • category_link index on sku_id, category_id

Edit (solved)

As pointed out by the accepted answer, the problem was the OR clause in the category_link JOIN. However, the code suggested in the accepted answer is very slow, slower even than the original code. A much faster and also much cleaner solution is simply to replace the current JOIN condition with the following:

JOIN category_link l on l.sku_id IN (SELECT value FROM @p1) AND l.category_id = COALESCE(c5.category_id, c4.category_id)

This minute tweak is the fastest solution, tested against the double join from the accepted answer and also tested against the CROSS APPLY as suggested by valverij.

Best Answer

The problem appears to be in this part of the code:

JOIN category_link l on l.sku_id IN (SELECT value FROM #Ids) AND
(
    l.category_id = c4.category_id OR
    l.category_id = c5.category_id
)

or in join conditions is always suspicious. One suggestion is to split this into two joins:

JOIN category_link l1 on l1.sku_id in (SELECT value FROM #Ids) and l1.category_id = cr.category_id
left outer join
category_link l1 on l2.sku_id in (SELECT value FROM #Ids) and l2.category_id = cr.category_id

You then have to modify the rest of the query to handle this . . . coalesce(l1.sku_id, l2.sku_id) for instance in the select clause.