SQL Server – Should I use a subquery to help SQL Server find the correct plan?

execution-plan, performance, query-performance, sql-server, subquery

I have a query (with components built on the fly depending on the selections a user makes in the interface) that runs on SQL Server 2008 R2 in about 100 databases. Users are company employees, each company gets its own database, and both usage patterns and data ratios between tables vary greatly between companies. This query directly affects the user's perception of the system's speed, and almost all users run it every time they use the system, so it is worth my time to get it right.

We've mostly had good luck letting SQL Server choose the plan (since the best plan probably varies across databases and over time), but as we've expanded to more users and added more databases (and database servers), we have found cases where it simply decides on the wrong plan, and the performance goes from half a second (which, at this point, is faster than the user interface) to 10 seconds or worse (which is effectively broken as far as the users are concerned). I've finally been able to isolate a situation where the performance is consistently bad, so I can now try to fix the problem.
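For reference, a standard way to capture timings and the actual execution plan while reproducing a case like this (plain T-SQL session options, nothing specific to our schema):

SET STATISTICS TIME ON;   -- report CPU and elapsed time per statement
SET STATISTICS IO ON;     -- report logical reads per table

SET STATISTICS XML ON;    -- return the actual execution plan with the results
-- ... run the query under investigation here ...
SET STATISTICS XML OFF;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;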

Here is what we currently have (simplified considerably, of course):

SELECT
    -- about 100 columns
FROM
    base_table
LEFT JOIN
    limiting_table ON ...              -- join predicates omitted in this simplification
INNER JOIN
    problem_table ON ...
LEFT JOIN
    tables_only_used_in_select ON ...
INNER JOIN CONTAINSTABLE(
    base_table, *, 'extended text to search for', LANGUAGE 1033) AS rank1 
ON 
    base_table.row_id = rank1.[KEY]
LEFT OUTER JOIN CONTAINSTABLE(
    base_table, *, 'text to search for', LANGUAGE 1033) AS rank2 
ON 
    base_table.row_id = rank2.[KEY]
WHERE
    various criteria on base table

The "problem_table" in the above query seems to be the key. The vast majority of the time, this has no affect on the rows returned by the query. My first fix was to change it from an INNER JOIN to a LEFT JOIN. That fixed the problem. Of course, now I'm going to have to make sure to limit the results a different way (which is possible, but is not an easy code change to get that information to the query), and all the joins still give the database a chance to find another slow plan.

However, I've also had good luck isolating the "important" joins in a subquery, so that the plan doesn't get distracted by the other, less critical tables. Like this:

SELECT
    -- about 100 columns
FROM (
    SELECT base_table.*
    FROM 
        base_table
    LEFT JOIN
        limiting_table ON ...          -- join predicates omitted, as above
    INNER JOIN CONTAINSTABLE(
        base_table, *, 'extended text to search for', LANGUAGE 1033) AS rank1 
    ON 
        base_table.row_id = rank1.[KEY]
    LEFT OUTER JOIN CONTAINSTABLE(
        base_table, *, 'text to search for', LANGUAGE 1033) AS rank2 
    ON 
        base_table.row_id = rank2.[KEY]
    WHERE
        various criteria on base table
    ) base_table
INNER JOIN
    problem_table ON ...
LEFT JOIN
    tables_only_used_in_select ON ...

Am I setting myself up for problems with this approach that I just don't have enough experience to foresee? Or is it a reasonable practical strategy in this case (not ideal, perhaps, but worth trying to see if it works)?

UPDATE:

Here is the plan for the original query that runs on one database in 1 second (and returns 46 rows):

Good Plan

And here is the exact same query running in a different database that takes 55 seconds and returns 369 rows:

Slow Plan

Changing from INNER JOIN to LEFT JOIN on the problem_table runs in under a second and results in this plan:

LEFT JOIN problem_table

And the subquery, which also takes one second, results in this plan:

Subquery Plan

Note: I really thought I saw something about the correct way to post plans, but I can't seem to find it. If someone comments with a hint, I'll try to do a better job.

Best Answer

For future readers: I went ahead and tested both methods (with and without the subquery), and they both worked equally well in all tested environments. I do not know whether the statistics needed to be updated on the databases that originally had the problem, but like many developers, I am not in charge of that and have no ability to alter it.
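For readers who do control maintenance, refreshing statistics is the usual first check when the same query picks very different plans in otherwise-similar databases (the table name below is a placeholder):

-- Rebuild the statistics on one table from a full scan of its rows:
UPDATE STATISTICS base_table WITH FULLSCAN;

-- Or refresh every statistics object in the database that has had
-- enough modifications to be considered stale:
EXEC sp_updatestats;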

From what I can see, the advantage of the LEFT JOIN in this case is that it removes the temptation for the plan builder to treat that join as a row-reducing filter (which, in this case, it isn't), while still letting it choose freely among all the other options. The advantage of the subquery is that you keep the INNER JOIN's ability to remove any (rare) invalid rows, but you do not let the plan builder build the critical part of the plan around that fact (because the INNER JOIN is in the outer query, not the subquery). In essence, you restrict which joins the plan builder can work with by placing some of them in the subquery and doing the others in the main query.

Of course, any time you restrict the plan builder, you take the chance of restricting it too far, so that it comes up with a non-optimal plan. Going forward, I won't necessarily use either technique until I find I need to. I also tried using hints to explicitly tell SQL Server which type of join to use (the kind of thing sketched below), but those plans were never as fast as when I let it choose everything itself.
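For anyone curious, these are the kinds of hints I mean; the syntax is standard T-SQL, but the table and column names are placeholders from the simplified query above. One caveat worth knowing: SQL Server is free to flatten a derived table back into the outer query, so hints are the only hard guarantee of plan shape.

-- A per-join hint forces the physical join type (and, as a side effect,
-- SQL Server then enforces the written join order for the whole query):
SELECT b.*
FROM base_table AS b
INNER HASH JOIN problem_table AS p
    ON p.base_id = b.row_id;

-- A query-level hint pins the join order without dictating join types:
SELECT b.*
FROM base_table AS b
INNER JOIN problem_table AS p
    ON p.base_id = b.row_id
OPTION (FORCE ORDER);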

In the end, it turned out that, due to the ugliness of the code that builds this query (on the fly, even), the LEFT JOIN was far easier to implement with some confidence that I wouldn't break anything, so I went that way. But that decision was driven by factors external to the database.