I'm tuning a query and have discovered some behaviour I'm not clear about.
If I remove the WHERE IN
clause the query runs in 3 seconds instead of 3 minutes.
There only 7 rows returned in the result since there are only 7 items in the IN
clause.
Thinking this was a bit odd, I started trying to get both parts of the query behaving individually, however no matter what I've tried the IN
clause causes the long execution time.
Looking at the execution plan I can see that it's taking some joins and converting them from Nested Loops
to Hash Match
.
I've diced up the SQL so the single SELECT
is now split between a View
, CTE
and SELECT
. I would have expected the IN
clause to not impact what is happening inside of the View
, but it is.
The execution plans are: with the IN
clause and without the IN
clause. Please be gentle regarding the quality of the query; it's a work in progress on an inherited project and I'm by no means a proper DBA.
Selecting all 1.8 million rows from the View
takes ~1 minute; executing the CTE
by itself takes ~3 seconds; executing the full query takes ~3 minutes.
How can I convince SQL Server to maintain the Nested Loops
join instead of switching to Hash Match
when I add in the IN
clause? Or is there something else that I should be trying?
Best Answer
Well, you can force nested loops by using OPTION(LOOP JOIN) however, I would advise using OPTION(LOOP JOIN, MERGE JOIN) this is working with the optimiser rather than against it and is saying do what you want as long as its not a hash match.
But I would also dig a little deeper. With the IN clause it is possible that SQL is performing multiple index scans as opposed to a single table scan, so you may need to use an WITH(INDEX(0)) instead.
If you add SET STATISTICS IO at the top of the batch and run it with and then without the IN clause and check the number of physical and logical reads for each, as well as the read-ahead reads, this will prove or disprove my theory.