I have a very complex query (who hasn't :-)). In this query is a JOIN to a table like this one (of course the original has some more columns):
CREATE TABLE dbo.tbl_detail (id bigint PRIMARY key,
descr varchar(300),
txt varchar(MAX),
descr_p varchar(300)
);
There are two joins to this table in the query:
LEFT JOIN dbo.tbl_detail td1 ON td1.id = main.detail_id
LEFT JOIN dbo.tbl_detail td2 ON td2.id = main.second_id
- The query returns ~200k rows.
- td1 outputs all columns
- td2 outputs all columns except [txt] (
varchar(max)
)
Problem:
- td2 uses an HASH JOIN
- but td1 uses an NESTED LOOKUP (over 200k rows; no statistic problem, since the estimates are correct)
- when I remove the [txt] column from the td1 output it uses a HASH JOIN too
- when I alter the column to VARCHAR(5000) (or smaller) it uses a HASH JOIN
- when I alter the column to VARCHAR(8000) it uses the NESTED LOOKUP again
OPTION (HASH JOIN)
orLEFT HASH JOIN
would work, but produces an extrem slow query plan
Question:
Why does it not always use the HASH JOIN? Is there a length limit (per column or sum of all output columns)?
PS: Microsoft SQL Server 2014 (SP2) Developer Edition
Best Answer
It's important to remember that the query optimizer doesn't choose each join individually and independently from everything else going in on the query. Joins have different properties which means that a different join type may be better or worse depending on other joins or operations in the plan. I can generate test data that shows behavior similar to what you see here, but the reproduction depends on available memory for the server.
Put 10k rows each into two tables:
Here is the query to test:
If I add
d.txt_5k
to theSELECT
list I naturally get a hash join:However, if I add
d.txt_max
to theSELECT
list I naturally get a loop join:The key here is that a loop join preserves the order of the outer table. That means that a nested loop join can avoid an explicit sort for some queries. Queries with a hash join may need to do the sort. The cost of the sort depends on (among other things) the estimated size of the data. The query optimizer estimates the size of the data based on the estimated number of rows and the data types. With the
VARCHAR(5000)
column I get an estimated size of 27 MB and with theVARCHAR(MAX)
column I get an estimated size of 42 MB. On my machine the 42 MB sort is expected to spill to disk so that makes the hash join plan much more expensive than the loop plan for theVARCHAR(MAX)
column.We can see this more clearly by using hints to force the different plans so we can compare them. Here's the comparison for the
VARCHAR(5000)
query:And the comparison for the
VARCHAR(MAX)
query:Of course, this may not be why you're seeing the behavior that you're seeing for your particular query. I just wanted to provide an example of why changing a data type could change the join type. I'm going to assume that there's a problem caused by the loop join and that you really need it to be hash join. The
OPTION (HASH JOIN)
hint may not be a good workaround because it will force every join to be a hash join in the query. TheLEFT HASH JOIN
hint may not be a good workaround because it implies aFORCE ORDER
hint which means that the query optimizer will not be able to change the join order. Perhaps you can move the problematic part of the query to a temp table and apply whatever hints are needed to the smaller query. It's really hard to say much without more information.You could also try a
FORCESCAN
hint to encourage the hash join, although I'm loathe to recommend it. Be sure to test carefully: