Sql-server – Is left hash join always better than left outer join

hashingsql server

I have a query that runs quite slow (see below). While searching for a way to improve it we found that the query runs about ten times faster if we replaced the LEFT OUTER JOIN with LEFT HASH JOIN

The result seems to be the same. Is it? In what cases wont it return the same result? What difference are there? Is there something that i should look out for when running a LEFT HASH JOIN instead of a LEFT OUTER JOIN?

The [ABC].[ExternalTable] table in the query is a view on a different server that i have added as a external table

SELECT t.foo, t.bar, t.data
FROM [dbo].[Table] as t
LEFT OUTER JOIN [ABC].[ExternalTable] as s ON s.foo = t.foo and s.bar = t.bar and s.data = t.data
WHERE s.foo is null and s.bar IS NULL and s.data IS NULL

Best Answer

Semantically, both queries are the same. The LOOP versus the HASH simply tells SQL Server which option to use to return results. If you run the query without LOOP or HASH, SQL Server may pick either of those options, depending on which it thinks will perform best. Both options return identical results by design. To put it slightly differently1, the keywords HASH and LOOP in this case are what is known as join hints, one of the three kinds of hints in Transact-SQL. To confuse you even more, you can also specify a join hint as a query hint, though the effect is not necessarily the same in both cases. Be aware that providing these kinds of join hints implies a FORCE ORDER hint, which specifies that the join order indicated by the query syntax is preserved during query optimization (see Paul's answer below for further details).

SQL Server uses statistics on each table in the query to make an informed choice about what kind of physical operation to take with for each JOIN in the T-SQL query statement.

In this case, since [ExternalTable] is a view referenced through a linked server, SQL Server probably expects there to be 1 row in the table - i.e. it has no idea how many rows to expect.

You can either add a join hint to your query to force a merge join, or simply copy rows from [ExternalTable] into a local #temp table with a clustered index, then run the query against that.

The full syntax for the hash join would be:

LEFT OUTER HASH JOIN [ABC].[ExternalTable] s ON s.foot = t.foo .....

The version with the HASH in the join statement is allowing SQL Server to choose the join type, and in your particular case it's most likely choosing a LOOP join, which you could force with:

LEFT OUTER LOOP JOIN [ABC].[ExternalTable] s ON s.foot = t.foo .....

I typically would not recommend specifying the join type since most of the time SQL Server is very capable of choosing the most appropriate style of join operator.


1 - thanks Andriy for the wording here.