Sql-server – Hash keys probe and residual

database-internalsexecution-planperformancequery-performancesql server

Say, we have a query like this:

select a.*,b.*
from 
a join b
on a.col1=b.col1
and len(a.col1)=10

Assuming the above query uses a Hash Join and has a residual, the probe key will be col1 and the residual will be len(a.col1)=10.

But while going through another example, I could see both the probe and the residual to be the same column. Below is an elaboration on what I am trying to say:

Query:

select *
from T1 join T2 on T1.a = T2.a 

Execution plan, with probe and residual highlighted:

enter image description here

Test data:

create table T1 (a int, b int, x char(200))
create table T2 (a int, b int, x char(200))

set nocount on
declare @i int
set @i = 0
while @i < 1000
  begin
      insert T1 values (@i * 2, @i * 5, @i)
    set @i = @i + 1
  end

declare @i int
set @i = 0
while @i < 10000
  begin
    insert T2 values (@i * 3, @i * 7, @i)
    set @i = @i + 1
  end

Question:

How can a probe and a residual be the same column? Why can't SQL Server use the probe column only? Why does it have to use the same column as a residual to filter rows again?

References for test data:

Best Answer

If the join is on a single column typed as tinyint, smallint, or integer* and if both columns are constrained to be NOT NULL, the hash function is 'perfect' – meaning there is no chance of a hash collision, and the query processor does not have to check the values again to ensure they really match.

Otherwise, you will see a residual as items in the hash bucket are tested for a match, not just a hash function match.

Your test does not specify NULL or NOT NULL for the columns (a bad practice, by the way), so it appears you are using a database where NULL is the default.

More information in my post Join Performance, Implicit Conversions, and Residuals and Hash Join Execution Internals by Dmitry Pilugin.


* Other qualifying types are bit, smalldatetime, smallmoney, and (var)char(n) for n = 1 and binary collation