Sql-server – Why/when does SQL Server evaluate the probe side of an inner hash join when the build side was empty

execution-planparallelismsql serversql-server-2017

Setup

DROP TABLE IF EXISTS #EmptyTable, #BigTable

CREATE TABLE #EmptyTable(A int);
CREATE TABLE #BigTable(A int);

INSERT INTO #BigTable
SELECT TOP 10000000 CRYPT_GEN_RANDOM(3)
FROM   sys.all_objects o1,
       sys.all_objects o2,
       sys.all_objects o3;

Query

WITH agg
     AS (SELECT DISTINCT a
         FROM   #BigTable)
SELECT *
FROM   #EmptyTable E
       INNER HASH JOIN agg B
                    ON B.A = E.A;

Execution Plan

enter image description here

Problem

This is a simplified repro for a phenomenon I hadn't noticed before today. My expectation for an inner hash join would be that if the build input is empty the probe side should not be executed as the join can return no rows. The above example contradicts that and reads the 10 million rows from the table. This adds 2.196 seconds to the execution time of the query (99.9%).

Additional Observations

  1. With OPTION (MAXDOP 1) the execution plan reads no rows from #BigTable. The ActualExecutions is 0 for all operators on the inside of the hash join.
  2. For the query SELECT * FROM #EmptyTable E INNER HASH JOIN #BigTable B ON B.A = E.A– I get a parallel plan, the scan operator on the inside of the hash join does have ActualExecutions of DOP but still no rows are read. This plan has no repartition streams operator (or aggregate)

Question

What's going on here? Why does the original plan exhibit the problem and the other cases don't?

Best Answer

Not running the probe side of the join when the build is empty is an optimization. It is not available for parallel row mode hash join when the probe side has a child branch i.e. when there is an exchange operator.

There was a similar report many years ago by Adam Machanic on the now-defunct Connect feedback site. The scenario was a start-up Filter on the probe side, which ran its child operators unexpectedly. The answer from Microsoft was that the engine requires a guarantee that certain structures are initialized, and the only sane way to enforce that was to ensure the probe side operators are opened.

My own recollection of the details is that not initializing a sub-tree led to hard-to-fix parallel timing bugs. Ensuring the child branch started up was a work around for those problems.

Batch mode hash join does not have this side effect because the way threads are managed is different.

In your particular case, the effect is more pronounced because the hash aggregate is blocking; it consumes its entire input during the iterator's Open() call. When there are only streaming operators on the probe side, the performance impact will often be more limited, depending on how much work is required to return the first row to the probe side of the hash join.