SQL Server chooses a Nested Loop join with a dimension table and performs a seek for each row

execution-plan, join, sql-server, sql-server-2017

I'm facing an issue where SQL Server generates a suboptimal execution plan: a
Nested Loop join that seeks into the dimension table, executing 2M reads against it.

The Sort operator's estimate is 100 rows instead of 450K rows, which may be affecting the plan selection:

Nested Loop plan: https://www.brentozar.com/pastetheplan/?id=B110MZ2Pm

This is within a test DB. We have an additional DB with the same schema and almost the same data.

Running exactly the same query (both from SSMS) generates a different plan, using a Hash Join and a dimension table scan (32K reads):

Hash Join plan: https://www.brentozar.com/pastetheplan/?id=r1Jm7b2D7

I need help understanding and solving this issue.

I can work around it by hinting a hash join, but it doesn't make sense that two similar databases on the same instance generate different plans.
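Since both databases sit on the same instance, instance-wide settings should match, but database-scoped configurations (available since SQL Server 2016) can still differ per database and change plan choice for identical queries. This is a quick thing to rule out:

```sql
-- Run in each of the two databases and compare the output; differences in
-- MAXDOP or LEGACY_CARDINALITY_ESTIMATION here can produce different plans
-- for the same query on the same instance.
SELECT name, value
FROM sys.database_scoped_configurations;
```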

Update #1: I found that the estimated cost is different, so when SQL Server executes in parallel it chooses a hash join.

With a single thread it chooses a nested loop.

Update #2: The same issue occurs during a SELECT from the same table, depending on the number of columns (and therefore the estimated cost). When I reduce the number of columns, the plan falls back to a nested loop that seeks into the dimension table.

Best Answer

There appear to be three reasons why you get a serial nested loop join plan in one of your environments and a hash join in the other. Based on the information that you've provided, the best fixes involve query hints or splitting the query into two parts.

  1. Differences between your environments

    One environment has 480,662 rows in your CCI (clustered columnstore index) and the other has 686,053 rows. I wouldn't call that nearly identical. There also appears to be a hardware or configuration difference between your environments, or at the very least you're getting very unlucky. The serial sort of 251 MB of estimated data has an IO cost of 0.0037538 units. The parallel sort of 351 MB of estimated data has an IO cost of 23.1377 units, even though it is discounted by parallelism. The engine expects to spill a relatively significant amount of data for the parallel plan. Differences like that can lead to different plans between environments.

  2. The optimizer misapplies a row goal cost reduction which can favor a nested loop join plan

    The nested loop plan is costed as if only 100 rows need to be output from the sort:

    [screenshot: the sort costed with a row goal of 100 rows]

    However, the query contains the following in the SELECT clause: COUNT(*) OVER ()

    The engine must read all rows in order to produce the correct result for the aggregate. This is indeed what happens in the actual plan, and the index seek is executed 450k times instead of 100 times. This cost reduction appears to happen on a variety of versions (I tested back to 2016 SP1 base), on both CEs, with many different window functions, and for both batch mode and row mode. It is a limitation in the product which results in a suboptimal query plan here.

  3. The nested loop join plan is serial due to a limitation with batch mode sorts

    It's possible that your serial nested loop join plan's cost qualifies it for parallelism (depending on your cost threshold for parallelism, CTFP) and you may be wondering why the optimizer did not find a lower-cost parallel plan. The optimizer has heuristics which prevent a parallel batch mode sort from being the first child of a nested loop join (which must run in row mode). The problem is that a parallel batch mode sort puts all rows on a single thread, which won't work well with a parallel nested loop join. Moving the sort to be a parent of the loop join won't decrease the estimated number of executions of the index seek (due to the row goal problem described above). As a result you're very likely to end up with a serial plan, even if CTFP were set to the default of 5.
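The row goal problem in point 2 can be sketched against the repro tables defined below (a rough illustration; exact plan shapes may vary by version and settings):

```sql
-- With only TOP (100), a row goal of 100 is valid: in a nested loop plan
-- the seek side really does execute on the order of 100 times.
SELECT TOP (100) c.SORT_ID
FROM cci_216665 c
INNER JOIN YEAH_NAH y ON c.JOIN_ID = y.ID
ORDER BY c.SORT_ID;

-- Adding COUNT(*) OVER () forces every joined row to be produced before
-- TOP can apply, yet the plan is still costed as if only 100 rows flow
-- through the join, so the seek executes once per outer row in the
-- actual plan.
SELECT TOP (100) c.SORT_ID, COUNT(*) OVER () AS TOTAL_ROWS
FROM cci_216665 c
INNER JOIN YEAH_NAH y ON c.JOIN_ID = y.ID
ORDER BY c.SORT_ID;
```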


Here is a reproduction of your issue, which I can't upload to PasteThePlan because it doesn't support my version of SQL Server:

DROP TABLE IF EXISTS cci_216665;

CREATE TABLE cci_216665 (
    SORT_ID BIGINT,
    JOIN_ID BIGINT,
    COL1 BIGINT,
    COL2 BIGINT,
    COL3 BIGINT,
    INDEX CCI CLUSTERED COLUMNSTORE
);

INSERT INTO cci_216665 WITH (TABLOCK)
SELECT TOP (500000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 50
, 0, 0, 0
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
OPTION (MAXDOP 1);

DROP TABLE IF EXISTS dbo.YEAH_NAH;

CREATE TABLE dbo.YEAH_NAH (ID INT, FILLER VARCHAR(20), PRIMARY KEY (ID));

INSERT INTO dbo.YEAH_NAH WITH (TABLOCK)
SELECT TOP (50) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
, 'CHILLY BIN'
FROM master..spt_values t1;

GO

-- takes 780 ms of CPU with nested loops
SELECT TOP (100)
*, COUNT(*) OVER ()
FROM cci_216665 c
INNER JOIN YEAH_NAH y ON c.JOIN_ID = y.ID
ORDER BY SORT_ID;

-- takes 111 ms of CPU with hash join
SELECT TOP (100)
*, COUNT(*) OVER ()
FROM cci_216665 c
INNER JOIN YEAH_NAH y ON c.JOIN_ID = y.ID
ORDER BY SORT_ID
OPTION (HASH JOIN);

The most straightforward way to solve your problem is to split your queries in two. Here's one way to do it:

SELECT COUNT(*)
FROM cci_216665 c
INNER JOIN YEAH_NAH y ON c.JOIN_ID = y.ID;

SELECT TOP (100) *
FROM cci_216665 c
INNER JOIN YEAH_NAH y ON c.JOIN_ID = y.ID
ORDER BY SORT_ID;

On my machine this is actually faster than the hash join plan, but you may not see the same results. In general, my first approach for a query like yours would be to avoid a window aggregate with an empty OVER () clause when only the first 100 rows are needed.

A reasonable alternative is the DISABLE_OPTIMIZER_ROWGOAL use hint introduced in SQL Server 2016 SP1. For this type of query the problem is precisely the row goal, so the hint addresses it directly, without any dependence on statistics or anything like that. I would consider it a relatively safe hint to employ.

SELECT TOP (100)
*, COUNT(*) OVER ()
FROM cci_216665 c
INNER JOIN YEAH_NAH y ON c.JOIN_ID = y.ID
ORDER BY SORT_ID
OPTION (USE HINT('DISABLE_OPTIMIZER_ROWGOAL'));

This results in a hash join plan on my machine.
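To confirm which join you actually got, comparing logical reads with SET STATISTICS IO is a quick check (a sketch using the repro tables above):

```sql
SET STATISTICS IO ON;

SELECT TOP (100)
*, COUNT(*) OVER ()
FROM cci_216665 c
INNER JOIN YEAH_NAH y ON c.JOIN_ID = y.ID
ORDER BY SORT_ID
OPTION (USE HINT('DISABLE_OPTIMIZER_ROWGOAL'));

SET STATISTICS IO OFF;
-- The Messages tab reports logical reads per table: a nested loop plan
-- shows a very high read count against the seek-side table, while the
-- hash join plan shows a much lower one.
```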