Sql-server – On the query planner's reasons for choosing merge joins vs nested loops

execution-plan, sql-server, sql-server-2014

I encountered a real-life version of this simplified example of how the execution planner deals with a join. The schema and data are:

CREATE TABLE Fact (
    id INT not null,
    value1 INT not null,
    CONSTRAINT PK_Fact PRIMARY KEY CLUSTERED (id)
    );

CREATE TABLE Property (
    id INT not null,
    value2 INT not null,
    CONSTRAINT PK_Property PRIMARY KEY CLUSTERED (id)
);

-- Populate both tables; value1 cycles through 1,000 distinct values
DECLARE @batchsize INT = 10000;
DECLARE @i INT = 1;

WHILE @i < @batchsize
BEGIN
    INSERT INTO Fact VALUES (@i, @i % 1000);
    INSERT INTO Property VALUES (@i, @i);
    SET @i = @i + 1;
END

Then we execute this query:

SELECT *
FROM Fact f
JOIN Property p ON f.id = p.id
WHERE f.value1 = 1

The execution plan makes this a nested loop:

[execution plan screenshot: nested loops join]

If we reduce the number of possible values in value1 to a hundred, by modifying and re-creating the schema and data like so:

WHILE @i < @batchsize
BEGIN
    INSERT INTO Fact VALUES (@i, @i % 100); -- value1 now takes only 100 distinct values
    INSERT INTO Property VALUES (@i, @i);
    SET @i = @i + 1;
END

… and re-run the query, it becomes a merge join:

[execution plan screenshot: merge join]

I have a real-life case where I believe the execution planner makes the wrong choice, as I can significantly boost performance by first selecting all the ids alone, without a join, and then fetching the actual rows with a second query.
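
For concreteness, here is a rough sketch of the two-step approach I mean, using the sample tables above as stand-ins (the real-life schema is different and not shown here):

DECLARE @ids TABLE (id INT PRIMARY KEY);

-- Step 1: collect the matching ids alone, without the join
INSERT INTO @ids (id)
SELECT f.id
FROM Fact f
WHERE f.value1 = 1;

-- Step 2: fetch the actual rows in a second query
SELECT f.*, p.value2
FROM Fact f
JOIN @ids i ON i.id = f.id
JOIN Property p ON p.id = f.id;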

I'm wondering:

  1. Are there additional ways to ask SQL Server why exactly it chooses one plan or the other in cases like this? The general notion of using the number of possible values is plausible, but can more exact information be retrieved? In particular, is the number of possible values the only information going into the execution planner's reasoning, or is it more sophisticated than that? Does the average row size of the joined table matter? (A sketch of the kind of introspection I mean follows this list.)

  2. Is there a way to hint SQL Server to do one or the other in cases where testing shows that it makes a bad guess?
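
To make question 1 concrete, the kind of introspection I already know about is capturing the plan with its cardinality estimates and looking at the column statistics, roughly like this (st_Fact_value1 is just a name I made up for the example):

-- Return the actual plan as XML, including estimated vs. actual row counts
SET STATISTICS XML ON;

SELECT *
FROM Fact f
JOIN Property p ON f.id = p.id
WHERE f.value1 = 1;

SET STATISTICS XML OFF;

-- Inspect the histogram the optimizer has for value1
-- (adjust the schema name if the tables are not in dbo)
CREATE STATISTICS st_Fact_value1 ON dbo.Fact (value1);
DBCC SHOW_STATISTICS ('dbo.Fact', st_Fact_value1);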

Best Answer

In your sample case SQL Server made a very good choice.
Here is what it was actually doing:
- With the first set of data it extracts 10 rows from the Fact table based on the value1 column, then fetches the 10 corresponding IDs from the Property table using a SEEK operation for each ID.
- In the second scenario, where 100 rows in the Fact table match, SQL Server decided not to do a SEEK per row but to SCAN the Property table instead, which is much cheaper from an I/O perspective. (A quick check of those row counts is below.)
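
A quick count shows where the 10 and the 100 come from:

-- Returns 10 with the "% 1000" data set and 100 with the "% 100" data set
SELECT COUNT(*) FROM Fact WHERE value1 = 1;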

Yes, you can use hints for your queries (https://msdn.microsoft.com/en-us/library/ms181714.aspx):

OPTION (LOOP JOIN)
OPTION (MERGE JOIN)

Try both of them on both of your data sets to see the difference in query cost and in the amount of I/O, using SET STATISTICS IO ON.
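
Something along these lines, reusing the query from the question with the hint appended:

SET STATISTICS IO ON;

-- Force nested loops
SELECT *
FROM Fact f
JOIN Property p ON f.id = p.id
WHERE f.value1 = 1
OPTION (LOOP JOIN);

-- Force a merge join
SELECT *
FROM Fact f
JOIN Property p ON f.id = p.id
WHERE f.value1 = 1
OPTION (MERGE JOIN);

SET STATISTICS IO OFF;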

For instance, when you force LOOP JOIN for the second data set, I/O for the Property table jumps from 24 to 215 reads.
So be VERY CAREFUL when using any of these hints.