SQL Server Performance – Why is Left Join Faster Than Inner Join?

performanceperformance-tuningsql server

So I'm tuning this query, and I am pretty sure that in this instance, I can replace an inner join with a left join without affecting the data. However, I'm not entirely sure why this is faster. Here is the query:

SELECT DISTINCT  cl.NAME                            AS company_name,
                     cl.id                              AS company_id,
                     ep.Plan__c                           AS plan_id,
                     ep.Employee__c,
                     ep.id,
                     do.Subcategory__c,
                     ep.Plan_Type__c,
                     pt.SubType,
                     Sum((pt.[shares] * fvh.[ValuePerShare])) AS TotalValue,
                     ppe.Deferral_Option__c,
                     dt.Defer_type_Code,
                     do.Short_Code__c,
                     pt.ContributionYear
  FROM   dbo.ParticipantTrades pt WITH (NOLOCK) 
        INNER JOIN dbo.PayoutPathElection ppe WITH (NOLOCK) 
                ON pt.payoutPathElectionID = ppe.Id 
        INNER JOIN dbo.DeferralOption do WITH (NOLOCK) 
                ON ppe.Deferral_Option__c = do.id 
        INNER JOIN dbo.EmployeePlan ep WITH (NOLOCK) 
                ON pt.employeePlan = ep.Id 
        LEFT JOIN dbo.DeferralType dt WITH (NOLOCK) 
                ON pt.deferralType = dt.defID
        INNER JOIN dbo.Fnc_lastfundvalue('2019-01-30') AS fvh 
                ON pt.fund = fvh.Fund
        INNER JOIN dbo.Clients cl with (NOLOCK)
                ON ep.Company__c = cl.Id
  WHERE ep.Company__c = '0017000001WL1HfAAL' AND ep.Plan_Type__c  LIKE '%' AND pt.tradeDate <= '2019-01-30'
  Group by   cl.NAME,
             cl.id,
             ep.Plan__c,
             ep.Employee__c,
             ep.id,
             do.Subcategory__c,
             ep.Plan_Type__c,
             pt.SubType,
             ppe.Deferral_Option__c,
             dt.Defer_type_Code,
             do.Short_Code__c,
             pt.ContributionYear

The bottleneck is at the table-value function join (Fnc_lastfundvalue). My hunch to why changing it to a left join is faster is that it can then reorder the joins and it causes less spillage into tempdb? Here is the query plan before and after changing INNER JOIN dbo.Fnc_lastfundvalue.. to LEFT JOIN dbo.Fnc_lastfundvalue..

Before (29 seconds): https://www.brentozar.com/pastetheplan/?id=Bk1Y-WqEE

After(3 seconds): https://www.brentozar.com/pastetheplan/?id=B1hoW-544

NB: Execution plans above are created on a dev box. The production server is still on SQL Server 2008.

Best Answer

It looks as though some of the difference is because the slow plan was run first.

The slow plan was clearly operating against a cold cache as it shows additional physical reads and managed to accumulate an additional 15 seconds of PAGEIOLATCH_SH waits compared to the fast case.

This looks as though it affected both the execution of the TVF itself (which took 5.5 seconds compared with 1.35 in the fast case.) and the wider plan using the result of it.

You say in the comments that on second execution it took 11 seconds. This is still 3 times slower than the fast plan (3.446 seconds) so doesn't explain all the performance difference.

The main problem you are experiencing is due to the poor default estimation for multi statement TVFs (fixed guess of 100 rows in compatibility levels 2014/2016 and 1 in earlier versions). In reality your TVF returns 1,715 rows.

In the inner join case because inner joins are associative and commutative it can be re-ordered flexibly and gets moved to the deepest part of the tree. This would make sense if one row was actually returned as it could whittle down the row count early for the other joins in the plan.

The execution plan is below. The pink annotations are "Actual Elapsed Time (ms)" from the XML.

enter image description here

Because of the 1 row estimate it starts off badly by joining onto dbo.ParticipantTrades and selecting a plan with nested loops and lookups. That nested loops has an elapsed time of 13.976 seconds (presumably most of which was spent waiting on the spilling sort immediately upstream to request rows from it, with 3.26 seconds taken up on the lookups themselves and associated IO waits for the physical reads at that operator).

923,646 rows are emitted from the join vs an estimated 1423.18 and this mis-estimate propagates upwards in the plan through 3 sorts and a hash join spilling as it goes (due to the row underestimate)

When you add the LEFT JOIN it can not be as freely re-ordered and the join happens much higher up in the plan (where it would do less damage in the problematic INNER JOIN case). The semantics of outer join help here anyway.

Whilst the estimate for the number of rows coming out of the TVF is still 1 this does not adversely affect the estimates for the join it is directly involved in (SQL Server assumes that the cardinality coming out of that join will be the same as that of the other sub tree to the join and this is in fact what happens - an outer join cannot reduce this number as a non joined row would still pass through - just with NULL for the fvh columns).

There are still cardinality estimation errors in the outer join plan but not of the same magnitude and it requests a sufficient memory grant to avoid spilling anywhere.

This cardinality estimation issue has been resolved in the most recent version with interleaved execution. In the meantime (as you say in the comments it needs to be 2008 compatible) you can manually interleave it by storing the TVF result into a #temp table and joining onto that to allow the count of the intermediate result (and column statistics) to be taken into account.