Sql-server – left outer join – sort operations in the query plan – any ways of tuning this simple query

execution-planoptimizationperformancequery-performancesort-operatorsql server

while working on the query below in order to answer this question:

How to query chart data in a database agnostic way?

Having the following tables:

CREATE TABLE [dbo].[#foo] ( 
[creation]  DATETIME                         NOT NULL,
[value]     MONEY                                NULL,
[DT]        AS (CONVERT([date],[CREATION])) PERSISTED)


-- add a clustered index on the dt column
CREATE CLUSTERED INDEX CI_FOO ON #FOO(DT)
GO

and this other table for joining:

create table #bar (dt date primary key clustered)
go

the loading of data into these tables can be found here.

But when running the following query:

WITH RADHE AS (
SELECT THE_ROW=ROW_NUMBER() OVER(PARTITION BY B.DT ORDER BY B.DT),
       THE_DATE=B.dt,
       THE_NUMBER_OF_RECORDS_ON_THIS_DAY=CASE WHEN F.DT IS NULL THEN 0 ELSE COUNT(*) OVER (PARTITION BY F.DT ) END ,
       THE_TOTAL_VALUE_FOR_THE_DAY=COALESCE(SUM(F.VALUE) OVER (PARTITION BY b.DT ),0)

FROM #BAR B
LEFT OUTER JOIN #FOO F
ON B.dt = F.dt
)

--get rid of the duplicates and present the result
SELECT 
THE_DATE,
THE_NUMBER_OF_RECORDS_ON_THIS_DAY,
THE_TOTAL_VALUE_FOR_THE_DAY
FROM RADHE
WHERE THE_ROW = 1

I get something like this picture below, which is exactly what I was looking for.

enter image description here

But the execution plan generated has several Sort and Nested Loops Operations, as you can see on the picture below.

The full query plan can be found here.

enter image description here

this is a very simple operation, a left outer join between 2 tables, the indexes are already ordered, and therefore I was wondering if I could simplify the query plan.

alternatively, I could change the query code.

why exactly do we need nested loops 2 times and sort 2 times in the query plan?

Best Answer

You have an index that provides ordering by B.DT but

  • the plan first evaluates THE_ROW using this order
  • then the right hand sort orders by F.DT to evaluate THE_NUMBER_OF_RECORDS_ON_THIS_DAY
  • and finally the left hand sort puts things back into B.DT order for the THE_TOTAL_VALUE_FOR_THE_DAY.

You can get rid of one of the sorts by simply changing the order of the columns in the CTE so the F.DT one appears last (The connect item for this Unnecessary Sort is here)

WITH RADHE AS (
SELECT THE_ROW=ROW_NUMBER() OVER(PARTITION BY B.DT ORDER BY B.DT),
       THE_DATE=B.dt ,
       THE_TOTAL_VALUE_FOR_THE_DAY=COALESCE(SUM(F.VALUE) OVER (PARTITION BY b.DT ),0),
       THE_NUMBER_OF_RECORDS_ON_THIS_DAY=CASE WHEN F.DT IS NULL THEN 0 ELSE COUNT(*) OVER (PARTITION BY F.DT ) END

FROM #BAR B
LEFT OUTER JOIN #FOO F
ON B.dt = F.dt
)

But you can get rid of both by changing the definition of THE_NUMBER_OF_RECORDS_ON_THIS_DAY to

CASE WHEN F.DT IS NULL THEN 0 ELSE COUNT(*) OVER (PARTITION BY B.DT ) END

So it uses the same partitioning definition as the rest of the functions.

This shouldn't change anything in your example as your CASE expression will just assign 0 to any non matched rows anyway.

As for the rest of the plan see Partitioning and the Common Subexpression Spool

(Plan afterwards with no sorts)

enter image description here