SQL Server 2014 – Optimize Self-Join on Primary Key

optimizationsql serversql server 2014

Consider this query which consists of N self-joins:

select
    t1.*
from [Table] as t1
join [Table] as t2 on
    t1.Id = t2.Id
-- ...
join [Table] as tN on
    t1.Id = tN.Id

It produces an execution plan with N clustered index scans and N-1 merge joins.

Honestly, I don't see any reasons to not optimize away all joins and do just one clustered index scan, i.e. optimize original query to this:

select
    t1.*
from [Table] as t1

Questions

  • Why joins aren't optimized away?
  • Is it mathematically incorrect to say that every join doesn't change result set?

Tested on:

  • Source Server Version : SQL Server 2014 (12.0.4213)
  • Source Database Engine Edition : Microsoft SQL Server Standard Edition
  • Source Database Engine Type : Standalone SQL Server
  • Compatibility level : SQL Server 2008 (100)

The query isn't meaningful; it just came to my mind and I'm curious about it now.

Here's the fiddle with table creation and 3 queries: with inner join's, with left join's and mixed. You can also look at execution plan there, too.

It seems that left joins are eliminated in the result execution plan while inner joins are not. Still don't get why, though.

Best Answer

First, lets assume that (id) is the primary key of the table. In this case, yes, the joins are (can be proved) redundant and could be eliminated.

Now that's just theory - or mathematics. In order for the optimizer to do an actual elimination, the theory has to have been converted into code and added in the optimizer's suite of optimizations/rewritings/eliminations. For that to happen, the (DBMS) developers must think that it will have good benefits to efficiency and that it's a common enough case.

Personally, it doesn't sound like one (common enough). The query - as you admit - looks rather silly and a reviewer shouldn't let it pass review, unless it was improved and the redundant join removed.

That said, there are similar queries where the elimination does happen. There is a very nice related blog post by Rob Farley: JOIN simplification in SQL Server.

In our case, all we have to do in change the joins to LEFT joins. See dbfiddle.uk. The optimizer in this case knows that the join can be safely removed without possibly changing the results. (The simplification logic is quite general and is not special-cased for self-joins.)

In the original query of course, removing the INNER joins cannot possibly change the results either. But it's not common at all to self-join on the primary key so the optimizer does not have this case implemented. It's common however to join (or left join) where joined column is the primary key of one of the tables (and there is often a foreign key constraint). Which leads to a second option to eliminate the joins: Add a (self referencing!) foreign key constraint:

ALTER TABLE "Table"
    ADD FOREIGN KEY (id) REFERENCES "Table" (id) ;

And voila, the joins are eliminated! (tested in the same fiddle): here

create table docs
(id int identity primary key,
 doc varchar(64)
) ;
GO
insert
into docs (doc)
values ('Enter one batch per field, don''t use ''GO''')
     , ('Fields grow as you type')
     , ('Use the [+] buttons to add more')
     , ('See examples below for advanced usage')
  ;
GO
4 rows affected
--------------------------------------------------------------------------------
-- Or use XML to see the visual representation, thanks to Justin Pealing and
-- his library: https://github.com/JustinPealing/html-query-plan
--------------------------------------------------------------------------------
set statistics xml on;
select d1.* from docs d1 
    join docs d2 on d2.id=d1.id
    join docs d3 on d3.id=d1.id
    join docs d4 on d4.id=d1.id;
set statistics xml off;
GO
id | doc                                      
-: | :----------------------------------------
 1 | Enter one batch per field, don't use 'GO'
 2 | Fields grow as you type                  
 3 | Use the [+] buttons to add more          
 4 | See examples below for advanced usage    

enter image description here

--------------------------------------------------------------------------------
-- Or use XML to see the visual representation, thanks to Justin Pealing and
-- his library: https://github.com/JustinPealing/html-query-plan
--------------------------------------------------------------------------------
set statistics xml on;
select d1.* from docs d1 
    left join docs d2 on d2.id=d1.id
    left join docs d3 on d3.id=d1.id
    left join docs d4 on d4.id=d1.id;
set statistics xml off;
GO
id | doc                                      
-: | :----------------------------------------
 1 | Enter one batch per field, don't use 'GO'
 2 | Fields grow as you type                  
 3 | Use the [+] buttons to add more          
 4 | See examples below for advanced usage    

enter image description here

alter table docs
  add foreign key (id) references docs (id) ;
GO
--------------------------------------------------------------------------------
-- Or use XML to see the visual representation, thanks to Justin Pealing and
-- his library: https://github.com/JustinPealing/html-query-plan
--------------------------------------------------------------------------------
set statistics xml on;
select d1.* from docs d1 
    join docs d2 on d2.id=d1.id
    join docs d3 on d3.id=d1.id
    join docs d4 on d4.id=d1.id;
set statistics xml off;
GO
id | doc                                      
-: | :----------------------------------------
 1 | Enter one batch per field, don't use 'GO'
 2 | Fields grow as you type                  
 3 | Use the [+] buttons to add more          
 4 | See examples below for advanced usage    

enter image description here