Sql-server – LEFT OUTER JOIN versus WHERE IS NULL

sql serversql server 2014t-sql

For some irrelevant reasons it's hard for me to get LEFT OUTER JOINS working with multiple ON clauses. Since simply moving these clauses to the WHERE statement results in different result sets, I add an additional OR IS NULL check.

LEFT OUTER JOIN ON query

SELECT t1.id, t2.ShardKey
FROM Table1 t1
LEFT OUTER JOIN Table2 t2 on t1.table2 = t2.id and t1.ShardKey = t2.ShardKey
WHERE t1.id = @id and t1.ShardKey = @shardkey

LEFT OUTER JOIN WHERE OR IS NULL query

SELECT t1.id, t2.ShardKey
FROM Table1 t1
LEFT OUTER JOIN Table2 t2 on t1.table2 = t2.id
WHERE t1.id = @id and t1.ShardKey = @shardkey
AND (t1.ShardKey = t2.ShardKey OR t2.ShardKey IS NULL)

Simplified definition for both tables:

CREATE TABLE [dbo].[Table1](
    [Id] [int] NOT NULL,
    [ShardKey] [int] NOT NULL,
    [Table2] [int] NULL,
PRIMARY KEY CLUSTERED 
(
    [Id] ASC,
    [ShardKey] ASC
))

CREATE TABLE [dbo].[Table2](
    [Id] [int] NOT NULL,
    [ShardKey] [int] NOT NULL,
PRIMARY KEY CLUSTERED 
(
    [Id] ASC,
    [ShardKey] ASC
))

These queries seem to give exactly the same query plan in SQL Server Management Studio, but I wonder if there're situations where the one outperforms the other.

Best Answer

The LEFT OUTER JOIN ON version is the standard way to write this logic, so I would highly recommend using this version rather than fiddling around with OR for performance reasons.

However, as ypercube has helpfully pointed out, the queries are not quite equivalent if the data and sharding methodology means that a Table1.Id value can end up in a different ShardKey than the ShardKey of the corresponding Table1.table2 value. If that happens, then the OR t2.ShardKey IS NULL portion of the 2nd query will be triggered and retain a row with matching Ids even if the ShardKey does not match.

In general, I suspect that they will typically result in similar query plans. Given two equivalent queries (which yours may or may not be, depending on the sharding methodology), by formulating the query in the standard way you are more likely to be working with the query optimizer rather than forcing it to recognize that the two queries can be expressed in similar ways and apply the appropriate simplifications.

If you want to be even more explicit about applying @shardkey to each table in the join, you can do that directly in the LEFT OUTER JOIN. However, SQL Server will be able to easily recognize that your original query is equivalent to this one.

SELECT t1.id, t2.ShardKey
FROM Table1 t1
LEFT OUTER JOIN Table2 t2 on t1.table2 = t2.id and t2.ShardKey = @shardkey
WHERE t1.id = @id and t1.ShardKey = @shardkey


An example of different query plans

On my SQL 2014 instance, the queries have (slightly) different query plans and the query plan for the LEFT OUTER JOIN ON version (top) is superior to the ORM-generated LEFT OUTER JOIN WHERE OR IS NULL version (bottom).

In particular, the seek into Table2 should be a seek on both id and ShardKey, but in the LEFT OUTER JOIN WHERE OR IS NULL version it is a seek only on id and a subsequent Filter operator tests all matching rows in order to apply the appropriate ShardKey restriction.

This difference may or may not be meaningful in your specific case (especially if Id is unique without adding in the ShardKey). However, if your real version actually uses table partitioning by ShardKey (and this just isn't represented in the simple sample query), you will likely be losing the benefit of partition elimination for the seek into Table2.

enter image description here